TL;DR
- Competing Launches: Anthropic and OpenAI released free AI-powered security scanners within 14 days of each other, targeting vulnerabilities that traditional Static Application Security Testing (SAST) tools cannot detect.
- Detection Disputes: Independent testing by Checkmarx Zero found Claude identified only two true positives out of eight flagged vulnerabilities, contradicting self-reported claims.
- Market Disruption: Analysts warn that free AI scanners from Anthropic and OpenAI could commoditize the traditional SAST market, shifting spending away from Checkmarx, Veracode, and Snyk.
- Dual-Use Risk: The same AI models that scan for vulnerabilities can exploit them, and AI-generated code is 2.74 times more likely to introduce security flaws than human-written code.
Within two weeks of each other, Anthropic and OpenAI launched free AI-powered security scanners that find vulnerabilities traditional Static Application Security Testing (SAST) tools structurally cannot – but independent testing suggests the headline detection numbers may not hold up. Anthropic opened a limited research preview of Claude Code Security to Enterprise and Team customers first, followed two weeks later by OpenAI’s Codex Security entering its own research preview. Both are currently free to enterprise customers, with no announced pricing timeline.
The SAST Blind Spot
Rule-based static analysis tools work by matching code against libraries of known vulnerability signatures. That approach has a hard limit: it cannot reason about code logic outside its trained patterns. According to Anthropic, Claude Code Security addresses this by reading and reasoning about code the way a human security researcher would, tracing data flow and component interactions to catch complex vulnerabilities that pattern-matching tools miss.
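The gap is easy to see in miniature. The sketch below is a toy signature-based scanner, not taken from any real SAST product: it flags lines matching known-bad patterns, then scans a sample where untrusted input reaches a shell through a helper function. Because no single line matches a signature, the rule-based approach misses the injection risk that data-flow reasoning would catch. All names and patterns here are illustrative assumptions.

```python
import re

# Toy rule-based scanner: match source lines against known-bad
# signatures, the way a signature-driven SAST engine works.
# Patterns and sample code are illustrative, not from any real tool.
SIGNATURES = [
    re.compile(r"os\.system\s*\("),   # direct shell execution
    re.compile(r"\beval\s*\("),       # direct eval of a string
]

def signature_scan(source: str) -> list[str]:
    """Return lines that match any known vulnerability signature."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if any(sig.search(line) for sig in SIGNATURES):
            hits.append(f"line {lineno}: {line.strip()}")
    return hits

# Vulnerable sample: untrusted input reaches a shell through a helper,
# so no single line matches a signature in the rule list above.
SAMPLE = """
import subprocess

def run(parts):
    subprocess.run(" ".join(parts), shell=True)  # the actual sink

def handler(user_input):
    run(["ls", user_input])  # tainted data flows into run()
"""

print(signature_scan(SAMPLE))  # -> [] : no signature fires despite the injection risk
```

Catching the sample requires connecting `user_input` to the `shell=True` sink across two functions – the kind of cross-function reasoning both labs claim their models perform.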
OpenAI describes Codex Security similarly, saying it builds project-level context to surface high-confidence findings with fixes while filtering out noise from insignificant bugs. Both companies ship model updates on monthly cycles, so defenders who delay adopting the latest models cede a recurring head start to adversaries using the same capabilities.
The practical difference shows up in findings like the heap buffer overflow in the CGIF library, where Claude identified a flaw by reasoning about the LZW compression algorithm – a vulnerability that coverage-guided fuzzing could not catch even at 100% code coverage. Separately, AI security startup AISLE independently identified all 12 zero-day vulnerabilities in OpenSSL’s January 2026 security patch, including a stack buffer overflow potentially exploitable without valid key material.
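The fuzzing claim rests on a general point worth making concrete: line coverage measures which code executes, not which data values flow through it. The toy decoder below – an illustrative sketch in the spirit of an LZW-style routine, not the actual CGIF code – overflows its buffer only when the input length exceeds the buffer size, so a fuzzer can reach 100% line coverage without ever triggering the bug.

```python
def decode_block(length: int) -> list[int]:
    """Toy decoder loop, illustrative only (not the CGIF code).

    The write in the loop body goes out of bounds only when `length`
    exceeds the buffer size -- a condition that depends on data values,
    not on which lines execute.
    """
    buf = [0] * 16          # fixed-size output buffer
    for i in range(length): # any length >= 1 gives full line coverage
        buf[i] = i          # overflows only when length > 16
    return buf

# A fuzzer input of length 1 executes every line (100% coverage)
# without ever triggering the out-of-bounds write.
decode_block(1)
```

Reasoning about the algorithm – noticing that `length` is attacker-influenced while the buffer is fixed – finds the flaw directly, which is the mode of analysis Anthropic attributes to the CGIF finding.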
Prior Research Context
That structural limitation is not new territory for AI security research. As WinBuzzer reported in October 2024, AI-powered tools using Claude had already uncovered zero-day vulnerabilities in production Python codebases long before either of these tools launched commercially. That precedent makes the current launches less surprising and raises the bar for evaluating the detection numbers each lab now claims at scale.
Furthermore, the capability gap between AI-based and rule-based scanning has been widening for at least 18 months. The launches from Anthropic and OpenAI represent productization of a demonstrated edge, not an unproven bet. That makes the absence of independent audits more conspicuous, not less.
Anthropic’s Move
Against that backdrop, Anthropic moved first. The company published zero-day research on February 5, 2026, alongside the release of Claude Opus 4.6. Anthropic said Claude Opus 4.6 found more than 500 high-severity vulnerabilities in production open-source codebases that had survived decades of expert review, though that figure is self-reported and has not been independently audited.
Building on this, the Claude Code Security research preview shipped two weeks later, available to Enterprise and Team customers, with free expedited access for open-source maintainers. Gabby Curtis, Communications Lead at Anthropic, said the company built Claude Code Security “to make defensive capabilities more widely available.” Anthropic predicted that a meaningful share of the world’s code will be scanned by AI as models grow more effective at finding long-hidden bugs.
That framing positions the tool as infrastructure for the entire software ecosystem, not just a product feature. Anthropic and OpenAI together carry a combined private-market valuation exceeding $1.1 trillion, and enterprise security adoption underpins each company’s growth narrative. Free pricing now is a market-capture strategy, not a permanent model.
OpenAI Answers
OpenAI followed with a tool built on similar assumptions but a different development path. Codex Security grew out of the Aardvark security agent, an internal tool that entered private beta in 2025. According to OpenAI, during beta the agent scanned more than 1.2 million commits across external repositories, surfacing 792 high-severity findings and 10,561 additional security findings.
Disclosed targets included OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium, yielding 14 assigned CVEs. OpenAI also reported that false positive rates fell more than 50% across all repositories during beta, with over-reported severity dropping more than 90%. NETGEAR, an early access participant, reported that Codex Security integrated smoothly into its security development environment, with findings as thorough as those from an experienced product security researcher.
However, that scale of deployment makes the 14-CVE yield notable for what it omits: none of the underlying detection-rate data has been released in auditable form. A 50% false-positive reduction is a meaningful engineering milestone, but a reduction from an unknown baseline tells security directors little about the absolute precision rate to expect in their own environments.
Cracks in the Claims
Headlines from both labs come without independent verification. Checkmarx Zero researchers found that in a full production-grade codebase scan, Claude identified eight vulnerabilities but only two were true positives – a result that stands in sharp contrast to the self-reported beta accuracy. Neither Anthropic nor OpenAI has submitted detection claims to an independent third-party audit.
As a result, enterprise security teams that integrate probabilistic AI scanners into compliance-driven pipelines need reproducible outputs, not results that shift with each monthly model update. The absence of audited benchmarks leaves security directors without a reliable basis for procurement decisions.
In addition, that verification gap has a governance counterpart: in interviews with more than 40 CISOs, VentureBeat found that formal governance frameworks for reasoning-based scanning tools barely exist. Cycode CTO Ronen Slavin stated that AI models are probabilistic by nature and that security teams require “consistent, reproducible, audit-grade results” – a standard these tools have not yet demonstrated at production scale.
The Dual-Use Stakes
Defenders and adversaries share the same capability. Both tools are free to enterprise customers, and the underlying models are accessible more broadly. A University of Illinois study found that GPT-4 – now an outdated model – could already autonomously exploit known vulnerabilities, achieving a 90% success rate against real-world systems and confirming that the same class of models defending codebases can be turned against them.
Meanwhile, timing adds pressure. According to the Veracode 2025 GenAI Code Security Report, AI-generated code is 2.74 times more likely to introduce security vulnerabilities compared to human-written code. The tooling that writes vulnerable code and the tooling that scans for it are accelerating on the same timeline.
The Center for Strategic and International Studies analyzed the stakes directly:
“Improvements in automated vulnerability detection do not automatically favor defense. Instead, they likely accelerate the tempo of cyber competition, pushing defenders to leverage AI tools to find and mitigate flaws before adversaries use these tools to weaponize them.”
Both Anthropic and OpenAI are heading toward IPOs, and enterprise security wins serve their growth narratives. Free access today reflects a land-grab dynamic common in developer tooling markets. Security directors should treat the research preview designation as meaningful – both products are subject to change as models update.
Industry Reaction
The competitive dynamic is already reshaping how analysts view the SAST market. Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat security reporters that if code reasoning scanners from major AI labs are effectively free to enterprise customers, then static code scanning commoditizes overnight.
The implication for commercial SAST vendors – Checkmarx, Veracode, Snyk – is direct and immediate. Baer predicts spending gravity will shift away from traditional SAST licenses toward tooling that shortens remediation cycles.
Snyk and Cycode have pushed back on the disruption narrative, arguing that finding bugs is not the bottleneck. Governance, remediation workflow, and audit-grade reproducibility are where the real operational work happens. A scanner that surfaces hundreds of findings without integration into a remediation pipeline adds noise, not security.
For security teams, the recommendation is not to choose between the two tools but to run both. As Baer puts it:
“Different models reason differently, and the delta between them can reveal bugs neither tool alone would consistently catch. In the short term, using both isn’t redundancy. It’s defense through diversity of reasoning systems.”
Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS (via VentureBeat)
Security teams that run a 30-day empirical pilot with both tools will have real data before committing to either. For the engineers maintaining open-source libraries that underpin global infrastructure – OpenSSH, libssh, PHP – the window between a vulnerability’s discovery by an AI scanner and its weaponization by a threat actor is narrowing with each model release. The calendar adversaries are watching does not pause for procurement cycles.

