Anthropic and OpenAI Expose SAST’s AI Security Blind Spot


TL;DR

  • Competing Launches: Anthropic and OpenAI released free AI-powered security scanners within 14 days of each other, targeting vulnerabilities that traditional Static Application Security Testing (SAST) tools cannot detect.
  • Detection Disputes: Independent testing by Checkmarx Zero found that Claude identified only two true positives among eight flagged vulnerabilities, contradicting the labs’ self-reported detection claims.
  • Market Disruption: Analysts warn that free AI scanners from Anthropic and OpenAI could commoditize the traditional SAST market, shifting spending away from Checkmarx, Veracode, and Snyk.
  • Dual-Use Risk: The same AI models that scan for vulnerabilities can exploit them, and AI-generated code is 2.74 times more likely to introduce security flaws than human-written code.

Within 14 days of each other, Anthropic and OpenAI launched free AI-powered security scanners that find vulnerabilities traditional Static Application Security Testing (SAST) tools structurally cannot – but independent testing suggests the headline detection numbers may not hold up. Anthropic opened a limited research preview of Claude Code Security to Enterprise and Team customers first; OpenAI’s Codex Security entered its own research preview two weeks later. Both are currently free to enterprise customers, with no announced pricing timeline.

The SAST Blind Spot

Rule-based static analysis tools work by matching code against libraries of known vulnerability signatures. That approach has a hard limit: it cannot reason about code logic that falls outside its signature library. According to Anthropic, Claude Code Security addresses this by reading and reasoning about code the way a human security researcher would, tracing data flow and component interactions to catch complex vulnerabilities that pattern-matching tools miss.
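The limit is easy to see in a toy signature scanner. The sketch below is purely illustrative – the two regex rules and the sample snippet are invented here, and real SAST rule sets are vastly richer – but it shows the structural problem: a known-bad pattern like `strcpy` is flagged, while a logic flaw (an integer underflow that turns a `memcpy` length into a huge value) matches no signature and sails through.

```python
import re

# Hypothetical signature list, invented for illustration only.
SIGNATURES = {
    "dangerous strcpy": re.compile(r"\bstrcpy\s*\("),
    "format string": re.compile(r"\bprintf\s*\(\s*[a-zA-Z_]"),
}

def scan(source: str) -> list[tuple[int, str]]:
    """Flag any line that matches a known vulnerability signature."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

SAMPLE = """\
strcpy(dst, src);              /* matches a signature */
size_t n = hdr->len - 1;       /* underflows when len == 0 ...        */
memcpy(dst, src, n);           /* ...turning this into a huge copy    */
"""

# Only the strcpy is reported; the underflow-driven memcpy needs
# data-flow reasoning ("what if len is 0?") that no regex encodes.
print(scan(SAMPLE))
```

Catching the second flaw requires tracking the value of `hdr->len` across statements – exactly the kind of cross-statement reasoning the AI-based tools claim to do.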

OpenAI describes Codex Security in similar terms, saying it builds project-level context to surface high-confidence findings with fixes while filtering out noise from insignificant bugs. Both companies ship model updates on monthly cycles, giving adversaries a recurring head start over defenders who wait to adopt each release.

The practical difference shows up in findings like the heap buffer overflow in the CGIF library, where Claude identified a flaw by reasoning about the LZW compression algorithm – a vulnerability that coverage-guided fuzzing could not catch even at 100% code coverage. Separately, AI security startup AISLE independently identified all 12 zero-day vulnerabilities in OpenSSL’s January 2026 security patch, including a stack buffer overflow potentially exploitable without valid key material.
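Why 100% code coverage does not imply the bug was triggered is worth spelling out. The sketch below is an invented, simplified LZW-style decoder – not the actual CGIF code – with a deliberately tiny table capacity and a missing bounds check, the same class of flaw described above. A short input executes every line without overflowing the table; only an input crafted by reasoning about how the code table grows writes past the end.

```python
# Illustrative sketch, not the real CGIF bug: a simplified LZW-style
# decoder whose code table has fixed capacity but no bounds check.
TABLE_SIZE = 8          # tiny, invented capacity so the overflow is visible

def decode(codes):
    table = {i: chr(i) for i in range(4)}   # 4 "initial" single-byte codes
    slots = [None] * TABLE_SIZE             # fixed-size backing store
    next_code = 4
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        entry = table.get(code, prev + prev[0])
        out.append(entry)
        table[next_code] = prev + entry[0]
        # BUG: no "next_code < TABLE_SIZE" check before this write --
        # an IndexError in Python, a heap buffer overflow in C.
        slots[next_code] = table[next_code]
        next_code += 1
        prev = entry
    return "".join(out)

# A short input reaches every line (100% line coverage) yet never
# exceeds the table, so fuzzing that stops at coverage misses the bug:
decode([0, 1, 2])

# Reasoning about the algorithm shows that any input adding more than
# TABLE_SIZE - 4 entries overflows the table:
try:
    decode([0, 1, 2, 3, 0, 1, 2, 3, 0, 1])
except IndexError:
    print("out-of-bounds table write")
```

The point generalizes: coverage measures which lines ran, not which states were reached, so a state-dependent overflow can hide behind a fully covered line – which is why reasoning about the algorithm, as Anthropic describes, can find what coverage-guided fuzzing cannot.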

Prior Research Context

That structural limitation is not new territory for AI security research. As WinBuzzer reported in October 2024, AI-powered tools using Claude had already uncovered zero-day vulnerabilities in production Python codebases long before either of these tools launched commercially. That precedent makes the current launches less surprising and raises the bar for evaluating the detection numbers each lab now claims at scale.

Furthermore, the capability gap between AI-based and rule-based scanning has been widening for at least 18 months. The launches from Anthropic and OpenAI represent productization of a demonstrated edge, not an unproven bet. That makes the absence of independent audits more conspicuous, not less.