Beyond the LLM: Building a High-Coverage Vulnerability Discovery Harness
The integration of Large Language Models (LLMs) into security workflows has long promised a revolution in vulnerability discovery. However, the transition from a chat interface to a production-grade security tool is fraught with challenges, ranging from high false-positive rates to the inherent limitations of model context windows.
Recently, Cloudflare participated in Project Glasswing, gaining early access to Anthropic's Mythos Preview. By pointing this security-focused model at over fifty of their own repositories, Cloudflare identified a critical distinction: while the model's raw capabilities are a significant leap forward, the real value is unlocked only when the model is embedded within a sophisticated orchestration harness.
The Leap in Capability: Exploit Chaining and Proofs
Mythos Preview represents a shift from general-purpose coding assistants to a specialized security tool. Two primary capabilities distinguish it from previous frontier models:
- Exploit Chain Construction: Most automated scanners identify isolated bugs. Real-world attackers, however, chain several low-severity bugs into something more severe, for example by turning a use-after-free into an arbitrary read/write primitive, to achieve full system control. Mythos Preview can reason about these primitives and synthesize them into a working exploit chain, mimicking the logic of a senior security researcher.
- Autonomous Proof Generation: The model does not merely speculate. It can write code to trigger a suspected bug, compile it in a scratch environment, execute it, and iterate based on the failure logs. This loop transforms a "potential" vulnerability into a proven exploit.
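The write, compile, execute, iterate loop described above can be sketched as a small control loop. This is a minimal illustration, not Cloudflare's or Anthropic's implementation: `prove`, `propose`, `run`, and `max_attempts` are hypothetical names, and the two stub functions stand in for the model call and the sandboxed build so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Attempt:
    source: str      # candidate PoC source code
    triggered: bool  # whether the run reproduced the suspected bug
    log: str         # build/run output from this attempt


def prove(
    hypothesis: str,
    propose: Callable[[str, str], str],
    run: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 5,
) -> Optional[Attempt]:
    """Repeat propose -> run until the bug triggers or the budget is spent."""
    log = ""
    for _ in range(max_attempts):
        source = propose(hypothesis, log)  # hypothetical model call, sees the last log
        triggered, log = run(source)       # hypothetical sandboxed compile + execute
        if triggered:
            return Attempt(source, True, log)
    return None  # unproven: remains a hypothesis, not a finding


# Stubs (assumptions) that imitate a first failed build followed by a success.
def fake_propose(hypothesis: str, log: str) -> str:
    return "poc-v2" if "missing header" in log else "poc-v1"


def fake_run(source: str) -> Tuple[bool, str]:
    if source == "poc-v2":
        return True, "crash reproduced"
    return False, "compile error: missing header"


result = prove("use-after-free in parser", fake_propose, fake_run)
print(result)
```

The design point is the return value: a hypothesis that never triggers yields `None`, so only proven attempts move forward as findings.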
Despite these advances, the model exhibits "organic refusals." Even without standard safety guardrails, the model occasionally pushes back on legitimate research requests. These refusals are inconsistent; a request refused in one context may be accepted if framed differently, suggesting that emergent guardrails are not a reliable substitute for intentional safety boundaries.
The Signal-to-Noise Challenge
AI-driven scanning often suffers from a signal-to-noise problem. Cloudflare noted two primary drivers of noise:
- Language Bias: Code in memory-unsafe languages such as C and C++ draws significantly more false positives, typically speculative memory-corruption reports such as buffer overflows, than code in memory-safe languages such as Rust.
- Model Bias: LLMs are prone to "hedging"—using terms like "potentially" or "could in theory." In a triage queue, these speculative findings waste human attention and tokens.
Mythos Preview mitigates this by providing proofs of concept (PoCs). A finding accompanied by a working PoC is actionable, drastically reducing the time spent questioning whether a bug is real.
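One way to apply this in a triage queue is to route findings by whether they carry a PoC and whether the text hedges. The sketch below is an assumption-laden illustration: `TriageItem`, `triage`, the queue labels, and every phrase in `HEDGES` beyond "potentially" and "could in theory" are hypothetical, not part of any described tool.

```python
import re
from dataclasses import dataclass

# Hedging phrases; only the first two come from the text, the rest are assumed.
HEDGES = re.compile(
    r"\b(potentially|could in theory|might|possibly)\b", re.IGNORECASE
)


@dataclass
class TriageItem:
    title: str
    description: str
    has_poc: bool  # True when a working proof of concept is attached


def triage(item: TriageItem) -> str:
    """Return a queue label for one finding."""
    if item.has_poc:
        return "actionable"   # proven: goes straight to a human
    if HEDGES.search(item.description):
        return "needs-proof"  # speculative wording and no PoC: send back
    return "review"           # confident wording but unproven


items = [
    TriageItem("OOB read", "Crash reproduced with attached input.", True),
    TriageItem("Overflow?", "This could in theory overflow the buffer.", False),
    TriageItem("Missing check", "The length check is absent.", False),
]
for item in items:
    print(item.title, "->", triage(item))
```

Keyword matching is deliberately crude here; the point is that a PoC, not the wording, is what promotes a finding to the human queue.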
Why Generic Agents Fail at Scale
A common instinct is to point a generic coding agent at a repository and ask it to "find bugs." Cloudflare found this approach ineffective for two reasons:
- Context Constraints: Coding agents are designed for linear tasks (feature building or refactoring). Vulnerability research is parallel and narrow. A single agent session often exhausts its context window before covering even a fraction of a percent of a large codebase.
- Throughput Bottlenecks: A single-stream agent cannot handle the volume of hypotheses required for high coverage.
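A back-of-the-envelope calculation makes the context constraint concrete. All three numbers below are assumptions chosen for illustration, not figures reported by Cloudflare, and the function names are hypothetical.

```python
def coverage_per_session(codebase_tokens: int, window_tokens: int,
                         usable_fraction: float) -> float:
    """Fraction of the codebase one session can read before its window fills."""
    usable = int(window_tokens * usable_fraction)
    return usable / codebase_tokens


def sessions_needed(codebase_tokens: int, window_tokens: int,
                    usable_fraction: float) -> int:
    """Sessions required to read every token once (ceiling division)."""
    usable = int(window_tokens * usable_fraction)
    return -(-codebase_tokens // usable)


# Assumed values: a 50M-token codebase, a 200k-token window, and half of the
# window left for source code after prompts, tool output, and reasoning.
CODEBASE, WINDOW, USABLE = 50_000_000, 200_000, 0.5

print(f"coverage per session: {coverage_per_session(CODEBASE, WINDOW, USABLE):.2%}")
print(f"sessions for one full pass: {sessions_needed(CODEBASE, WINDOW, USABLE)}")
```

Under these assumptions a single session reads a fraction of a percent of the code, and a full pass needs hundreds of sessions, which is why a single-stream agent becomes the bottleneck.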
The Solution: The Vulnerability Discovery Harness
To achieve meaningful coverage, Cloudflare moved away from a single agent and built a multi-stage harness. The core philosophy is that narrow scope produces better findings.
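A minimal sketch of what narrow scoping can look like in practice: instead of one broad "find bugs" prompt, each task pairs one area of the code with one attack class. The `HuntTask` type, the prompt wording, and the example paths and attack classes are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class HuntTask:
    area: str          # one scoped slice of the repository
    attack_class: str  # the single bug class this task looks for

    def prompt(self) -> str:
        # Illustrative wording only; a real harness would add shared context.
        return (
            f"Review only {self.area}. Look only for {self.attack_class}. "
            "Report a finding only if you can describe how to trigger it."
        )


def plan_tasks(areas: Sequence[str],
               attack_classes: Sequence[str]) -> List[HuntTask]:
    """One task per (area, attack class) pair."""
    return [HuntTask(a, c) for a, c in product(areas, attack_classes)]


# Hypothetical inputs for the example.
areas = ["src/parser/", "src/auth/", "src/cache/"]
attack_classes = ["integer overflow", "use-after-free", "authorization bypass"]

tasks = plan_tasks(areas, attack_classes)
print(len(tasks), "tasks")
print(tasks[0].prompt())
```

Each task is small enough to fit comfortably in one context window, and the task list doubles as a coverage record of which area and attack class pairs have been attempted.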
The Pipeline Architecture
| Stage | Function | Strategic Purpose |
|---|---|---|
| Recon | Top-down repo analysis to produce architecture docs and trust boundaries. | Provides shared context; prevents the model from "wandering." |
| Hunt | Parallel agents (up to 50) targeting specific attack classes in scoped areas. | Maximizes coverage through narrow, concurrent tasks. |
| Validate | An independent agent attempts to disprove the hunter's finding. | Uses adversarial review to filter noise. |
| Gapfill | Re-queues areas that were touched but not thoroughly covered. | Prevents the model from drifting toward "easy" success patterns. |
| Dedupe | Collapses findings with the same root cause. | Prevents queue inflation. |
| Trace | Uses cross-repo symbol indices to see if attacker input reaches the bug. | Converts a "flaw" into a "reachable vulnerability." |
| Feedback | Feeds reachable traces back into the Hunt stage for consumer repos. | Creates a continuous improvement loop. |
| Report | Generates structured data against a predefined schema. | Ensures output is queryable and actionable. |
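The Hunt, Validate, Dedupe, and Report rows can be sketched as a small asynchronous pipeline. This is an illustration under stated assumptions rather than Cloudflare's code: `hunt` and `validate` are placeholder functions standing in for model-backed agents, the `Finding` fields are a hypothetical schema, and only the cap of 50 concurrent hunters comes from the table.

```python
import asyncio
import json
from dataclasses import asdict, dataclass
from typing import List, Sequence, Tuple

MAX_HUNTERS = 50  # concurrency cap taken from the Hunt row above


@dataclass
class Finding:
    area: str
    attack_class: str
    root_cause: str  # key used by the Dedupe step
    summary: str


async def hunt(area: str, attack_class: str) -> List[Finding]:
    """Placeholder hunter; a real one would run a model on one scoped task."""
    await asyncio.sleep(0)
    if attack_class == "use-after-free" and area.startswith("src/parser"):
        return [Finding(area, attack_class, "parser.c:free_node",
                        "node freed, then read")]
    return []


async def validate(finding: Finding) -> bool:
    """Placeholder validator; a real one would try to disprove the finding."""
    await asyncio.sleep(0)
    return bool(finding.root_cause)


def dedupe(findings: Sequence[Finding]) -> List[Finding]:
    """Keep the first finding seen for each root cause."""
    unique = {}
    for f in findings:
        unique.setdefault(f.root_cause, f)
    return list(unique.values())


async def run_pipeline(tasks: Sequence[Tuple[str, str]]) -> str:
    limit = asyncio.Semaphore(MAX_HUNTERS)

    async def bounded(area: str, attack_class: str) -> List[Finding]:
        async with limit:  # never more than MAX_HUNTERS hunters at once
            return await hunt(area, attack_class)

    batches = await asyncio.gather(*(bounded(a, c) for a, c in tasks))
    candidates = [f for batch in batches for f in batch]
    verdicts = await asyncio.gather(*(validate(f) for f in candidates))
    confirmed = [f for f, ok in zip(candidates, verdicts) if ok]
    # Report: structured JSON rather than free-form prose.
    return json.dumps([asdict(f) for f in dedupe(confirmed)], indent=2)


demo_tasks = [
    ("src/parser/lex.c", "use-after-free"),
    ("src/parser/ast.c", "use-after-free"),
    ("src/auth/", "use-after-free"),
]
report = asyncio.run(run_pipeline(demo_tasks))
print(report)
```

In this example two hunters report the same root cause from different files, and the dedupe step collapses them into one entry, so the report queue reflects distinct bugs rather than distinct sightings.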
Strategic Implications for Security Teams
The emergence of tools like Mythos Preview has led some teams to pursue aggressive SLAs, with some aiming for a two-hour window from CVE release to production patch. However, Cloudflare warns that speed alone is a dangerous metric. If regression testing takes a day, a two-hour SLA can only be met by skipping tests, and an untested patch can introduce bugs more severe than the original vulnerability.
Instead of focusing solely on the speed of the patch, the focus should shift to architectural resilience:
- Implementing defenses that block bugs from being reached.
- Designing applications so a flaw in one component cannot grant access to others (compartmentalization).
- Improving deployment infrastructure so that fixes can roll out globally and as close to instantly as possible.
Community Perspectives and Critiques
While the technical framework is robust, the community response highlights a desire for more transparency. Several critics on Hacker News pointed out the lack of hard data, asking for specific numbers on how many vulnerabilities were found and the ratio of true positives to false positives.
Others noted that the "lessons learned"—such as the efficacy of narrow prompts and adversarial review—are well-known in the agentic AI community. However, as one observer noted, the implementation of a cluster of actors working on shared, structured context snippets is a highly applicable model for many fields beyond cybersecurity.
Ultimately, the shift is clear: the future of AI security depends less on the model alone than on the harness that directs it.