The recent arrival of Claude Mythos Preview has been met with the usual industry fanfare, promising a new era of agentic security where AI finally moves from being a passive observer to an active defender.
However, as we have seen with every previous iteration of machine learning in this space, unbridled enthusiasm usually precedes a sobering reality check. While the headlines focus on the breakthrough capabilities of Anthropic’s latest model, security architects must look past the marketing to the significant operational liabilities it introduces.
The evaluation by the UK AI Security Institute (AISI) provides the first glimpse into what we are actually up against. While the defensive applications are touted as revolutionary, the data suggests we are simply entering a more complex and expensive phase of the same old arms race.
What is Claude Mythos?
Claude Mythos Preview is a frontier AI model announced by Anthropic in April 2026. It is the first of a new “Mythos-class” of models specifically engineered for advanced cybersecurity tasks, including vulnerability research and autonomous zero-day exploit generation. Citing what it describes as an unprecedented ability to execute complex, multi-stage attack sequences, Anthropic has restricted its release to a limited group of defensive partners under an initiative known as Project Glasswing.
The official technical announcement can be found here: Anthropic Mythos Preview.
The AISI report: A warning disguised as a milestone
The AISI evaluation is the first to rigorously test Mythos against real-world cyber capabilities. The findings confirm that Mythos is the first model to successfully navigate a 32-step, end-to-end corporate network attack chain. For those in the Security Operations Centre (SOC), this is a chilling development. While the industry celebrates the intelligence of the model, we must recognise that we have just lowered the barrier to entry for high-level, autonomous threats.
Anthropic frames this capability as a way to proactively hunt for threats, yet the success rates in the AISI report tell a less consistent story. The model completed the full attack chain in only 30% of its attempts. In a defensive context, a 70% failure rate is catastrophic. In an offensive context, three successful breaches in every ten attempts represent a terrifyingly cheap way for an adversary to find a way into your network.
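To make the attacker/defender asymmetry concrete, the short sketch below treats each full-chain attempt as an independent trial with the 30% success rate cited above. Independence and a fixed per-attempt rate are simplifying assumptions of this sketch, not claims from the AISI report, and the function names are purely illustrative.

```python
# Sketch: how a 30% per-attempt success rate compounds over repeated attempts.
# Assumption (not from the AISI report): attempts are independent and the
# per-attempt success probability stays fixed.


def p_at_least_one_success(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent runs succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every attempt fails".
    return 1.0 - (1.0 - p) ** attempts


def expected_successes(p: float, attempts: int) -> float:
    """Expected number of successful runs (mean of a binomial distribution)."""
    return p * attempts


if __name__ == "__main__":
    P_SUCCESS = 0.30  # full-chain completion rate cited in the text
    for n in (1, 3, 5, 10):
        print(
            f"{n:>2} attempts: P(at least one breach) = "
            f"{p_at_least_one_success(P_SUCCESS, n):.3f}, "
            f"expected breaches = {expected_successes(P_SUCCESS, n):.1f}"
        )
```

Under these assumptions, ten attempts give roughly a 97% chance of at least one successful breach. A rate that looks poor for a defender, who must catch every attempt, looks attractive to an attacker who can simply retry.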
The AISI evaluation can be found here: AISI Mythos Evaluation.
Data contamination and the reasoning gap
A critical question remains: is this genuine reasoning or merely high-speed recall? There is a significant concern regarding data contamination, where the model may already have seen, during its training phase, the code and historical bug reports it is being asked to test. If Mythos identifies a vulnerability that has existed for decades, such as the 27-year-old OpenBSD flaw, it is difficult to prove the model is reasoning through the logic when the full post-mortem and patch history of that specific bug has likely been present in public training corpora since at least the GPT-4 era.
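One common, if crude, way to probe for contamination is to measure how many word n-grams of a test item also appear verbatim in a training corpus. The sketch below is a generic illustration of that heuristic only; it is not the method used by AISI or Anthropic, the sample texts are hypothetical, and a real audit would require access to the actual training data, which outside evaluators typically do not have.

```python
# Sketch of an n-gram overlap heuristic for spotting possible training-data
# contamination. Generic illustration only; the sample texts are hypothetical.


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All contiguous word n-grams in `text` (lower-cased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(test_text: str, corpus_text: str, n: int = 8) -> float:
    """Fraction of the test item's n-grams that also appear in the corpus."""
    test_grams = ngrams(test_text, n)
    if not test_grams:
        return 0.0  # test item shorter than n words: nothing to compare
    corpus_grams = ngrams(corpus_text, n)
    return len(test_grams & corpus_grams) / len(test_grams)


if __name__ == "__main__":
    # Hypothetical "training corpus" containing a bug write-up.
    corpus = ("advisory: a length check was missing in the packet parsing "
              "routine and was added in the patch")
    seen = "a length check was missing in the packet parsing routine"
    unseen = "the scheduler deadlocks when two worker threads acquire the same locks"
    print(overlap_ratio(seen, corpus, n=4))    # verbatim excerpt: full overlap
    print(overlap_ratio(unseen, corpus, n=4))  # unrelated text: no overlap
```

A high ratio only shows that memorisation was possible, not that it occurred, and a low ratio does not rule out exposure to paraphrased write-ups of the same bug.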
In biological terms, we are seeing a form of Lamarckian inheritance where the AI “inherits” the acquired knowledge of thirty years of security research. It isn’t discovering; it is remembering. This reflects the phylogeny of malware: the slow evolution of a code line through generations of researchers, which the AI can now access as a single, compressed memory.
Furthermore, the AISI methodology relied on expert-in-the-loop prompting. While the AISI has been transparent in publishing its high-level methodology and prompt templates, a significant interpretability gap remains. With the model using up to 100 million tokens per run to navigate complex cyber ranges, it is nearly impossible to audit the granular “strategic nudges” provided by human researchers. This suggests Mythos is less an autonomous agent and more a force multiplier that still requires a human to solve the most complex logical hurdles. The lingering uncertainty is not whether the prompts are public, but whether the model can bridge these logical gaps without the constant scaffolding an expert provides in a controlled environment.
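A back-of-envelope calculation shows why a 100-million-token run resists human audit. The conversion factors below are assumptions, not figures from the AISI report: roughly 0.75 English words per token and a reading speed of about 250 words per minute, both common rules of thumb.

```python
# Back-of-envelope estimate of the human effort needed to read one agent transcript.
# Assumptions (not from the AISI report): ~0.75 words per token and ~250 words
# per minute reading speed; both are rough rules of thumb.


def review_hours(tokens: int, words_per_token: float = 0.75,
                 words_per_minute: float = 250.0) -> float:
    """Hours a single reviewer would need to read `tokens` tokens once."""
    words = tokens * words_per_token
    minutes = words / words_per_minute
    return minutes / 60.0


if __name__ == "__main__":
    hours = review_hours(100_000_000)  # per-run upper bound cited in the text
    print(f"{hours:,.0f} review hours, about {hours / 40:,.0f} forty-hour working weeks")
```

Under these assumptions a single run is about 5,000 hours of reading, or roughly 125 working weeks for one reviewer. Much of a transcript will be tool output or code rather than prose, which could shift the figure either way, but not by the orders of magnitude needed to make line-by-line auditing practical.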
Proponents of the model often point to its discovery of latent, obscure bugs, such as a 16-year-old flaw in FFmpeg, as proof of true, synthesised reasoning. However, without transparent, independent audits of the granular prompt scaffolding and context-window inputs used during those specific tests, it is impossible to separate genuine algorithmic synthesis from expert-guided statistical matching.
The rise of the Malware Forge
The most concerning aspect of Mythos is not its intelligence but its capacity for industrialised iteration: the Malware Forge. By utilising massively parallel compute, an adversary does not need the model to be perfect; they only need it to be prolific. A forge can generate thousands of variations of a single exploit, testing each one against the latest antivirus signatures in real time. If a candidate is detected, the system simply iterates, not through creative thought but through sheer statistical volume, until it finds a version that evades detection.
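The claim that volume can substitute for creativity is, at heart, a probability argument. The sketch below makes it explicit, assuming hypothetically that each generated variant independently evades signature-based detection with a small, fixed probability p. Both the independence assumption and the value of p are illustrative rather than measured, and the sketch models only the arithmetic, not any generation or testing pipeline.

```python
import math

# Sketch of the "volume over intelligence" argument. Assumptions (hypothetical):
# each variant evades detection independently, with a fixed probability p.


def expected_variants_until_evasion(p: float) -> float:
    """Mean of a geometric distribution: expected trials until the first success."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


def variants_for_confidence(p: float, confidence: float) -> int:
    """Smallest n with P(at least one evasive variant among n) >= confidence."""
    if not 0.0 < p < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("p and confidence must be in (0, 1)")
    # Solve 1 - (1 - p)^n >= confidence for n, rounding up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    P_EVADE = 0.001  # hypothetical: 1 in 1,000 variants slips past signatures
    print(f"expected variants: {expected_variants_until_evasion(P_EVADE):,.0f}")
    print(f"variants for 95% confidence: {variants_for_confidence(P_EVADE, 0.95):,}")
```

With a hypothetical p of 0.001, the forge needs about 1,000 variants on average and 2,995 to be 95% confident of producing at least one evasive variant: trivial numbers for parallel compute, but a heavy triage load for a human team reviewing alerts one by one.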
This application of machine-written exploits targets the most vulnerable part of our defence: the OODA (Observe, Orient, Decide, Act) loop of the Security Operations Centre. If an adversary can automate the trial-and-error phase of exploit development, they can deploy novel, zero-day malware faster than a human team can orient itself to the threat. The Mirage here is the idea of a brilliant AI hacker; the reality is an automated factory designed to overwhelm human bandwidth.
The Mythos capability leap: From signals to actions
As established in previous writing, AI often struggles to understand the operational intent behind an action. Mythos seeks to bridge this gap through a higher degree of agentic autonomy. While the authenticity of its reasoning remains under scrutiny, the AISI evaluation shows that Mythos is the first model with the capacity to string together complex cyber operations that previously required constant, granular human prompting.
Whether this represents true understanding or simply a more sophisticated form of pattern matching, the result is the same: a significant uplift in two specific areas, vulnerability discovery and autonomous exploit chains.
