On 2026-05-18
by Rhidian Jowers
Cybersecurity

Mythos or mirage: The hidden burden of agentic security

Table mythos or mirage
Summary

The recent arrival of Claude Mythos Preview has been met with the usual industry fanfare, promising a new era of agentic security where AI finally moves from being a passive observer to an active defender.

However, as we have seen with every previous iteration of machine learning in this space, unbridled enthusiasm usually precedes a sobering reality check. While the headlines focus on the breakthrough capabilities of Anthropic’s latest model, security architects must look past the marketing to the significant operational liabilities it introduces.

The evaluation by the UK AI Safety Institute (AISI) provides the first glimpse into what we are actually up against. While the defensive applications are touted as revolutionary, the data suggests we are simply entering a more complex and expensive phase of the same old arms race.

What is Claude Mythos?

Claude Mythos Preview is a frontier AI model announced by Anthropic in April 2026. It is the first of a new “Mythos-class” of models specifically engineered for advanced cybersecurity tasks, including vulnerability research and autonomous zero-day exploit generation. Due to its claimed unprecedented ability to execute complex, multi-stage attack sequences, Anthropic has restricted its release to a limited group of defensive partners under an initiative known as Project Glasswing. 

The official technical announcement can be found here: Anthropic Mythos Preview.

The AISI report: A warning disguised as a milestone

The AISI evaluation is the first to rigorously test Mythos against real-world cyber capabilities. The findings confirm that Mythos is the first model to successfully navigate a 32 step end-to- end corporate network attack chain. For those in the Security Operations Centre (SOC), this is a chilling development. While the industry celebrates the intelligence of the model, we must recognise that we have just lowered the barrier for entry for high-level, autonomous threats.

Anthropic frames this capability as a way to proactively hunt for threats, yet the success rates in the AISI report tell a more inconsistent story. The model succeeded in only 30% of its attempts to complete the full attack chain. In a defensive context, a 70% failure rate is catastrophic. In an offensive context, those three successful breaches represent a terrifyingly cheap way for an adversary to find a way into your network.

The AISI evaluation can be found here: AISI Mythos Evaluation.

Data contamination and the reasoning gap

A critical question remains: is this genuine reasoning or merely high speed recovery? There is a significant concern regarding data contamination, where the model might have already seen the code and historical bug reports it is being asked to test during its training phase. If Mythos identifies a vulnerability that has existed for decades, such as the 27 year old OpenBSD flaw, it is difficult to prove the model is reasoning through the logic when the full post-mortem and patch history of that specific bug has likely been part of the model’s training corpus since GPT-4.

In biological terms, we are seeing a form of Lamarckian inheritance where the AI “inherits” the acquired knowledge of thirty years of security research. It isn’t discovering; it is remembering. This reflects the phylogeny of malware: the slow evolution of a code line through generations of researchers, which the AI can now access as a single, compressed memory.

Furthermore, the AISI methodology relied on expert-in-the-loop prompting. While the AISI has been transparent in publishing their high-level methodology and prompt templates, a significant interpretability gap remains. With the model utilising up to 100 million tokens per run to navigate complex ranges, it is nearly impossible to audit the granular “strategic nudges” provided by human researchers. This suggests Mythos is less an autonomous agent and more a force multiplier that still requires a human to solve the most complex logical hurdles. The lingering uncertainty isn’t about whether the prompts are public, but whether the model can bridge these logical gaps without the constant scaffolding provided by an expert in a controlled environment.

Proponents of the model often point to its discovery of latent, obscure bugs, such as a 16-year-old flaw in the FFmpeg codec, as proof of true, synthesised reasoning. However, without transparent, independent audits of the granular prompt scaffolding and context window inputs used during those specific tests, it is impossible to separate genuine algorithmic synthesis from expert-guided statistical matching.

The rise of the Malware Forge

The most concerning aspect of Mythos is not found in its intelligence, but in its capacity for industrialised iteration: the Malware Forge. By utilising massively parallel compute, an adversary does not need the model to be perfect; they only need it to be prolific. A forge can generate thousands of variations of a single exploit, testing each one against existing antivirus updates in real-time. If a candidate is detected, the system simply iterates, not through creative thought but through sheer statistical volume, until it finds a version that escapes detection.

This application of machine-written exploits is aimed directly at the most vulnerable part of our defence: the OODA (Observe, Orient, Decide, and Act) loop of the Security Operation Centre. If an adversary can automate the trial-and-error phase of exploit development, they can deploy novel, zero-day malware faster than a human team can orient itself to the threat. The Mirage here is the idea of a brilliant AI hacker; the reality is an automated factory designed to overwhelm human bandwidth.

The Mythos capability leap: From signals to actions

As established in previous writing, AI often struggles to understand the operational intent behind an action. Mythos seeks to bridge this gap through a higher degree of agentic autonomy. While the authenticity of its reasoning remains under scrutiny, the AISI evaluation shows that Mythos is the first model with the capacity to string together complex cyber operations that previously required constant, granular human prompting.

Whether this represents true understanding or simply a more sophisticated form of pattern matching, the result is the same: a significant uplift in two specific areas: vulnerability discovery and autonomous exploit chains.

Key AISI evaluation findings

Table mythos or mirage

Vulnerability discovery or patching paralysis?

One of the most praised aspects of the Mythos preview is its ability to find deep seated vulnerabilities. The AISI findings highlight that the model uncovered a 27-year old flaw in OpenBSD and multiple vulnerabilities in Firefox. While this demonstrates a sophisticated understanding of code logic, it ignores the operational reality of the security team.

Finding a vulnerability is not the same as securing a network. As established through Proof of Concept (PoC) deployments, the promise of AI often meets a harsh reality in the backlog of a security team. Identifying 181 new zero-day exploits for a browser engine merely creates an unmanageable prioritisation crisis. The emergence of “Information Asymmetry via AI” signals a dangerous divergence in operational speed: the attacker’s Time to Discover (TTD) scales exponentially, yet the defender’s Time to Remediate (TTR) remains bottlenecked by change advisory boards, legacy dependencies, and human limitations. By providing a more efficient engine for exploit generation, we are merely burying defenders under a mounting workload. This shift facilitates an expansion of the attack surface rather than the promised contraction of risk.

The MCP trap: Autonomy versus Intent

The engine behind the agentic nature of Mythos is the Model Context Protocol (MCP). This protocol allows the AI to move from analysis to action by connecting it directly to your tools and data. This is the foundation of the autonomous security agent: a system that can theoretically find and neutralise a threat without human intervention.

However, the AISI report reinforces a long held concern regarding the gap between statistical analysis and business intent. The danger here is twofold. First, there is the risk of indirect prompt injection. If Mythos is given the autonomy to read system logs to identify threats, an attacker can hide malicious commands in those logs. A single line of text could instruct the agent to ignore a specific breach or even to misclassify malicious lateral movement as a routine update.

Secondly, an autonomous agent lacks the context of your business operations. As I have noted before, an AI might flag an administrator logging in at 3am as an anomaly, but it cannot know that this is a scheduled window for server patching. Giving an agent the power to act on these anomalies without human judgment leads to cascading failures and unnecessary outages.

The economic cost of the arms race

We must also consider the fiscal reality of these new systems. Mythos is a resource intensive model that requires high intensity reasoning for every step it takes. This introduces a new class of threat: the cost maximising attack.

Adversaries are now using prompt flooding to generate ambiguous signals that force automated defence systems to perform thousands of expensive API calls. The AISI data on the computational load required for Mythos to solve expert level tasks suggests that even a successful defence could lead to budgetary exhaustion. This is essentially a “Denial of Wallet” attack. We are approaching a point where a breach is not the only way to cripple a security department; you can simply bankrupt their automated defences by forcing the AI to “think” too much.

Conclusion: The persistence of the human expert

The AISI evaluation of Claude Mythos Preview serves as a necessary corrective to the industry hype. It proves that while agentic AI is capable of remarkable feats, it remains inconsistent, prone to manipulation, operationally expensive, and impossible to perfectly contain. The black box problem has not been solved: it has simply been given the power to execute commands.

At Airbus Protect, we continue to advocate for a strategy that keeps a human in the loop for every high impact action. Automation is a tool for speed, but judgment is a human responsibility. The final decision to isolate a subnet or block an account must remain with an expert who can interpret the AI rationale through the lens of business reality.

  • Share