The “Security Vertigo” Is Real

On 21 April 2026, Mozilla published a post that is going to force every CISO to revisit their operating assumptions. In Mozilla’s collaboration with Anthropic, Claude Mythos Preview - a restricted frontier model made available through Project Glasswing and related partner access - helped identify 271 vulnerabilities whose fixes were included in Firefox 150 (Mozilla, 2026).

For technical leadership, this is not a “high score” for AI; it is a shift in the physics of cyber-defence. Firefox CTO Bobby Holley described the sensation his team felt when the numbers landed as “vertigo,” and Mozilla’s post conveys the underlying point clearly enough: for a hardened target like Firefox, a single bug of this class would have been a red-alert incident a year ago, and seeing so many at once raises the question of whether defenders can keep up at all (Mozilla, 2026). The direction of travel matters as much as the number itself: Claude Opus 4.6 found 22 security-sensitive bugs in Firefox 148 earlier this year. Mythos found more than twelve times that in a single evaluation pass (SecurityWeek, 2026).

One thing Mozilla was careful to say, and which we should be careful to repeat: none of the 271 findings were bugs an elite human researcher could not have found given enough time. The model is not discovering a new class of vulnerability. It is collapsing the cost of finding the class we already know about (Mozilla, 2026).

That cost collapse is the whole story.

The 32-Step Kill Chain - What AISI Actually Found

The Firefox discovery was the headline, but the more consequential data point comes from the UK AI Security Institute, which evaluated Mythos Preview against a benchmark called “The Last Ones” (TLO) - a 32-step corporate network attack simulation spanning initial reconnaissance through full network takeover, estimated to take a skilled human operator around 20 hours (UK AISI, 2026).

Mythos Preview became the first AI model to complete TLO from start to finish. The precise result is worth stating exactly, because a lot of the commentary around it isn’t:

  • Mythos completed the full 32 steps in 3 out of 10 attempts.
  • Across all 10 runs, it averaged 22 of 32 steps.
  • The next-best model, Claude Opus 4.6, averaged 16 of 32 steps (UK AISI, 2026).
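
To put those three figures on a common scale, here is a minimal Python sketch that uses only the numbers quoted above. The variable names are ours, not AISI's, and the "relative gain" figure is our own derived comparison rather than a metric AISI reports.

```python
# Figures quoted from the AISI result above; variable names are illustrative.
TOTAL_STEPS = 32
mythos_avg_steps = 22
opus_avg_steps = 16
full_runs, attempts = 3, 10

# Share of the 32-step chain completed on average by each model.
mythos_share = mythos_avg_steps / TOTAL_STEPS
opus_share = opus_avg_steps / TOTAL_STEPS

# How often Mythos got all the way to full network takeover.
full_completion_rate = full_runs / attempts

# Derived comparison (our own framing): extra steps relative to Opus 4.6.
relative_gain = (mythos_avg_steps - opus_avg_steps) / opus_avg_steps

print(f"Mythos average progress:  {mythos_share:.1%}")
print(f"Opus 4.6 average progress: {opus_share:.1%}")
print(f"Full completions:          {full_completion_rate:.0%}")
print(f"Relative gain over Opus:   {relative_gain:.1%}")
```

Read this way, the average run gets roughly two-thirds of the way through the chain, and seven in ten attempts stop short of full takeover.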

That is a genuine step-change in autonomous offensive capability, and it is not a clean sweep. AISI added two caveats that every CISO reading this should internalise: the test environments contained no live defenders, no endpoint detection, and no real-time incident response, and there was no penalty for the model taking actions that would trigger security alerts. AISI was explicit that it “cannot say for sure whether Mythos Preview would be able to attack well-defended systems” (UK AISI, 2026).

So the correct reading is not “AI can now autonomously breach your enterprise.” The correct reading is AISI’s own: in small, weakly defended and vulnerable enterprise systems, the economics of multi-stage intrusion have changed. AISI has flagged that its next round of evaluations will specifically target hardened, actively monitored ranges, so the ceiling we know about today is a floor for what we’ll know in six months.

The Friction of Truth: Discovery vs. Verification

The Firefox result also surfaces a problem that is going to define the rest of this series.

Of the 271 vulnerabilities Mythos flagged, only three were credited to Claude in the formal Firefox 150 advisory: CVE-2026-6746, CVE-2026-6757, and CVE-2026-6758. Firefox 150 itself addressed more than 40 CVEs in total (SecurityWeek, 2026). SecurityWeek (2026) infers that many of the remaining 268 findings were likely lower-severity or non-standalone-CVE issues - defence-in-depth fixes, hardening, or bugs in non-exploitable code paths - that don’t meet the threshold for a public advisory entry.

Which raises the question Part 2 of this series will spend its entire word count on: if an AI generates 271 findings and 3 reach the CVE bar, did it save your team time, or did it generate 268 tickets that somebody now has to triage?
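
The arithmetic behind that question is simple. A minimal Python sketch follows, using the two figures quoted above (271 findings, 3 credited CVEs). The 30-minutes-per-finding triage cost is a purely hypothetical parameter chosen for illustration; it does not come from Mozilla or SecurityWeek.

```python
def triage_load(findings: int, credited: int, minutes_per_finding: float) -> dict:
    """Summarise the triage burden implied by a batch of AI-surfaced findings."""
    if not 0 <= credited <= findings:
        raise ValueError("credited must be between 0 and findings")
    return {
        # Findings that did not end up as a credited CVE.
        "uncredited": findings - credited,
        # Fraction of findings that reached the CVE bar.
        "credited_share": credited / findings,
        # Analyst time if every finding needs a first-pass look.
        "analyst_hours": findings * minutes_per_finding / 60,
    }


# 271 and 3 are from the text; 30 minutes per finding is an assumption.
summary = triage_load(findings=271, credited=3, minutes_per_finding=30)
print(f"Uncredited findings: {summary['uncredited']}")
print(f"Credited share:      {summary['credited_share']:.1%}")
print(f"Analyst hours:       {summary['analyst_hours']:.1f}")
```

Changing the assumed per-finding cost moves the hours figure linearly, which is why the cost of verification, not discovery, ends up dominating the budget.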

Critics have gone harder than that. Flying Penguin (2026) argues that Anthropic’s flagship Firefox demonstration in the Mythos system card - a separate, earlier evaluation against Firefox 147, not the Mozilla 271-vulnerability run - wasn’t actually run against Firefox proper. It targeted a SpiderMonkey JavaScript engine shell in a harness, without the browser’s process sandbox and other defence-in-depth mitigations, and the “starter” bugs in that evaluation were ones Opus 4.6 had already found. That is a narrower critique than “Mythos is hype,” and it does not undermine the Mozilla collaboration that produced Firefox 150. But it points at a real problem: discovery without independent, reproducible verification is a claim, not a capability.

We think both things are true at once. Mythos has lowered the cost of discovery by an order of magnitude. And the industry does not yet have a verification layer that can keep up with it.

The Sakura Position: Architecture Is the Only Defence

Whether you read Mythos as a breakthrough or a very fast fuzzer with good PR, the operational implication is the same: if an attacker can generate 271 credible leads for the price of a few million tokens, your triage model has to be as autonomous as the attack.

For the Autonomous Enterprise, Sakura Sky is pushing clients toward three changes:

  1. Continuous remediation pipelines: Manual patching is not a survivable posture when discovery is machine-speed. If AI finds it in minutes, your pipeline has to be capable of generating, testing, and landing the fix in hours - not weeks.
  2. Automated exploitability validation: We run AI-surfaced findings against sandboxed replicas of the production environment before an alert is ever cut to SecOps. If the candidate exploit can’t achieve its objective in the replica, the finding is deprioritised. This is how you solve the 271-to-3 problem without hiring 90 more analysts. (This is the subject of Part 2 in this series.)
  3. Machine-speed identity: When a 32-step attack chain can execute in a single session, perimeter-scoped identity checks are worthless. Workload identity has to be cryptographically verifiable at every execution step, in under a second - which is what SPIFFE/SPIRE is actually for. (Part 4 takes a look at this.)
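
The exploitability-validation gate in point 2 can be sketched in a few lines of Python. This is an illustration of the decision rule only, not Sakura Sky's implementation: the `Finding` shape, the `triage` function, and the `stub_validator` are all hypothetical names, and the validator stands in for whatever actually runs a candidate exploit against a sandboxed replica.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Finding:
    """An AI-surfaced candidate vulnerability (hypothetical shape)."""
    finding_id: str
    summary: str


# Hypothetical validator type: returns True only if the candidate exploit
# achieved its objective inside the sandboxed replica.
ReplicaValidator = Callable[[Finding], bool]


def triage(
    findings: Iterable[Finding], exploit_succeeds: ReplicaValidator
) -> Tuple[List[Finding], List[Finding]]:
    """Split findings into (escalated, deprioritised) using the replica result."""
    escalated: List[Finding] = []
    deprioritised: List[Finding] = []
    for finding in findings:
        if exploit_succeeds(finding):
            escalated.append(finding)      # goes to SecOps as an alert
        else:
            deprioritised.append(finding)  # kept on record, not alerted on
    return escalated, deprioritised


def stub_validator(finding: Finding) -> bool:
    """Stand-in for a real sandbox run: pretend only F-002 reproduces."""
    return finding.finding_id == "F-002"


findings = [
    Finding("F-001", "bug in a non-exploitable code path"),
    Finding("F-002", "memory-safety bug reachable from untrusted input"),
    Finding("F-003", "hardening gap with no standalone impact"),
]
escalated, deprioritised = triage(findings, stub_validator)
print("Escalated:    ", [f.finding_id for f in escalated])
print("Deprioritised:", [f.finding_id for f in deprioritised])
```

Note that deprioritised findings are retained rather than discarded: a failed exploit attempt in a replica lowers the priority of a finding, it does not prove the finding is harmless.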

The Bottom Line

The Firefox discovery isn’t a reason to panic. It’s the moment to stop designing defences around the speed of a human analyst reading a dashboard.

The vertigo Holley described isn’t the sensation of falling. It’s the sensation of a reference frame changing underneath you. The teams that stabilise fastest are the ones who accept that the old cadence - quarterly pentests, weekly patch cycles, human-paced triage - isn’t coming back.


Coming up next in The Mythos Ledger: Part 2 - The Verification Crisis: The Signal and the Static. Why “discovery” is cheap, “verification” is the real bottleneck, and how sandboxed exploit-validation is going to decide which security teams survive the AI arms race.


References

Flying Penguin (2026) The Boy That Cried Mythos: Verification is Collapsing Trust in Anthropic. Available at: https://www.flyingpenguin.com/the-boy-that-cried-mythos-verification-is-collapsing-trust-in-anthropic/ (Accessed: 23 April 2026).

Mozilla (2026) The zero-days are numbered. Available at: https://blog.mozilla.org/en/privacy-security/ai-security-zero-day-vulnerabilities/ (Accessed: 23 April 2026).

SecurityWeek (2026) Claude Mythos Finds 271 Firefox Vulnerabilities. Available at: https://www.securityweek.com/claude-mythos-finds-271-firefox-vulnerabilities/ (Accessed: 23 April 2026).

UK AISI (2026) Our evaluation of Claude Mythos Preview’s cyber capabilities. Available at: https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities (Accessed: 23 April 2026).
