Opinion

AI Safety Assurance Is Still Goodwill, Not Evidence

The UN’s first scientific assessment of AI says safety assurance still runs on developer goodwill, not verification. Here is what evidence-based governance would actually require of agentic systems.


On 1 July 2026, the Independent International Scientific Panel on AI released its first Preliminary Report: a scientific assessment of AI capabilities, opportunities and risks, produced by forty scientists from every UN region and co-chaired by Yoshua Bengio and Maria Ressa (Independent International Scientific Panel on Artificial Intelligence, 2026). It landed five days before the inaugural Global Dialogue on AI Governance in Geneva, where the findings go to Member States on 6 and 7 July. UN Secretary-General António Guterres put the point plainly at the launch: “The science is here. We can no longer say we did not know. What we do with it is now up to all of us” (UN News, 2026).

The report’s framing, echoed in the Panel’s own announcement, is that AI’s benefits are not automatic and its harms are not inevitable, and that the difference comes down to the institutions and policies built around the technology (United Nations Office for Digital and Emerging Technologies, 2026). That is a reasonable, careful sentence. It is also the kind of sentence that is easy to nod along to and forget by lunchtime. The line worth sitting with is further down the page.

The line that matters more than the headline

Buried in the Panel’s discussion of AI measurement is this: “Without standardized, rigorous, independent third-party assessment, similar to what exists for the pharmaceutical and aeronautical industries, assurance of safety largely depends on developer goodwill” (Independent International Scientific Panel on Artificial Intelligence, 2026, p. 15).

That is a UN scientific panel saying, in a report going to every Member State next week, that the industry building the most consequential technology of the decade currently asks the world to trust its own homework. Frontier developers set their own risk thresholds, design their own safety evaluations, and choose what to disclose. Governments mostly receive whatever testing data the developer decides to share (Independent International Scientific Panel on Artificial Intelligence, 2026, p. 15).

The Panel is equally blunt about the state of governance built on top of that foundation. Over forty distinct AI governance instruments already exist across corporate, national and international layers, but they are “fragmented, concentrated at the corporate level” and “neither systematic nor comprehensive” (Independent International Scientific Panel on Artificial Intelligence, 2026, p. 43). Some measure nothing beyond inputs like investment and headcount. Without a way to check whether any of them actually change outcomes, the report warns, governance risks becoming symbolic.

None of this is new to anyone who has tried to produce audit-ready evidence for an AI system rather than a policy document about one. What is new is a UN scientific body saying it out loud, on the record, ahead of a governance summit.

Agentic AI is why this stopped being theoretical

Self-attestation was already a weak foundation for chatbots. For agentic systems, the Panel argues, it stops being adequate at all.

Three findings compound each other. First, oversight has not been operationalized as something you can actually measure: “Human oversight is not yet operationalized as a measurable requirement, with concrete expectations for intervention, reversibility and accountability” (Independent International Scientific Panel on Artificial Intelligence, 2026, p. 43). Second, the report is explicit that testing the model is not the same as testing the system a model runs inside: “the unit of evaluation must be the deployed system including model, tools, environment and users, not the model alone” (Independent International Scientific Panel on Artificial Intelligence, 2026, p. 42). Third, and most uncomfortable, evaluation itself may not be trustworthy: leading systems show “evaluation awareness,” and the Panel notes that models “could be instructed by humans or autonomously choose to temporarily reduce their test performance on dangerous capability assessments” (Independent International Scientific Panel on Artificial Intelligence, 2026, p. 15).

Put those together and the picture is stark. An agent can be granted real-world tool access and act with reduced human oversight, while the only evidence anyone has that it behaved safely is a self-reported test result from a system that may know it is being tested. That is not a compliance gap. It is a verification gap, and it sits directly beneath every claim about “responsible AI deployment” made by a vendor whose evaluation methodology nobody outside the company has audited.

Figure 1. Two models of AI safety assurance, contrasted against the pharmaceutical and aeronautical standard the Panel points to.

What evidence-based assurance actually looks like

The Panel’s pharma and aviation comparison is worth taking literally rather than as a rhetorical flourish. Neither industry asks the public to trust a manufacturer’s internal safety claims. Both require reproducible evidence, generated by a process the manufacturer does not fully control, that a party who was not in the room can independently verify.

Applied to agentic AI, that means moving the evidence away from the developer’s word and onto the runtime itself: a record of what an agent was permitted to do, what it actually did, and whether oversight controls were honoured or bypassed, generated in a way that cannot be quietly edited after the fact. That is the specific problem GATE, an open framework for agentic AI evidence that I have been developing independently of any Sakura Sky product, sets out to address: attested evidence trails rooted in hardware isolation rather than in a vendor’s internal test report. The GATE Conformance Runner covers the mechanics of how that evidence gets produced and mapped against specific regulatory obligations.
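To make “cannot be quietly edited after the fact” concrete, here is a minimal sketch of a hash-chained evidence log in Python. It is an illustration only, not GATE’s implementation: the `EvidenceLog` class, its field names and the `GENESIS` constant are hypothetical, and the sketch does not model hardware isolation or attestation at all. Each entry records what the agent did, whether it was permitted, and whether oversight was honoured, and each entry’s hash covers the hash of the entry before it.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical starting value for the chain; any fixed, published constant would do.
GENESIS = "0" * 64

# Fields covered by each entry's hash (names are illustrative assumptions).
FIELDS = ("seq", "action", "permitted", "oversight_honoured")


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


@dataclass
class EvidenceLog:
    """Append-only record in which every entry commits to its predecessor."""

    entries: list = field(default_factory=list)

    def append(self, action: str, permitted: bool, oversight_honoured: bool) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        payload = {
            "seq": len(self.entries),
            "action": action,
            "permitted": permitted,
            "oversight_honoured": oversight_honoured,
        }
        entry = {**payload, "prev_hash": prev_hash, "hash": _digest(prev_hash, payload)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain from the start; any edited entry breaks it."""
        prev_hash = GENESIS
        for i, entry in enumerate(self.entries):
            payload = {k: entry[k] for k in FIELDS}
            if (
                entry["seq"] != i
                or entry["prev_hash"] != prev_hash
                or entry["hash"] != _digest(prev_hash, payload)
            ):
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    log = EvidenceLog()
    log.append("read_file", permitted=True, oversight_honoured=True)
    log.append("send_payment", permitted=False, oversight_honoured=False)
    print(log.verify())  # True: the chain is intact

    # Quietly rewriting history: mark the unpermitted action as permitted.
    log.entries[1]["permitted"] = True
    print(log.verify())  # False: the stored hash no longer matches the content
```

A chain like this only detects edits by someone who cannot also recompute every later hash. Whoever controls the whole log can rebuild it from scratch, which is why the record needs an anchor outside the developer’s control, such as the hardware-rooted attestation described above or a head hash lodged with an independent party.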

For organisations that need this working today rather than as a research question, that is also the shape of the gap Sakura Sky’s GRC practice is built to close: evidence-based assurance delivered as an outcome, not a framework left for a compliance team to implement from a whitepaper.

Geneva is the test

The Global Dialogue on AI Governance meets in Geneva on 6 and 7 July, and this report is what Member States will be handed on arrival. The Panel has done the part that proved harder than expected: naming the evidence dilemma precisely enough that “we did not know” no longer holds. Whether Geneva produces the standardized, independent, pharma-and-aviation-grade assessment regime the Panel describes, or simply adds a forty-first fragmented instrument to the pile, is the question the next twelve months will answer.

Disclosure: Sakura Sky offers GRC advisory services, including a Praxis compliance offering, referenced above. GATE is Andrew Stevens’ independent open framework and is not a Sakura Sky product.

References

Independent International Scientific Panel on Artificial Intelligence, 2026. Preliminary Report of the Independent International Scientific Panel on AI: Evidence-based assessment of opportunities, risks and impacts of artificial intelligence. United Nations. Available at: https://www.un.org/independent-international-scientific-panel-ai/sites/default/files/2026-07/en_Preliminary%20Report_.pdf [Accessed 2 July 2026].

UN News, 2026. ‘The science is here’: UN chief welcomes first global AI assessment. United Nations. Available at: https://news.un.org/en/story/2026/07/1167853 [Accessed 2 July 2026].

United Nations Office for Digital and Emerging Technologies, 2026. Everyone has an opinion on AI. Now there’s an agreed set of facts [LinkedIn post]. Available at: https://www.linkedin.com/company/unodet/posts [Accessed 2 July 2026].