Nov 18, 2025 - 10 Min read

The Missing Primitives for Trustworthy AI Agents

This is another installment of our ongoing series on building trustworthy AI Agents:

Kill Switches and Circuit Breakers (Part 6)

With the earlier primitives we secured communication, protected agent reasoning, established identity, enforced policy, and created verifiable logs. These controls provide strong guarantees, but none of them stop an agent once it begins acting in harmful or unexpected ways. Autonomous agents require a final class of safety mechanisms: runtime stop controls.

Kill switches and circuit breakers exist to prevent worst-case scenarios. They stop runaway loops, halt repeated expensive operations, contain failures, and give operators the ability to pause all actions if needed. These controls operate outside the agent itself, so the agent cannot ignore or bypass them.

In this part we explore five safety primitives and show how to combine them into a unified runtime supervisor. All examples use per agent state, since agents must be isolated from one another.

Primitive 1: Agent Level Kill Switch

An agent level kill switch is the simplest form of hard control. It is a boolean flag that determines whether a specific agent is allowed to take any action. The switch must be stored externally so the agent cannot modify it.

Different storage choices have trade-offs:

Redis provides fast, low-latency checks, which suits agents that make many small actions.
Feature flag systems provide operator-friendly UIs, change management, and audit trails.
DynamoDB (with strongly consistent reads) or PostgreSQL provide durable persistence with strong consistency.
SPIRE or OPA metadata makes the kill switch part of identity or policy configuration.

Every agent action should consult the kill switch before running. This gives operators a simple emergency stop mechanism that works even if the agent is misbehaving.

Example: Redis backed kill switch

This example shows how to check an agent’s kill switch in Redis. The switch is evaluated before any action, and the action is executed only if the agent is currently enabled.

import redis

# Connect to Redis (local example)
r = redis.Redis(host="localhost", port=6379)

def is_agent_enabled(agent_id: str) -> bool:
    """
    Return True if the agent is enabled.

    We store the switch in Redis as:
      agent:<agent_id>:enabled = "true" or "false"

    Missing keys default to enabled.
    """
    key = f"agent:{agent_id}:enabled"
    value = r.get(key)

    if value is None:
        return True  # Default: enabled
    return value.decode("utf-8") == "true"

If an operator runs:

redis-cli set agent:spiffe://trust.local/agent/data-exporter:enabled false

Every subsequent action by that agent is rejected at its next kill switch check.
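The example above fails open: a missing key leaves the agent enabled. Some teams may prefer the opposite default. The sketch below is one hypothetical variant that fails closed. The names `is_agent_enabled_strict`, `KeyValueStore`, and `FakeStore` are illustrative and not part of the original example; the function accepts any client exposing a Redis-style `get(key)` that returns bytes, as `redis.Redis` does by default.

```python
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """Anything with a Redis-style get(); redis.Redis satisfies this."""
    def get(self, key: str) -> Optional[bytes]: ...


def is_agent_enabled_strict(store: KeyValueStore, agent_id: str) -> bool:
    """Fail closed: only an explicit "true" enables the agent."""
    key = f"agent:{agent_id}:enabled"
    try:
        value = store.get(key)
    except Exception:
        # Store unreachable: treat the agent as disabled.
        return False
    if value is None:
        # Missing key: treat the agent as disabled.
        return False
    return value.decode("utf-8").strip().lower() == "true"


class FakeStore:
    """In-memory stand-in for Redis, used only for this demo."""
    def __init__(self, data=None, broken=False):
        self.data = data or {}
        self.broken = broken

    def get(self, key):
        if self.broken:
            raise ConnectionError("store unreachable")
        return self.data.get(key)


store = FakeStore({"agent:agent-A:enabled": b"true",
                   "agent:agent-B:enabled": b"false"})
```

The trade-off is availability: with this variant, an outage of the switch store halts every agent until the store recovers.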

Primitive 2: Action Level Circuit Breakers

A kill switch disables an agent entirely. A circuit breaker is more targeted. It limits how frequently a specific action can occur. This prevents runaway loops or retry storms.

This is especially important for:

expensive operations (large queries, file writes, image generation)
operations that can impact downstream services
calls that should only occur occasionally
noisy failure patterns that cause retry loops

Circuit breakers were popularized in microservices. The same pattern applies to agents because both deal with autonomous systems that can overwhelm dependencies.
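For comparison, the classic microservices breaker trips on consecutive failures rather than on call rate, which is what contains retry storms. A minimal sketch of that closed, open, and half-open state machine follows. The class name, thresholds, and the injectable `clock` parameter are assumptions made for illustration.

```python
import time


class FailureBreaker:
    """Opens after `max_failures` consecutive failures, then permits
    trial calls again once `cooldown` seconds have passed."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: calls flow normally
        # open: permit a trial call only after the cooldown (half-open)
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the breaker again

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # (re)open and restart cooldown


# Demo with a fake clock so the behaviour is deterministic
fake_now = [0.0]
breaker = FailureBreaker(max_failures=3, cooldown=30.0,
                         clock=lambda: fake_now[0])

for _ in range(3):
    breaker.record_failure()
blocked = breaker.allow()       # open, cooldown not yet elapsed

fake_now[0] = 31.0
trial = breaker.allow()         # half-open: trial call permitted

breaker.record_success()
closed_again = breaker.allow()  # closed again after a success
```

The caller is expected to report each outcome through `record_success` or `record_failure`; a failed trial call reopens the breaker and restarts the cooldown.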

Example: Token bucket rate limiting

The code below implements a classic token bucket algorithm. An action is allowed only if the bucket for that agent has tokens available. Each agent has its own bucket, preventing cross agent interference.

import time

class TokenBucket:
    def __init__(self, rate, capacity):
        """
        rate = tokens added per second
        capacity = maximum number of tokens allowed
        """
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # Start full
        self.last = time.monotonic()  # Monotonic timestamp of last refill

    def allow(self) -> bool:
        """
        Returns True if the action is allowed.

        We refill tokens based on time elapsed since the last check.
        If at least one token is available, we consume it.
        """
        now = time.monotonic()  # Unaffected by wall-clock adjustments
        delta = now - self.last

        # Refill tokens (delta * rate) but do not exceed capacity
        self.tokens = min(
            self.capacity,
            self.tokens + delta * self.rate
        )

        self.last = now

        # Do we have a token to spend?
        if self.tokens >= 1:
            self.tokens -= 1  # Spend token
            return True

        # No tokens remaining
        return False

This caps the sustained rate at rate actions per second while still allowing short bursts of up to capacity actions.
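To limit a specific action rather than the agent as a whole, buckets can be keyed by the (agent, action) pair. The sketch below assumes that keying scheme and hypothetical per-action limits (`LIMITS`, `DEFAULT_LIMIT`, `allow_action`). It re-declares a compact bucket with an injectable clock so the example is self-contained and deterministic.

```python
import time


class Bucket:
    """Compact token bucket; `clock` is injectable for testing."""
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


fake_now = [0.0]
clock = lambda: fake_now[0]

# Hypothetical limits: (tokens per second, burst capacity) per action
LIMITS = {"image.generate": (0.1, 2), "db.query": (5, 20)}
DEFAULT_LIMIT = (1, 5)

buckets = {}  # (agent_id, action) -> Bucket


def allow_action(agent_id: str, action: str) -> bool:
    key = (agent_id, action)
    if key not in buckets:
        rate, capacity = LIMITS.get(action, DEFAULT_LIMIT)
        buckets[key] = Bucket(rate, capacity, clock)
    return buckets[key].allow()


# Two tokens of burst are available, then the third call is refused
burst = [allow_action("agent-A", "image.generate") for _ in range(3)]

fake_now[0] = 10.0  # 10 s at 0.1 tokens/s refills one token
after_wait = allow_action("agent-A", "image.generate")
other_action = allow_action("agent-A", "db.query")  # separate bucket
```

Keying on the pair means an agent that exhausts its image generation budget can still run cheap queries.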

Primitive 3: Objective Based Circuit Breakers

Rate limits detect frequency. Objective based breakers detect patterns.

Agents often behave in loops that do not violate rate limits. For example:

running the same query over and over
writing the same file repeatedly
generating plans in a recursive cycle
hallucinating a chain of actions that escalates gradually

Rate limits would not stop these behaviors because the calls are spaced out. A pattern breaker examines what the agent is doing, not just how often.

Example: per agent sliding window pattern detection

Below, each agent has its own history deque. We check the last few seconds of actions for repeated behavior.

from collections import deque
import time

def should_trip(agent_history: deque, agent_id: str, action: str) -> bool:
    """
    Detect suspicious repeating patterns for a single agent.

    agent_history stores tuples of:
      (agent_id, action_name, timestamp)

    We add the new action, then check how many similar actions
    occurred recently. If too many, we trip the breaker.
    """
    now = time.time()

    # Record this action in the agent's personal history
    agent_history.append((agent_id, action, now))

    # Drop entries older than 2 seconds so the history stays bounded
    while agent_history and now - agent_history[0][2] >= 2:
        agent_history.popleft()

    # Count how many of the remaining actions match this one
    repeated = sum(1 for x in agent_history if x[1] == action)

    # More than 5 identical actions in 2 seconds is suspicious
    return repeated > 5

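With a 2-second window and a threshold of 5, this catches tight loops. To catch the slower loops described above, widen the window and lower the threshold, or key the history on the action together with its arguments.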
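The 2-second window above only sees rapid repetition. To catch spaced-out repetition of the kind listed earlier, one option is a longer window keyed on a fingerprint of the action and its arguments. The sketch below assumes a 10-minute window, a threshold of 3 repeats, and JSON-serializable payloads; `fingerprint` and `RepeatDetector` are illustrative names.

```python
import hashlib
import json
import time
from collections import deque


def fingerprint(action: str, payload: dict) -> str:
    """Stable hash of an action plus its arguments."""
    blob = json.dumps({"action": action, "payload": payload}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class RepeatDetector:
    """Trips when the same (action, payload) recurs too often in a window."""

    def __init__(self, window_s=600.0, max_repeats=3, clock=time.monotonic):
        self.window_s = window_s
        self.max_repeats = max_repeats
        self.clock = clock
        self.events = deque()  # (timestamp, fingerprint), oldest first

    def should_trip(self, action: str, payload: dict) -> bool:
        now = self.clock()
        fp = fingerprint(action, payload)
        self.events.append((now, fp))
        # Evict events that have fallen out of the window
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()
        repeats = sum(1 for _, f in self.events if f == fp)
        return repeats > self.max_repeats


fake_now = [0.0]
detector = RepeatDetector(clock=lambda: fake_now[0])  # one per agent

results = []
for _ in range(4):
    results.append(detector.should_trip("db.query", {"sql": "SELECT 1"}))
    fake_now[0] += 60.0  # one call per minute, far below any rate limit
```

Here the fourth identical query trips the breaker even though the calls are a minute apart, while a query with different arguments starts its own count.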

Primitive 4: Policy Level Hard Stops

Policy rules catch conditions that simple circuit breakers cannot. Instead of looking at frequency or patterns, a policy can inspect the semantic meaning of a request.

Policies can enforce:

maximum file size
restricted actions outside office hours
regional data boundaries
dataset classifications
action budgets
business risk categories

OPA and Rego enable declarative policy enforcement at runtime.

Example Rego snippet

package ai.circuit

# Rego v1 syntax (OPA 1.0+): partial set rules use `contains` and `if`
deny contains msg if {
    input.action == "s3.put_object"
    input.payload.size_mb > 500
    msg := "Object too large to export"
}

deny contains msg if {
    input.action_count > 100
    msg := "Daily action budget exceeded"
}

The supervisor passes the action, payload, and running action count to OPA as input, and blocks the action whenever the deny set is non-empty.
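One way to evaluate these rules from Python is to query OPA's REST Data API for the `deny` set. The sketch below assumes an OPA server at `http://localhost:8181` with the policy above loaded. The `post_json` transport, the `opa_denials` helper, and the choice to fail closed are assumptions, and the HTTP call is injectable so the logic can be exercised without a running server.

```python
import json
import urllib.request

# Assumed local OPA address; the path mirrors package ai.circuit, rule deny
OPA_URL = "http://localhost:8181/v1/data/ai/circuit/deny"


def post_json(url: str, body: dict) -> dict:
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return json.load(resp)


def opa_denials(action, payload, action_count, post=post_json):
    """Return the list of deny messages; an empty list means allowed."""
    doc = {"input": {"action": action,
                     "payload": payload,
                     "action_count": action_count}}
    try:
        response = post(OPA_URL, doc)
    except Exception as exc:
        # Fail closed if the policy engine cannot be reached
        return [f"policy engine unavailable: {exc}"]
    # OPA omits "result" when the rule is undefined
    return list(response.get("result", []))


# Fake transport standing in for the OPA server in this demo
def fake_post(url, body):
    inp = body["input"]
    msgs = []
    if inp["action"] == "s3.put_object" and inp["payload"]["size_mb"] > 500:
        msgs.append("Object too large to export")
    if inp["action_count"] > 100:
        msgs.append("Daily action budget exceeded")
    return {"result": msgs}


too_big = opa_denials("s3.put_object", {"size_mb": 600}, 1, post=fake_post)
allowed = opa_denials("db.query", {}, 5, post=fake_post)
```

Returning the messages rather than a bare boolean lets the supervisor include the reason in its error and audit record.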

Primitive 5: System Level Kill Switch

This is the global brake pedal. It halts:

all agents in a trust domain
all agents of a specific type
all agents running in a region
all agent to agent communication

This switch is independent of agent logic. It must be hard to bypass and fast to operate.
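As a rough illustration of how the scopes listed above might be represented, the sketch below models halts as a set of scope strings that every agent is matched against. The scope naming scheme (`global`, `domain:`, `type:`, `region:`) and the `AgentMeta` fields are assumptions, and the in-memory set stands in for an external store that agents cannot write to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentMeta:
    agent_id: str
    trust_domain: str
    agent_type: str
    region: str


def scopes_for(agent: AgentMeta) -> set:
    """Every halt scope that applies to this agent."""
    return {
        "global",
        f"domain:{agent.trust_domain}",
        f"type:{agent.agent_type}",
        f"region:{agent.region}",
    }


def is_halted(agent: AgentMeta, halted_scopes: set) -> bool:
    """True if any active halt scope covers the agent."""
    return bool(scopes_for(agent) & halted_scopes)


exporter = AgentMeta("data-exporter", "trust.local", "exporter", "eu-west-1")
planner = AgentMeta("planner", "trust.local", "planner", "us-east-1")

halted = {"region:eu-west-1"}  # operator halts a single region
```

With this set, `is_halted(exporter, halted)` is true and `is_halted(planner, halted)` is false; adding `"global"` to the set would halt both.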

Why revoking SPIFFE identity is so effective

In Part 3 we introduced SPIFFE SVIDs as cryptographic identities. In SPIRE you revoke an identity by deleting its registration entry. Once you do:

the agent can no longer obtain fresh certificates
existing certificates expire at the end of their TTL, which should be kept short
mTLS authentication fails
all downstream calls are rejected
all peer agents refuse communication

This is a cryptographic shutdown. Agents cannot restart themselves or reauthenticate.

Example

spire-server entry delete -entryID <agent-entry-id>

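Once the entry is deleted, the agent can no longer renew its SVID, so it is contained as soon as its current certificate expires. Keep SVID TTLs short so that this window is minutes rather than hours.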
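How long an already issued SVID remains usable depends on its TTL, so the worst-case containment window can be estimated with simple arithmetic. The sketch below assumes a hypothetical 5-minute X.509-SVID TTL; the function name and timestamps are illustrative.

```python
from datetime import datetime, timedelta, timezone


def worst_case_containment(issued_at: datetime, ttl: timedelta,
                           entry_deleted_at: datetime) -> timedelta:
    """Time between deleting the entry and the current SVID expiring."""
    expires_at = issued_at + ttl
    remaining = expires_at - entry_deleted_at
    return max(remaining, timedelta(0))


issued = datetime(2025, 11, 18, 12, 0, 0, tzinfo=timezone.utc)
deleted = datetime(2025, 11, 18, 12, 1, 30, tzinfo=timezone.utc)

# The SVID issued at 12:00 expires at 12:05, so 3 min 30 s remain
window = worst_case_containment(issued, timedelta(minutes=5), deleted)
```

A longer TTL stretches this window proportionally, which is why the identity-level switch works best alongside the faster in-process checks.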

Bringing It All Together: Combined Runtime Supervisor

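To apply these primitives in a real system, you need a runtime supervisor that checks the kill switch, circuit breaker, pattern detector, and policy before executing any action. The system level kill switch sits outside the supervisor, at the identity layer, so it is not part of the code below.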

A robust supervisor must also manage state. Each agent needs its own circuit breaker and history. A global bucket or global history would allow one noisy agent to affect all others. We use defaultdict to automatically maintain per agent state.

Combined supervisor example

The following code shows how these primitives work together. It includes:

per agent TokenBuckets
per agent action histories
kill switch checks
pattern detection
OPA policy checks
audit logging

This example is fully runnable, but is not production ready.

import time
from collections import deque, defaultdict

# --- Primitives (previous sections) ---

# Primitive 1: Kill switch (stub)
def is_agent_enabled(agent_id: str) -> bool:
    """
    In production this queries Redis or a feature flag system.
    Here we assume all agents are enabled.
    """
    print(f"[Check 1] Checking kill switch for {agent_id}...")
    return True


# Primitive 2: Token bucket (per agent)
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum number of tokens
        self.tokens = capacity    # start with full bucket
        self.last = time.monotonic()  # last refill timestamp (monotonic clock)

    def allow(self) -> bool:
        """
        Return True if token is available.

        Refill tokens based on elapsed time, capped at capacity.
        """
        now = time.monotonic()
        delta = now - self.last

        # Refill tokens
        new_tokens = delta * self.rate
        self.tokens = min(self.capacity, self.tokens + new_tokens)

        self.last = now

        # Do we have a token to spend?
        if self.tokens >= 1:
            self.tokens -= 1
            return True

        return False


# Primitive 3: Pattern detection
def should_trip(agent_history: deque, agent_id: str, action: str) -> bool:
    """
    Check if this specific agent is entering a suspicious repeated pattern.
    """
    now = time.time()

    # Record this action
    agent_history.append((agent_id, action, now))

    # Keep actions within the last 2 seconds
    recent = [x for x in agent_history if now - x[2] < 2]

    # Count how many of those were the same action
    repeated = [x for x in recent if x[1] == action]

    return len(repeated) > 5


# Primitive 4: Policy (stub)
def opa_allows(agent_id: str, action: str, payload: dict) -> bool:
    """
    In production this sends input to OPA and evaluates policy.
    Here we simply allow all actions.
    """
    print(f"[Check 4] Checking OPA policy for {agent_id}:{action}...")
    return True


# Audit logging (stub; the verifiable logs from earlier in this series)
def emit_audit_event(agent_id: str, action: str, payload: dict):
    """
    In production this sends structured logs to a tamper proof audit sidecar.
    """
    print(f"AUDIT: {agent_id} performed {action} with {payload}")


# --- Per agent state ---

# Automatically create a new token bucket for each agent
agent_buckets = defaultdict(lambda: TokenBucket(rate=1, capacity=5))

# Automatically create a new sliding window history for each agent
agent_histories = defaultdict(lambda: deque(maxlen=10))


# --- Supervisor ---

def supervisor(agent_id: str, action: str, payload: dict, fn):
    """
    Centralized enforcement for all agent actions.

    This function ensures each action passes through:
      1. Kill switch
      2. Circuit breaker
      3. Pattern detection
      4. Policy evaluation
      5. Audit logging
    """

    # 1. Kill switch
    if not is_agent_enabled(agent_id):
        raise RuntimeError(f"Agent {agent_id} disabled by kill switch")

    # 2. Circuit breaker (per agent bucket)
    if not agent_buckets[agent_id].allow():
        raise RuntimeError(f"Circuit breaker open for {agent_id}")

    # 3. Pattern detection (per agent history)
    if should_trip(agent_histories[agent_id], agent_id, action):
        raise RuntimeError(f"Suspicious repeating pattern for {agent_id}")

    # 4. Policy
    if not opa_allows(agent_id, action, payload):
        raise RuntimeError(f"Policy violation for {agent_id}:{action}")

    # 5. Audit logging
    emit_audit_event(agent_id, action, payload)

    # Execute the action
    print(f"SUCCESS: Executing {action} for {agent_id}")
    return fn()


# --- Demo ---

def my_action():
    return "work completed"

payload = {"dataset": "customers"}

print("--- Testing Agent A ---")
for i in range(7):
    try:
        time.sleep(0.1)
        supervisor("agent-A", "db.query", payload, my_action)
    except RuntimeError as e:
        print(f"Call {i}: FAILED {e}\n")

print("\n--- Testing Agent B (independent state) ---")
try:
    supervisor("agent-B", "db.query", payload, my_action)
except RuntimeError as e:
    print(f"FAILED: {e}")

This supervisor chains the in-process checks and keeps each agent’s state isolated. The system level kill switch is enforced separately, at the identity layer.

Why This Matters

As autonomy increases, so does risk. Autonomous agents act quickly, operate across systems, and can enter feedback loops that traditional monitoring will not catch in time. Trustworthy agent systems require infrastructure level safety controls that do not rely on the agent to behave well.

Kill switches and circuit breakers provide that control. They offer predictable containment, fast response during incidents, and guardrails that operate independently of the agent’s internal logic.

Practical Next Steps

Add a kill switch key for each agent in your environment.
Introduce per agent circuit breakers for high cost actions.
Implement objective based breakers for repetitive patterns.
Add policy rules that define hard stop conditions.
Add multi region kill switch bindings in SPIRE or IAM.
Add visibility into kill switch state and breaker states in an operator dashboard.

Part 8 will cover adversarial robustness: how models can withstand data poisoning, prompt injection, and inversion attacks.
