What makes AI agent security different

When a human logs into a system and performs a task, there's a natural pause between intent and action. Security teams have built their tooling around that model — log in, do something, log out. Anomalies surface in SIEM. Access is provisioned through IAM. Compliance is documented after the fact.

AI agents don't work that way. A single agent invocation can trigger hundreds of tool calls in seconds — reading files, querying databases, sending messages, calling APIs — with no human review between steps. An agent with write access to a database and read access to email can exfiltrate an entire customer list in a workflow that completes before a human analyst even opens an alert.

The threat model is fundamentally different. The authorization model has to be too.

The five primary agent threat categories

Empirical red-team research on production-deployed AI agents has identified five attack categories that traditional security tools cannot address:

Prompt injection via tool results

Attackers embed imperative commands in web pages, documents, or API responses. The agent fetches the content, incorporates it as context, and executes the injected instructions — often with no audit trail.

▸ Requires inline inbound scanning

Blast radius violations

A permitted action executed at 1,000× normal scale is still technically permitted. An email agent authorized to send messages can broadcast to 10,000 recipients without any per-action scope check.

▸ Requires pre-execution policy

Trust reset attacks

An agent refuses a delete action in Session 1. The attacker opens Session 2 on a new channel. Without cross-session memory, the agent complies — the prior refusal is invisible.

▸ Requires cross-session correlation

Multi-agent pipeline poisoning

In a three-agent pipeline, a compromised sub-agent issues out-of-scope delegations to the next agent. Without a permission ceiling, scope creep propagates through the entire workflow.

▸ Requires delegation chain enforcement
Empirical source

Shapira et al., "Agents of Chaos" (arXiv:2602.20021, Feb 2026) — tested 15 real attack scenarios against production-deployed autonomous agents across multiple frameworks. All five threat categories above were successfully exploited. None were prevented by existing tooling.

Why observation-only tools aren't enough

The AI security market is full of tools that observe what agents do and report on it. These tools have real value — audit trails, anomaly surfacing, compliance reporting. But they share a fundamental architectural limitation:

The Attestation Principle

Any entity that can act cannot independently attest to its own behavior. A tool downstream of execution can only see what the agent already decided to emit. Independent inline authorization is not a product preference — it is a logical requirement.

The table below maps the five threat categories to what different tool types can actually address:

Threat SIEM / Observability LLM Guardrails MCP Gateway Behavry (inline)
Prompt injection via tool resultsDetect afterPartial (model-side)NoBlock before context
Blast radius violationsDetect afterNoNoBlock pre-execution
Trust reset attacksAlert afterNoNoDetect + block
Multi-agent pipeline poisoningNo visibilityNoNoDelegation ceiling
Sensitive data exfiltrationDetect afterPartialNo26-pattern DLP, pre-OPA

What inline authorization looks like

Behavry sits as an inline authorizer between AI agents and the systems they access via Model Context Protocol (MCP). Every tool call passes through the authorization stack before reaching its target. No agent code changes required — agents point their MCP configuration at the Behavry authorizer.

Per-agent identity

Every agent receives a unique JWT RS256 credential with short-lived token lifetime. No shared API keys. Every tool call is cryptographically attributed to a specific agent instance, session, and requester channel before any policy evaluation begins.

Pre-execution policy enforcement

OPA Rego policies evaluate every tool call against per-agent rules before the request is forwarded. Policies encode intent, scope, resource boundaries, blast radius thresholds, and requester identity requirements — not just connection rules.

Input scanning — outbound and inbound

26 sensitive data patterns are scanned before policy evaluation (outbound). After tool calls return, response bodies are scanned for injected instructions before results reach agent context (inbound). AWS credentials, GitHub tokens, SSNs, credit cards, PEM keys, and 20 more patterns — with cross-session fragment reassembly detection.

Behavioral baselining

Rolling per-agent baselines detect frequency spikes, novel resource access, error rate changes, data volume anomalies, and gradual risk score drift. Cross-session memory detects trust reset attacks that single-session monitoring misses entirely.

Decision Trace

Every action in a multi-agent pipeline is linked by parent event ID, causal depth, workflow session, and delegation chain. The Decision Trace is a causal chain of custody artifact — not a log. It is a proof. It can only be produced from an inline execution-path position.

Compliance and framework alignment

Regulatory frameworks are increasingly specific about what AI agent security controls must look like:

  • EU AI Act (Art. 9, 13, 14) — risk management systems, transparency, logging, and human oversight for high-risk AI systems. Enforcement is live.
  • NIST AI RMF — agent identity and authorization (MG-2.1), human oversight of AI decisions (GO-1.7), behavioral monitoring against baselines (MG-2.2).
  • SOC 2 (CC6.1, CC6.7, CC7.2–7.4) — access control enforcement, sensitive data controls, anomaly detection, and incident response.
  • GDPR Art. 32 / HIPAA §164.312 — technical safeguards for systems processing regulated data. AI agents with access to PII or PHI are in scope.

Behavry maps directly to all four frameworks with live controls — not documentation and not dashboard screenshots of what happened after the fact.

Ready to authorize
AI agent actions?

Behavry deploys in a day. No agent code changes. Authorization enforced from the first tool call.