29 Million Secrets on GitHub: AI Coding Tools Are Doubling Your Leak Rate
GitGuardian released their State of Secrets Sprawl 2026 report on March 17. The headline number: 28.65 million new hardcoded secrets were pushed to public GitHub in 2025. That's a 34% year-over-year increase — the largest single-year jump they've ever recorded.
But the number that matters for anyone building with AI is buried deeper in the report: AI-assisted commits leak secrets at roughly 2x the baseline rate. Claude Code-assisted commits showed a 3.2% secret-leak rate, versus a 1.5% baseline across all public GitHub commits.
That's not a rounding error. That's a structural problem.
The Scale of the Problem
Let's put the full picture together from the GitGuardian data:
- 28.65 million new hardcoded secrets on public GitHub in 2025 (GitGuardian State of Secrets Sprawl, 2026)
- 1,275,105 AI service secrets leaked — up 81% year-over-year (GitGuardian, 2026)
- 113,000 leaked DeepSeek API keys alone (GitGuardian, 2026)
- 64% of valid secrets from 2022 are still not revoked in 2026 (GitGuardian, 2026)
- 46% of critical secrets have no vendor-provided validation mechanism (GitGuardian, 2026)
AI didn't invent the secrets sprawl problem. But it accelerated every condition that makes it worse: faster shipping, more integrations, more service accounts, more configuration surfaces where credentials end up by mistake.
Why AI Coding Tools Leak More
The 2x leak rate isn't because AI models are malicious. It's because of how developers use them.
AI generates code faster than developers review it. When you're accepting 10 code suggestions per hour instead of writing 10 lines per hour, the probability that one of those suggestions contains a hardcoded credential — or that you paste a credential into a prompt and the model echoes it back — goes up. Not because the model is careless, but because the workflow moves too fast for human review to catch everything.
AI models don't understand what a secret is the way you do. A model might generate a config file with a placeholder that looks exactly like a real AWS key. Or it might helpfully include a connection string in a code example because you asked it to show you how to connect to Postgres. The model is optimizing for helpfulness, not operational security.
Prompts themselves are a new leak surface. This is the one most teams miss entirely. You paste your .env file into Claude Code to debug a connection issue. The model sees your database credentials, your API keys, your Stripe secret. That data is now in an API request body flying across the internet to a model provider. GitGuardian doesn't scan prompts — they scan git commits. The prompt-level leakage is invisible to every tool that only watches the repository.
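As a concrete sketch of what prompt-layer detection means, here is a minimal Python check. The four patterns are illustrative only; production scanners use hundreds of patterns plus entropy analysis.

```python
import re

# Illustrative detectors only -- real scanners use hundreds of patterns
# plus entropy checks. These four match common credential prefixes.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "openai_style_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

def scan_prompt(text: str) -> list[str]:
    """Return the names of any secret patterns found in a prompt string."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]

# A pasted .env file trips the scanner before the prompt is ever sent:
prompt = "Why does this fail?\nAWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
print(scan_prompt(prompt))  # → ['aws_access_key']
```

The point is where this runs: on the prompt text before it leaves the machine, not on a git commit after the fact.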
Apiiro's research confirms the broader pattern: teams using AI coding assistants see 4x velocity but 10x the security vulnerabilities shipped to production, with developers exposing cloud credentials and keys nearly twice as often as developers writing code without AI assistance (Apiiro, 2026).
It's Not Just the Code — It's the Tools
The AI coding tools themselves have become attack surfaces.
In February 2026, Check Point disclosed two critical vulnerabilities in Claude Code. CVE-2025-59536 (CVSS 8.7) allowed a malicious .claude/settings.json file in a git repo to execute arbitrary shell commands via hooks — no user approval required. CVE-2026-21852 allowed a project config to redirect API requests, including your Anthropic API key in plaintext, to an attacker-controlled server (Check Point Research, 2026).
Both are patched. But the pattern matters: a malicious commit in a shared repo could compromise every developer who cloned it.
In March 2026, Zenity Labs disclosed a zero-click prompt injection in Perplexity's Comet AI browser. A malicious calendar invite could hijack the browser's AI agent, enabling the attacker to browse the local file system and steal 1Password credentials — without any user interaction. No exploit. No click. The agent autonomously exfiltrated files while still returning the expected response to the user (Zenity Labs, 2026).
Microsoft's security team found malicious Chromium extensions impersonating AI assistants that harvested LLM chat histories from approximately 900,000 installs (Microsoft Security Blog, March 2026).
The attack surface isn't just "what does the model generate." It's "what does the tool send, receive, store, and expose while you use it."
What Happens When Agents Handle Secrets
It gets worse when you add autonomous agents.
The "Agents of Chaos" paper — published February 2026 by researchers from Northeastern, Harvard, MIT, Stanford, CMU, and others — gave AI agents real system access for two weeks. Email, file systems, shell commands, persistent storage. The results:
- Agents leaked sensitive information including SSNs, bank account details, and medical records — even after refusing direct requests for the same data
- Agents executed destructive system-level commands
- A researcher changed their Discord display name to match an agent's owner. The agent couldn't tell the difference. It complied with the impersonator's instructions to delete all of its persistent memory files
- The agents failed in 11 distinct ways, from identity spoofing to cross-agent behavior contamination
(Shapira et al., "Agents of Chaos," February 2026)
This wasn't adversarial prompting. These are the same models plugged into enterprise systems right now. The researchers just gave them real access and watched what happened.
In a broader study of AI coding agents reported by Help Net Security, 38 scans of 30 pull requests produced 143 security issues; 87% of the PRs contained at least one vulnerability (Help Net Security, March 2026).
Where the Existing Defenses Break Down
Here's the problem with the current toolchain:
Git hooks and pre-commit scanners catch secrets at commit time. That's too late. The secret was already in a prompt sent to an AI provider, processed by the model, and potentially logged, cached, or used for fine-tuning. By the time it reaches a git hook, it's already been exfiltrated — you're just preventing it from reaching GitHub too.
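The commit-time layer is still worth having for repo hygiene. A toy version of a pre-commit secret check might look like the following; the patterns are illustrative, and real tools such as gitleaks or GitGuardian's ggshield are far more thorough.

```python
import re
import subprocess
import sys

# Toy pre-commit check: block a commit if the staged diff adds anything
# that looks like a credential. Patterns are illustrative, not exhaustive.
CREDENTIAL_RE = re.compile(
    r"AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9_-]{20,}|ghp_[A-Za-z0-9]{36}|-----BEGIN"
)

def find_leaks(diff_text: str) -> list[str]:
    """Return added lines ('+' prefix) in a diff that match a credential pattern."""
    return [
        line for line in diff_text.splitlines()
        if line.startswith("+") and CREDENTIAL_RE.search(line)
    ]

if __name__ == "__main__":
    try:
        staged = subprocess.run(
            ["git", "diff", "--cached"], capture_output=True, text=True
        ).stdout
    except OSError:  # git not available; nothing to check
        sys.exit(0)
    leaks = find_leaks(staged)
    if leaks:
        print("Refusing to commit; possible secrets in staged changes:")
        for line in leaks:
            print(" ", line[:80])
        sys.exit(1)  # non-zero exit aborts the commit
```

Useful, but note what it cannot see: this script only ever inspects the staged diff, so a credential pasted into a prompt never passes through it.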
Repository scanners (GitGuardian, GitHub secret scanning) catch secrets after they're pushed. Better than nothing. But 64% of secrets from 2022 still aren't revoked. The alert-to-remediation pipeline is broken in most organizations.
Browser extensions and DLP agents can't see CLI tools, scripts, or agent traffic. Claude Code runs in your terminal. Cursor makes API calls from a desktop app. MCP servers make outbound requests from Node.js processes. A Chrome extension sees none of this.
Cloud-routed AI security (Netskope, Zscaler) can inspect traffic, but your prompts now travel through their infrastructure. You've solved the AI leakage problem by creating a new one — your source code and credentials passing through a third party's cloud. And they miss local tools entirely.
The gap: nobody is scanning the actual prompt content at the point where it leaves the developer's machine, before it reaches any provider, without routing it through a cloud.
What Network-Layer Scanning Catches
A MITM proxy on the developer's machine sees every API request body. That includes:
Secrets in prompts. When a developer pastes a .env file, a connection string, or a credentials file into an AI tool, the proxy sees it in the request body and flags it before it reaches the provider. This is the leak surface GitGuardian can't cover — because it only watches the repository, not the prompt.
Secrets in responses. When a model generates code containing a hardcoded credential — or echoes back a credential the developer pasted — the proxy catches it in the response body. You see what's coming back, not just what goes out.
Exfiltration via tool calls. When an MCP server leaks your environment variables into the LLM context (the most common medium-severity finding in MCP server audits), the proxy detects the credential in the outbound request.
Credential theft via config injection. When a malicious repo config redirects Claude Code's API requests to an attacker's server (CVE-2026-21852), the proxy's destination allowlist blocks the request. The exfiltration fails.
Agent data leakage. When an autonomous agent includes sensitive data in an API call — intentionally or not — the proxy flags it in real time.
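The core check the proxy performs on both directions of traffic can be sketched in a few lines of Python. This is an illustration of the technique, not CitrusGlaze's actual engine, and the patterns are a small illustrative subset.

```python
import re

# A small illustrative subset of secret patterns; a real engine runs
# hundreds of these against every request and response body.
SECRET_RE = re.compile(
    r"AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9_-]{20,}|ghp_[A-Za-z0-9]{36}|-----BEGIN"
)

def inspect(direction: str, body: str) -> dict:
    """Flag secret-like strings in an HTTP body, whichever way it travels."""
    hits = SECRET_RE.findall(body)
    return {"direction": direction, "blocked": bool(hits), "matches": hits}

# Outbound: a prompt carrying a pasted credential
req = inspect("request", '{"messages":[{"content":"key: AKIAIOSFODNN7EXAMPLE"}]}')
# Inbound: a model response that echoes a generated credential
resp = inspect("response", 'client = Client(api_key="sk-abcdefghijklmnopqrstuv")')
print(req["blocked"], resp["blocked"])  # → True True
```

In a real deployment this function sits inside the proxy's request and response handlers, so a match can block or redact the body before it moves on.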
This is what we built CitrusGlaze to do. 254 secret detection patterns in a Rust engine, running at wire speed on your machine. Scans every request and response to every AI provider. No cloud routing. No browser extensions. No kernel extensions. Just a proxy between your AI tools and the internet.
The Math on Remediation
GitGuardian's data shows that 64% of secrets from 2022 aren't revoked four years later. That's a remediation failure. But the root cause isn't negligence — it's that most organizations lack a viable, repeatable path from "alert" to "rotated credential."
The economics are blunt: the average data breach costs $4.44 million, and shadow AI incidents add an average of $670,000 on top (IBM Cost of a Data Breach Report, 2025). Meanwhile, 97% of organizations using AI lack access controls to prevent AI-related data breaches (IBM, 2025).
Prevention at the network layer is structurally cheaper than remediation. If the secret never reaches the AI provider, you don't need to rotate it. You don't need to assess blast radius. You don't need to file an incident report.
That's not an argument against scanning your repositories — you absolutely should. It's an argument for adding a layer that catches the leak before there's anything to remediate.
What to Do Right Now
1. Measure your exposure.
Run git log --all -1000 --format='%H' | while read c; do git show "$c" | grep -qE '(AKIA|sk-|ghp_|-----BEGIN)' && echo "$c"; done | wc -l on a repository. It walks the last 1,000 commits and counts those whose content matches common credential prefixes (AWS access keys, OpenAI-style keys, GitHub tokens, PEM private keys). That gives you a baseline.
2. Scan at the prompt layer, not just the repo layer.
Your developers paste credentials into AI tools every day. A pre-commit hook doesn't catch that. A network proxy between the tool and the provider does.
3. Watch your AI tools' outbound connections.
Claude Code should only talk to api.anthropic.com. Cursor should only talk to its API endpoints. If your AI tools start connecting to unfamiliar domains — especially after you clone a new repo — something is wrong.
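A sketch of that destination check, assuming a known-good host list per tool. The Cursor endpoint below is hypothetical; take real endpoint lists from each vendor's documentation.

```python
from urllib.parse import urlsplit

# Per-tool destination allowlists. api.anthropic.com is Claude Code's
# documented endpoint; the Cursor host here is hypothetical, for illustration.
ALLOWED_HOSTS = {
    "claude-code": {"api.anthropic.com"},
    "cursor": {"api2.cursor.sh"},  # hypothetical endpoint
}

def is_allowed(tool: str, url: str) -> bool:
    """True only if the request's host is on the tool's allowlist."""
    return urlsplit(url).hostname in ALLOWED_HOSTS.get(tool, set())

print(is_allowed("claude-code", "https://api.anthropic.com/v1/messages"))  # → True
print(is_allowed("claude-code", "https://attacker.example/v1/messages"))   # → False
```

This is the check that defeats config-injection attacks like CVE-2026-21852: even if a malicious repo rewrites the tool's API endpoint, the request to the unfamiliar host never leaves the machine.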
4. Don't ban AI tools. Instrument them.
Banning AI tools pushes usage to personal accounts and personal devices. 45.4% of sensitive AI prompts already go through personal accounts, bypassing corporate controls entirely (Harmonic Security, 2025). The answer isn't prohibition — it's visibility.
5. Assume the 2x leak rate is real.
If your developers use AI coding tools (84% do or plan to, according to Stack Overflow), your secret exposure surface just doubled. Plan accordingly. Budget for faster detection, faster rotation, and ideally, prevention at the source.
The Uncomfortable Conclusion
AI coding tools are the most productive thing that's happened to software development in a decade. They're also doubling your secret leak rate, introducing 10x the vulnerabilities, and creating entirely new attack surfaces that your existing security toolchain can't see.
The answer isn't to slow down. It's to add a safety net between the developer and the internet. One that catches secrets before they leave the machine. One that runs locally, so your prompts don't become someone else's data. One that works with every AI tool — not just the ones a browser extension can see.
That's what CitrusGlaze does. 254 patterns. Rust engine. Local proxy. Five minutes to deploy.
Your AI tools will leak. The question is whether you catch it at the proxy or discover it on GitHub six months later.
Install CitrusGlaze free — catch secrets in AI prompts and responses before they reach any provider. 254 detection patterns. Runs on your machine. Five minutes.