Claude Agents Can Finally Stop Leaking Credentials. The Real Work Is Still Your Backend.

Anthropic fixed Claude's credential leak with sandboxes and MCP tunnels. That is huge for production AI, but security is only the start. Your backend needs to execute tool calls safely and reactively.

May 20, 2026 · 3 min read
Every demo looks great until you try to ship it to production. The problem with AI agents is not that they cannot write code or call APIs. It is that they walk around with your API keys in their pocket. If the agent goes rogue or gets confused, your credentials go with it.

Anthropic is fixing that today with two new tools for Claude Managed Agents. The first is a self-hosted sandbox that keeps tool execution inside your own infrastructure perimeter. The second is an MCP tunnel that connects the agent to private MCP servers without ever handing over the actual credentials. The agent gets to use the tool, but it never holds the keys.
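To make "uses the tool, never holds the keys" concrete, here is a minimal sketch of the general pattern. It is not Anthropic's API or the MCP wire protocol. The names `ToolCall`, `broker`, `SECRETS`, and `crm_lookup` are all made up for illustration, and the key is a placeholder. The point is only that the secret gets attached on your side of the perimeter and never appears in what the agent sends or receives.

```python
from dataclasses import dataclass

# Hypothetical server-side secret store. In a real system this would be a
# vault or an injected environment secret, never something the agent can read.
SECRETS = {"crm": "sk-live-example-not-real"}


@dataclass(frozen=True)
class ToolCall:
    """Everything the agent is allowed to send: a tool name and arguments."""
    tool: str
    args: dict


def crm_lookup(args: dict, api_key: str) -> dict:
    # Stand-in for an authenticated request to a CRM. The key is consumed
    # here and is not copied into the return value.
    return {"customer_id": args["customer_id"], "status": "active"}


HANDLERS = {"crm": crm_lookup}


def broker(call: ToolCall) -> dict:
    """Runs inside your perimeter: attach the secret, execute, return the result."""
    if call.tool not in HANDLERS:
        raise KeyError(f"unknown tool: {call.tool}")
    return HANDLERS[call.tool](call.args, SECRETS[call.tool])
```

The agent's side of the exchange is just `broker(ToolCall("crm", {"customer_id": "c_1"}))`. If the agent's context is logged, leaked, or prompt-injected, there is no key in it to steal.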

Until now, most agent setups required passing tokens along with every tool call. That works fine for a weekend prototype. It is a non-starter for a healthcare app or a fintech dashboard handling real money. Compliance teams and security engineers have been the ones slamming the brakes.

What Secure Actually Looks Like in Production

Hiding keys is only half the battle. You also need to control where the code runs. Anthropic's sandbox lets teams keep execution local or inside their VPC. The agent gets a tunnel, not a keychain. That is a real architectural shift, not a permissions patch.

But the agent still needs to do something. It still needs to read from a database, write to a CRM, or charge a credit card. Someone has to build those endpoints, host them, handle concurrency, and make sure the database does not lock up when three agents hit it at once.

Build for the Agent, Not the Demo

This is the part the announcement does not cover. A secure sandbox is useless if the backend inside it is brittle. Most vibe-coded apps start with a thin layer of serverless functions and a generic database. They work for demos. They crack when an agent starts looping, retries a payment, or fires off ten queries in a second because the LLM got eager.
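One cheap defense against an eager or looping agent is a rate limiter in front of each tool. Below is a rough token-bucket sketch. The class name, the parameters, and the injectable `clock` are my own choices for illustration. It is single-process and not thread-safe, so treat it as a starting point rather than something to paste into production.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, then refill at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so tests can control time
        self.last = clock()

    def allow(self) -> bool:
        """Return True and spend one token if the call may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(rate_per_sec=2, capacity=3)`, an agent that fires ten queries in one second gets the first three through and is told to back off on the rest, instead of hammering the database.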

You need a backend that is actually built for AI. That means reactive queries that push updates live, durable workflows that survive a crash mid-task, and vector search built in so the agent has context without ferrying data back and forth across services. That is the difference between a cool agent demo and a product people pay for.
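"Durable workflow" can sound like magic, so here is the idea stripped to its core: record each finished step somewhere that survives a crash, and skip recorded steps on the next run. This sketch uses SQLite from the Python standard library. The `run_workflow` function and the `done` table are hypothetical names, and real workflow engines do far more (retries, timers, versioning).

```python
import sqlite3


def run_workflow(db: sqlite3.Connection, run_id: str, steps) -> None:
    """Run (name, function) steps in order, skipping any already checkpointed."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS done ("
        "run_id TEXT, step TEXT, PRIMARY KEY (run_id, step))"
    )
    for name, fn in steps:
        finished = db.execute(
            "SELECT 1 FROM done WHERE run_id = ? AND step = ?", (run_id, name)
        ).fetchone()
        if finished:
            continue  # completed before the crash, do not repeat it
        fn()
        with db:  # commit the checkpoint only after the step succeeded
            db.execute("INSERT INTO done VALUES (?, ?)", (run_id, name))
```

One caveat: if the process dies after `fn()` returns but before the checkpoint commits, that step runs again on restart. That is why the steps themselves still need to be idempotent. Use a file path rather than `:memory:` for the connection if you want the checkpoints to outlive the process.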

When your agent calls a tool through that MCP tunnel, it needs something on the other side that can execute reliably. That backend is your responsibility. It needs to handle concurrency, survive crashes, and return results fast enough that the agent does not time out and hallucinate a workaround. The credential fix gets the agent into your system. Your backend determines whether it ships a feature or breaks your database.

The builders who win this phase will not be the ones with the fanciest prompt engineering. They will be the ones who treat the agent as just another user of a solid API. Build the backend first. Give the agent a narrow, well-documented surface to call. Then let the LLM do what it does best while your infrastructure handles the rest.
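Here is one way a "narrow, well-documented surface" might look in code: an explicit allowlist of tools, each with a docstring and declared parameter types, and a dispatcher that rejects everything else. The `tool` decorator, `dispatch`, and `get_order_status` are illustrative names, not part of any SDK, and the type check is deliberately simple.

```python
from typing import Any, Dict

TOOLS: Dict[str, dict] = {}  # the allowlist: only registered tools are callable


def tool(name: str, params: Dict[str, type]):
    """Register a function as an agent-callable tool with declared parameters."""
    def register(fn):
        TOOLS[name] = {"fn": fn, "params": params, "doc": fn.__doc__}
        return fn
    return register


@tool("get_order_status", {"order_id": str})
def get_order_status(order_id: str) -> dict:
    """Return the status of one order by ID."""
    # Stand-in for a real database read.
    return {"order_id": order_id, "status": "shipped"}


def dispatch(name: str, args: Dict[str, Any]) -> dict:
    """Validate an agent's tool call before any handler code runs."""
    spec = TOOLS.get(name)
    if spec is None:
        raise ValueError(f"tool not allowed: {name}")
    if set(args) != set(spec["params"]):
        raise ValueError(f"expected arguments {sorted(spec['params'])}")
    for key, expected in spec["params"].items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return spec["fn"](**args)
```

The agent is treated like any other API client. It can call `dispatch("get_order_status", {"order_id": "o_1"})`, and anything it invents beyond the registered surface fails fast with a clear error instead of reaching your data.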

It is tempting to ship the moment your Claude agent successfully books a calendar slot in a sandbox. But production means handling the edge case where the agent tries to book the same slot twice because the first call lagged. It means idempotency keys, transactional writes, and rate limiting. These are not LLM problems. They are software engineering problems.

Anthropic just removed the excuse that agents cannot be trusted with keys. Now the bottleneck is execution. That is a good thing. It means the playing field shifts back to who can build the most reliable, responsive stack underneath the model. For founders and indie hackers, that is an opportunity to outship slower incumbents who are still treating AI like a chatbot layer.