Anthropic's 'Dreaming' Agents Learn From Mistakes. Here's What to Build.
Anthropic's new dreaming feature lets Claude agents learn from past sessions, turning one-off prompts into systems that improve with age. For builders, this changes how we think about agent memory and backend design.

Anthropic unveiled a quiet but significant upgrade at its Code with Claude conference this week. The company calls it "dreaming," and it gives Claude Managed Agents something most AI tools lack. Memory. Not the token-limited context window kind. Actual memory that lets an agent review what went wrong in previous sessions and adjust before starting the next job.
Right now, most AI agents treat every task like a first date. They show up, try their best, then forget everything. If a prompt fails or a tool call breaks, the next run starts from zero. That is fine for a quick draft or a one-time script. It is a disaster for anything running in production where users expect consistency.
What "Dreaming" Actually Means
Anthropic is doing more than adding a database write at the end of a session. The dreaming system analyzes failures across past runs, identifies patterns, and updates how the agent approaches similar problems later. The company also moved two previously experimental features into public beta: outcomes tracking and multi-agent orchestration. Together, these turn Claude from a chatbot into something closer to a junior employee who actually gets better at their job.
Outcomes tracking means the agent knows whether it succeeded or flopped. Multi-agent orchestration lets several specialized agents hand off work without a human writing the routing logic. When combined with dreaming, you get a team of agents that argue less over time and finish tasks faster. That is the kind of infrastructure enterprise buyers have been waiting for before they trust agents with real workflows.
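To make that concrete, here is a rough sketch, in TypeScript, of what a reflection pass over past runs could look like. Anthropic has not published how dreaming works internally, so the record shape and the buildLessons step below are purely illustrative, not the actual mechanism.

```typescript
// Illustrative only: Anthropic has not published dreaming's internals.
// The idea: each run leaves a structured record, and a reflection pass
// groups failures by pattern and turns them into notes for the next run.

type RunOutcome = {
  task: string;           // e.g. "sync-invoices"
  success: boolean;
  failureReason?: string; // e.g. "API returned 429", "schema mismatch"
};

// Group failures by reason so repeated problems stand out.
function buildLessons(history: RunOutcome[]): string[] {
  const failureCounts = new Map<string, number>();
  for (const run of history) {
    if (!run.success && run.failureReason) {
      failureCounts.set(
        run.failureReason,
        (failureCounts.get(run.failureReason) ?? 0) + 1,
      );
    }
  }
  // Only surface patterns that happened more than once.
  return [...failureCounts.entries()]
    .filter(([, count]) => count > 1)
    .map(([reason, count]) => `Seen ${count} times: ${reason}. Adjust before retrying.`);
}

// These lessons would be injected into the agent's context on the next session.
```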
The Production Problem Most Builders Ignore
Every founder who ships an AI feature learns the same lesson fast. Demoing a single successful run is easy. Keeping success rates above ninety percent across thousands of varied inputs is brutally hard. Context shifts. APIs change. Users phrase requests in ways your prompt never anticipated. Without a feedback loop, your agent slowly rots.
Dreaming attacks this directly. The agent becomes its own QA engineer, logging what worked and what broke, then tuning its approach. This matters because the gap between a cool prototype and a reliable product is where most AI startups die. Investors and customers are tired of demos. They want systems that stay up and improve.
What to Build With It
If you are building with Claude agents today, start by defining clear success and failure signals for every task. Dreaming needs labeled outcomes to learn. A vague "the user seemed happy" does not cut it. You need structured data. Did the API return a 200? Did the SQL query produce the expected schema? Did the multi-agent handoff complete without dropping context?
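In practice, those signals can be as simple as a check function that runs after each task and records a labeled outcome. The sketch below mirrors the questions above (HTTP status, expected columns, handoff completion); the type and field names are placeholders for whatever your stack uses, not any official schema.

```typescript
// Hypothetical post-task check: turn raw results into a labeled outcome
// that a learning loop can actually consume. Field names are placeholders.

type TaskResult = {
  httpStatus?: number;         // status of the external API call, if any
  returnedColumns?: string[];  // columns the SQL query actually produced
  handoffCompleted?: boolean;  // did the next agent receive the full context?
};

type LabeledOutcome = {
  success: boolean;
  failures: string[]; // machine-readable reasons, not "the user seemed happy"
};

function labelOutcome(result: TaskResult, expectedColumns: string[]): LabeledOutcome {
  const failures: string[] = [];

  if (result.httpStatus !== undefined && result.httpStatus !== 200) {
    failures.push(`api_status_${result.httpStatus}`);
  }
  const missing = expectedColumns.filter(
    (col) => !(result.returnedColumns ?? []).includes(col),
  );
  if (missing.length > 0) {
    failures.push(`schema_missing_${missing.join(",")}`);
  }
  if (result.handoffCompleted === false) {
    failures.push("handoff_dropped_context");
  }

  return { success: failures.length === 0, failures };
}
```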
Multi-agent orchestration is where things get interesting. Instead of one giant prompt trying to do everything, split your app into specialist agents. One handles parsing. Another handles external API calls. A third validates output. Let dreaming smooth the edges where they connect. Over time, the system learns which agent works best for which input shape and routes accordingly.
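A minimal version of that split might look like the sketch below: three specialist functions chained by a thin coordinator, with each handoff made explicit so failures can be attributed to a stage and logged. This is an assumption-laden illustration, not Anthropic's orchestration API, and the URL and function names are invented for the example.

```typescript
// Sketch of a specialist pipeline. Each stage stands in for a call to a
// specialized agent; the coordinator just makes the handoffs explicit so
// failures can be attributed to a specific stage.

type Parsed = { intent: string; params: Record<string, string> };

async function parseRequest(raw: string): Promise<Parsed> {
  // Stand-in for a parsing agent.
  return { intent: "lookup", params: { query: raw.trim() } };
}

async function callExternalApi(parsed: Parsed): Promise<unknown> {
  // Stand-in for an agent that owns external API calls.
  const res = await fetch(
    `https://api.example.com/search?q=${encodeURIComponent(parsed.params.query)}`,
  );
  if (!res.ok) throw new Error(`api_status_${res.status}`);
  return res.json();
}

async function validateOutput(data: unknown): Promise<boolean> {
  // Stand-in for a validation agent; here just a shape check.
  return typeof data === "object" && data !== null;
}

// The coordinator: each handoff is a place to log a per-stage outcome.
async function runPipeline(raw: string) {
  const parsed = await parseRequest(raw);      // handoff 1
  const data = await callExternalApi(parsed);  // handoff 2
  const valid = await validateOutput(data);    // handoff 3
  return { data, valid };
}
```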
This is where your backend choice starts to matter. Agents that remember and coordinate need durable state, real-time sync, and reliable job scheduling. A thin serverless function that spins up and dies is not enough. You need infrastructure that can persist session logs, retry failed workflows, and keep agents updated when shared state changes. That is exactly why we built Botflow on Convex. It handles the reactive database, durable workflows, and vector search so you can focus on what your agents do instead of how they remember.
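For a sense of what that persistence layer can look like in Convex specifically, here is a minimal sketch of a mutation that stores one labeled outcome per run. The table and field names are ours, not part of Convex or Claude; the point is only that session logs land in durable, queryable storage instead of dying with the function invocation.

```typescript
// convex/outcomes.ts — minimal sketch; table and field names are illustrative.
import { mutation } from "./_generated/server";
import { v } from "convex/values";

// Persist one labeled outcome per agent run so later sessions (and any
// reflection step) can query the history instead of starting from zero.
export const logOutcome = mutation({
  args: {
    agent: v.string(),             // which specialist produced this run
    task: v.string(),              // what it was asked to do
    success: v.boolean(),
    failures: v.array(v.string()), // machine-readable failure reasons
  },
  handler: async (ctx, args) => {
    await ctx.db.insert("sessionOutcomes", {
      ...args,
      recordedAt: Date.now(),
    });
  },
});
```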
The shift from stateless prompts to stateful, self-improving agents is happening faster than most expected. Anthropic just placed a big bet that memory and reflection are the next frontier. For indie hackers and small teams, this is good news. You do not need a Google-sized research budget to compete. You need a tight feedback loop, a clear definition of success, and a backend that will not lose the lessons your agents learn along the way.