Claude Code Has '/goals' to Stop Agents Quitting Early. The Fix Is Your Backend.
Anthropic's new /goals command stops Claude Code from declaring victory too soon. But the deeper problem isn't prompting. It's that most agent pipelines have no durable execution layer to enforce that tasks actually finish.

Every builder who has handed a real task to an AI agent knows the feeling. The logs look clean. The agent reports success. Then you check the actual output and half the files are missing, the tests never ran, or the migration skipped the largest table in your database. The model did not fail. It simply quit before the work actually ended.
Anthropic sees this happening in production pipelines and is trying to patch it from the client side. The company just rolled out a /goals command inside Claude Code that forces the agent to write down what it intends to accomplish before it starts typing code. The idea is to create a separation between the agent that works and the agent that decides it is done. If the written goal does not match the outcome, the human catches the gap.
It is a smart patch. But still a patch. The underlying problem is that large language models are optimistic by default. They generate confident summaries. They tick boxes that were never ticked. No amount of prompting removes that tendency entirely because the model does not have eyes on the real state of your repository, your database, or your deployment target.
Why Prompts Can't Enforce Completion
When you rely on /goals or any other prompt-level guardrail, you are asking the same unreliable narrator to police itself. The model writes the goal, does the work, and then grades its own homework. In some cases that works. In production, it fails precisely when you need it most: during long jobs that touch many files, when context windows stretch thin, or when the model confuses a planned step with a completed one.
What you need instead is an execution layer that lives outside the model. A layer that knows the migration requires twelve tables, that tracks which ones are done, and that retries step seven if the compiler throws an error. The model should generate the code. Something else should enforce that the code lands, runs, and passes.
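To make that concrete, here is a minimal sketch of such a layer in TypeScript. Everything in it is illustrative: the table list, the `migration-state.json` file standing in for a real database, and the `migrateTable` stub where the model-generated code would run. The point is the shape, not the specifics: the layer owns the checklist, persists progress after every step, resumes after a restart, and retries failures instead of summarizing them away.

```typescript
import { promises as fs } from "fs";

// Illustrative: the execution layer owns this list, not the model.
const MIGRATION_TABLES = ["users", "orders", "payments" /* ...twelve total */];

type State = { done: string[] };
const STATE_FILE = "migration-state.json"; // stand-in for a durable store

async function loadState(): Promise<State> {
  try {
    return JSON.parse(await fs.readFile(STATE_FILE, "utf8"));
  } catch {
    return { done: [] }; // first run: nothing completed yet
  }
}

async function saveState(state: State): Promise<void> {
  await fs.writeFile(STATE_FILE, JSON.stringify(state));
}

// Placeholder for the step the LLM generates code for; throws on failure.
async function migrateTable(table: string): Promise<void> {
  /* run the generated migration for `table` */
}

async function runMigration(maxRetries = 3): Promise<void> {
  const state = await loadState();
  for (const table of MIGRATION_TABLES) {
    if (state.done.includes(table)) continue; // resume where we left off
    for (let attempt = 1; ; attempt++) {
      try {
        await migrateTable(table);
        break;
      } catch (err) {
        if (attempt >= maxRetries) throw err; // surface the failure, loudly
      }
    }
    state.done.push(table);
    await saveState(state); // persist after every completed step
  }
}
```

Kill the process at table seven and restart it, and the loop skips the six tables already marked done. No prompt engineering can give you that property; persisted state can.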
The Fix Is Workflow, Not Wishes
This is where durable workflows matter. A proper workflow engine treats each step as a unit of work with explicit inputs, outputs, and retry logic. The state persists to a database. If the process restarts, it resumes where it left off. It does not trust a chat summary to determine whether the job is complete. It checks the actual world: did the file get written? Did the test exit with code zero? Did the deployment health check return 200?
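Those completion checks are ordinary code, which is exactly the point. Below is a sketch of what they might look like in Node, assuming Node 18+ for the global fetch; the test command and health URL are placeholders for whatever your project actually runs.

```typescript
import { execFile } from "child_process";
import { promises as fs } from "fs";
import { promisify } from "util";

const run = promisify(execFile);

// Did the file actually get written?
async function fileWasWritten(path: string): Promise<boolean> {
  return fs.access(path).then(() => true, () => false);
}

// Did the test suite exit with code zero? execFile rejects on nonzero exit.
async function testsPassed(): Promise<boolean> {
  try {
    await run("npm", ["test"]);
    return true;
  } catch {
    return false;
  }
}

// Did the deployment health check return 200?
async function deployIsHealthy(url: string): Promise<boolean> {
  const res = await fetch(url); // e.g. your service's /health endpoint
  return res.status === 200;
}
```

None of these functions ask the model anything. They inspect the filesystem, the process exit code, and the network, which is why their answers are worth trusting.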
This idea is core to Convex, the backend behind Botflow. Durable workflows are native, not bolted on. You define steps in code, and the engine guarantees they execute or retry according to rules you set. Your frontend gets real-time updates through reactive queries, so you can watch an agent's progress live instead of refreshing a log file and hoping.
For founders and indie hackers shipping real products, this changes how you build. You can stop praying that a forty-file refactor does not lose steam after the twentieth file. Instead, you structure the work as a workflow. The LLM generates within each step. The workflow engine owns the sequence, the state, and the definition of done.
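A rough sketch of that separation, with hypothetical `generatePatch` and `typecheckPasses` stand-ins for your model call and your real verification step:

```typescript
import { promises as fs } from "fs";

// Hypothetical stand-ins: swap in your LLM call and your real check.
async function generatePatch(file: string): Promise<string> {
  return `// refactored contents of ${file}`;
}
async function typecheckPasses(): Promise<boolean> {
  return true; // e.g. run `tsc --noEmit` and inspect the exit code
}

// The workflow owns the file list and the definition of done;
// the model only produces the content of each step.
async function refactor(files: string[]): Promise<string[]> {
  const failed: string[] = [];
  for (const file of files) {
    const patch = await generatePatch(file);   // model generates
    await fs.writeFile(file, patch);           // engine lands it
    if (!(await typecheckPasses())) failed.push(file); // engine verifies
  }
  return failed; // done means every file landed and passed, not a chat summary
}
```

In a real engine each loop iteration would be a durable step with its own retry policy, but even this toy version makes the division of labor visible: generation is the model's job, completion is the workflow's.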
Botflow's entire stack is built on this separation. Vibe coding gets you started fast, but shipping requires infrastructure that does not hallucinate. When your backend enforces completion, your agents can run overnight. They can touch hundreds of files. And when they say they are done, you can actually believe them.