The backbone of Matt’s workflow is a loop that starts with planning rather than editing. In the “plan mode” flow, the agent is constrained to explore and propose before touching files: you plan, the agent executes, you verify with tests or QA, you commit, and then you repeat. This is not just about reducing mistakes; Matt argues planning helps in both directions, constraining the agent and keeping you informed.
In practice, that means fewer “please just fix it” prompts and more explicit, inspectable artifacts that you can approve.

A broader system: seven phases from idea to QA
Zooming out, Matt describes a seven-phase framework for “AI-driven development”:
- Idea
- Research
- Prototype
- PRD (spec)
- Implementation planning
- Execution
- QA
Download your cheatsheet to test out Matt's method
It all starts with an Idea, which can range from a massive app concept to a tiny refactor.
If your project involves tricky external APIs or difficult exploration, you move into the Research phase to cache those findings in an asset like a research.md file.
Next, Prototyping allows you to iterate on different UI or architectural approaches to ensure the outcome matches your personal taste before committing to the main codebase.
Once the vision is clear, you craft a PRD (Product Requirements Document) to define the destination, often prompting an AI agent to “grill” your design decisions to hammer out the details.
This document is then broken down into an Implementation Plan, where you organize work into a Kanban board with clearly defined blocking relationships.
The Execution phase follows, where coding agents - like a Ralph loop - resolve tickets, often allowing you to run the process AFK while the heavy lifting is done.
Finally, the QA phase involves an agent-generated plan for a human to review the results, which often leads back to more tickets.
This is meant as a mental model for shipping with agents, where you iterate the last steps (execution and QA) until the result is solid.
Two details are especially practical:
- Research can be cached into an artifact the agent can reliably refer back to (to avoid repeatedly re-discovering an external API or domain knowledge each session).
- Prototypes are a taste-imposition tool: get multiple options in a throwaway context, pick the best, then commit the direction so the agent has a concrete target.
“Skills” over prompts: turning workflows into reusable tools
Instead of relying on ad-hoc prompting, Matt packages workflows as agent skills: short, named instructions that encode how the agent should behave in specific stages. He has tonnes of them now and shares them for free on his GitHub: https://github.com/mattpocock/skills.
A few examples Matt shares:
- `/grill-me` to force a deep interview and walk the “design tree” until shared understanding is reached.
- `/write-a-prd` to turn a conversation into a document (including repo exploration and user stories).
- `/prd-to-issues` to translate the “destination” (PRD) into a “journey” (a kanban board of tasks with dependencies).
- `/tdd` to encourage red-green-refactor, and to structure work so failures are caught early.
I've found great value in the grill-me loops. They force both you and the model to fully understand the issue and its edge cases. This has become significantly more effective with subagents.
The meta-point is that a “skill” does not need to be long to be useful. It needs to be specific enough to put the agent on rails. It's almost like creating a bash alias for commonly run commands.
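To make the “short but specific” point concrete, here is a hypothetical skill file in the common SKILL.md style (YAML frontmatter plus instructions). The name, frontmatter fields, and wording are invented for illustration; this is not a file from Matt’s repo.

```markdown
---
name: grill-me
description: Interview the user about a proposed design before any code is written.
---

Before writing any code:

1. Ask one question at a time about the goal, constraints, and edge cases.
2. Walk the "design tree": for each decision, surface at least one alternative.
3. Stop only when the user confirms a summary of the agreed design.
```

A dozen lines is enough to put the agent on rails, much like a bash alias captures a command you would otherwise retype.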
How he deals with hallucinations and review
Matt’s material consistently pushes toward verification loops:
- Plan first, so the agent must explore before acting.
- Use tests or type checks during verification steps.
- Use TDD as an execution constraint to keep the agent grounded in observable behavior.
- Run a QA phase after execution, and loop back with more tickets as needed.
In other words, hallucinations are handled less by “trusting the model less” and more by structuring the work so incorrectness is expensive and visible. These are age-old good engineering practices.
Context management: make the repo legible to machines
Finally, Matt also talks about making the codebase itself more “agent-friendly.” In the skills write-up, the argument is blunt: if the codebase is garbage, the AI will output garbage within it, and architecture improvements raise the ceiling on agent output.
One concrete example of “legibility” is encoding conventions in Cursor rules. For TypeScript, Matt suggests declaring return types for top-level functions to help future AI assistants infer a function’s purpose.
Predictions and hot takes
- Process will matter more than models. As tools get faster, the advantage shifts toward people who can structure work into reliable loops and artifacts (plans, PRDs, tests, QA checklists), rather than one-off prompts.
- “Not vibe coding” is a stance. The workflow assumes engineering fundamentals (planning, tickets, review, testing) remain essential, even if execution is partially automated.
- Architecture becomes an AI performance lever. Improving boundaries and interfaces isn’t just for humans; it’s for agents too.
Practical takeaway
If you want to copy Matt Pocock’s approach, the move is to stop asking, “How do I get the model to write better code?” and start asking, “What process makes wrong code hard to ship?” Then teach that process to the agent.