
Building a Spec-Driven Development Practice for Your Engineering Team


Context engineering AI isn’t just a concept to understand. It’s a practice to build. Over the course of this series, we’ve covered the methodology (spec-driven development), the foundation (context engineering), the team multiplier (collaborative planning), and the quality gate (verification). Each piece matters on its own. Together, they form a complete system.

This post is the practical one. If you’ve been reading along and thinking “okay, but how do I actually start,” this is your answer.

Here’s a framework for building a spec-driven development practice on your team, starting with what you can do this week and scaling toward a fully integrated workflow.


The Complete Loop

Before diving into adoption levels, here’s the system in full. Every piece connects:

Plan. Build structured specifications before any code is generated. Capture the what (requirements), the why (architectural reasoning), and the how (constraints, conventions, patterns). Do this collaboratively so the team has shared visibility.

Export. Package the plan in a format your coding agent can consume. Structured markdown works with any agent. The plan becomes the handoff from human planning to agent execution.

Execute. Developers use whatever coding tool they prefer. Cursor, Claude Code, Copilot, Windsurf, or anything else. The planning layer doesn’t dictate the execution tool. Better plans make every agent more accurate.

Verify. After the agent delivers, compare the output against the original specification. Not “does it look right” but “was every specified item actually implemented.” Close the gaps. Iterate until done means done.
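
Taken together, the loop is small enough to track as a short per-task checklist. A minimal sketch, assuming nothing more than a markdown file kept alongside the task; the wording is illustrative, not a format required by any tool:

```
## Task: <one-line description>

- [ ] Plan: spec written and reviewed by the team (requirements, reasoning, constraints)
- [ ] Export: plan saved as structured markdown and attached to the agent session
- [ ] Execute: agent implementation complete in the developer's tool of choice
- [ ] Verify: every spec item checked against the output; gaps closed before calling it done
```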

Most teams have fragments of this. Some write specs. Some do informal verification. Almost nobody runs the complete loop. The teams that do see dramatically different results: 3.2x higher first-attempt success rates on complex tasks, according to the research data that motivated this entire series.


Level 1: This Week

You don’t need new tools to start. You need a template and ten minutes of discipline before each complex task.

Create a planning template. A single markdown file with six sections (a minimal skeleton follows the list):

Task understanding. What are we building? Why? What does success look like? One clear sentence for each.

Context gathering. What architecture patterns apply? What files and modules are affected? What dependencies exist? What team conventions should the agent follow? This is the section most teams skip, and it’s the section that produces the biggest improvement in agent output.

Approach evaluation. If there are multiple ways to solve the problem, list them with trade-offs. Pick one and explain why. This prevents the agent from making architectural decisions that should be made by humans.

Step decomposition. Break the work into ordered, atomic subtasks with clear dependencies. This gives the agent a roadmap instead of a destination.

Risk identification. What could go wrong? What edge cases matter? What integration points are fragile? Agents don’t anticipate risk well. Humans do. Put that anticipation in the plan.

Verification criteria. How will you know the implementation is correct? Specific, checkable items. These become your verification checklist after the agent delivers.
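
Put together, a minimal skeleton might look like the following. The headings mirror the six sections above; everything else is a placeholder to fill in per task, not a structure any particular agent requires.

```
# Plan: <task name>

## Task understanding
What we are building, why, and what success looks like (one sentence each).

## Context gathering
- Architecture patterns that apply:
- Files and modules affected:
- Dependencies involved:
- Team conventions the agent must follow:

## Approach evaluation
Option A vs. Option B, their trade-offs, and which one we chose and why.

## Step decomposition
1. First atomic subtask
2. Second atomic subtask (depends on 1)
3. ...

## Risk identification
- Edge cases that matter:
- Fragile integration points:

## Verification criteria
- [ ] Specific, checkable item 1
- [ ] Specific, checkable item 2
```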

Use it on your next complex task. Before prompting your agent, spend ten minutes filling out the template. Hand the completed plan to the agent as context alongside your prompt. Compare the output quality to what you normally get.

The difference is usually obvious on the first try. The agent produces code that respects your architecture, follows your conventions, and handles the edge cases you identified. The rework cycle shrinks. The code review is faster because the reviewer can check against the plan instead of guessing at intent.

Measure the baseline. Before you change anything, note how your current process works. How many iteration cycles does a typical complex task take? How often does code review surface architectural issues? How much time do you spend re-explaining context to agents? These numbers become your before picture.


Level 2: Next Month

Once the planning template is working for individual tasks, expand to team-level infrastructure.

Build a shared context library. Three documents to start:

Architecture patterns. The top 3-5 architectural patterns your team uses, when each applies, and examples of correct usage. This prevents agents from choosing the wrong pattern for a given task.

Coding conventions. Language standards, naming conventions, file organization, testing expectations, error handling patterns. Everything a new developer (or a new agent) needs to know about how your team writes code.

Common pitfalls. The top 5-10 things that go wrong when agents work in your codebase. Why they happen. How to avoid them. This document is pure gold for agent context because it encodes the lessons your team has already learned the hard way.

Put these in a shared repo. Reference them in every planning template. When conventions change, update the library once and every future plan benefits.
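
One way to lay this out, as a sketch: a small context/ directory in the shared repo, one file per document. The file names below are illustrative; what matters is that every planning template points at the same three files.

```
context/
  architecture-patterns.md   # top 3-5 patterns, when each applies, examples of correct usage
  coding-conventions.md      # naming, file organization, testing expectations, error handling
  common-pitfalls.md         # the 5-10 things agents get wrong in this codebase, and how to avoid them
```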

Create plan templates by task type. Not every task needs a blank-page planning session. If you’re adding an API endpoint, 80% of the planning context is the same every time: which service layer to use, how to handle auth, what error format to follow, where tests go. Build templates for your most common task types (API endpoint, UI component, database migration, refactoring). The developer fills in the task-specific details. The template provides the reusable context.
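
For example, an API-endpoint template might arrive with the reusable context already filled in and only the task-specific sections left blank. A sketch with placeholder details (the file names carry over from the context library above):

```
# Plan: API endpoint — <name>

## Context gathering (pre-filled)
- Service layer: <where new endpoints live, per architecture-patterns.md>
- Auth: <how endpoints authenticate, per coding-conventions.md>
- Error format: <the team's standard error response shape>
- Tests: <where endpoint tests go and what coverage is expected>

## Task understanding (per task)
...

## Step decomposition (per task)
...

## Verification criteria (per task)
- [ ] ...
```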

Integrate verification into code review. Before a PR is reviewed, the developer runs their verification checklist (from the plan’s verification criteria section) against the implementation. The checklist results go into the PR description. The reviewer knows what was checked and can focus their review on judgment calls rather than mechanical verification.
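
In the PR itself, this can be as lightweight as pasting the checked results into the description. A hypothetical example; the endpoint and items are invented for illustration:

```
## Verification (from the plan)
- [x] POST /invoices validates the payload against the schema
- [x] Errors use the team's standard error format
- [x] Unit tests cover the duplicate-invoice edge case
- [ ] Rate limiting applied — not implemented; follow-up issue filed
```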

This is where the practice starts to feel like a system rather than a set of individual habits. The shared context library means every developer’s agent receives the same foundational knowledge. The templates reduce planning overhead to minutes. The verification checklist catches gaps before they reach the reviewer.


Level 3: This Quarter

This is where context engineering AI becomes team infrastructure rather than individual discipline.

Adopt persistent context. Individual planning sessions suffer from context amnesia. Every new session, the developer re-explains the architecture, re-provides the conventions, re-describes the system state. Teams report spending 30-45% of their agent interaction time on context re-establishment.

Persistent context solves this. The team’s shared knowledge base carries forward across sessions. The architecture patterns, conventions, and service maps are always available. The context from last sprint’s planning informs this sprint’s work. New developers start with the same foundational context as veterans.

Add role-based visibility. Different team members need different views of the planning activity. A product manager needs to see what features are being planned and what acceptance criteria are defined. A senior developer needs to see what architectural decisions are being made across the team’s specs. An engineering manager needs aggregate views: how many plans are in progress, what areas of the codebase are being modified, what verification outcomes look like across the team.

Close the full loop. Plan collaboratively. Export to any agent. Execute with whatever tool the developer prefers. Verify systematically against the original specification. The four-step workflow runs as a continuous cycle, with each iteration improving the next: verification findings inform better specs, better specs produce fewer gaps, fewer gaps mean faster delivery.

This is where spec-driven development becomes a team capability rather than an individual skill. The methodology scales because the infrastructure scales. And it scales independently of which coding agent your developers prefer, because the planning and verification layers are agent-agnostic by design.

At this level, you’re not just using AI coding agents. You’re operating a system that makes them dramatically more reliable. The agents haven’t changed. The infrastructure around them has.

Build your agentic context engineering practice. At Level 3, context engineering stops being a set of documents and becomes a living system. Context docs are updated as part of the PR process (change the code, update the context). Verification findings feed back into the context library (if agents consistently miss a pattern, add it to conventions). The team’s collective understanding of the codebase improves with every planning cycle, and that improvement compounds.

This is also where measurement becomes meaningful. Track context provision time (how long developers spend establishing context per task), first-pass verification rates (what percentage of check items pass on the initial verification), and iteration cycles (how many passes to reach full implementation). These metrics tell you whether the practice is working and where to invest next.
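
Tracking these doesn't need dedicated tooling to start. A sketch of a per-sprint table in a shared doc, with placeholder numbers rather than benchmarks:

```
| Sprint | Context provision time (avg min/task) | First-pass verification rate | Iteration cycles (avg) |
|--------|---------------------------------------|------------------------------|------------------------|
| 14     | 25                                    | 55%                          | 2.5                    |
| 15     | 15                                    | 70%                          | 1.8                    |
```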


What Changes When You Do This

The quantitative improvements are significant. Teams that adopt planning-first workflows report first-attempt success rates on complex tasks improving from roughly 23% to 61%. Iteration cycles drop. PR review time decreases because reviewers have a spec to check against. Context re-establishment time drops by 60-75% with shared context libraries.

But the qualitative changes matter more.

Engineering managers can see what’s happening. Plans are visible. Architectural decisions are documented before code is written. Risk surfaces early, when it’s cheap to address. The “what are our agents building?” question has an answer.

Junior developers become more capable. A junior developer with access to the team’s shared context library, plan templates, and verification checklists can produce agent output that matches senior-level quality. The planning infrastructure encodes the team’s expertise in a form that any developer can leverage.

Knowledge persists. When a senior developer leaves, their architectural reasoning doesn’t leave with them. It’s in the context library, the plan templates, the architecture decision records. The team’s collective knowledge is an asset, not a liability tied to individual tenure.

Agent output becomes trustworthy. This is the big one. When your team has a systematic answer to “how do we know the spec was fully implemented,” agent output stops being a gamble and starts being a reliable input to your development process. That trust is what unlocks the real productivity gains that the industry has been promising but failing to deliver.

The teams that are winning with AI agents right now aren’t the ones with the best models or the most expensive tools. They’re the ones with the best planning infrastructure. The model is a commodity. The planning system around it is the competitive advantage.


Common Objections (and Why They Don’t Hold)

“This sounds like a lot of documentation work.” It is, initially. But you write each context doc once and reference it hundreds of times. The architecture patterns doc takes 2-3 hours to create. It saves that much time in the first week of use, and continues saving it every week after.

“Our code changes too fast for docs to stay current.” Use context-as-code: context docs live in the repo and get updated in the same PR that changes the code. Versioned with the codebase, never out of sync.

“AI will get better and won’t need this much context.” Better AI needs better context, not less. Even the most capable human developer needs onboarding. The more capable the agent, the more it can do with good context, and the more damage it can do without it.

“We’re too small to need this.” Small teams actually benefit most. Fewer people means less redundancy and more impact per person. A three-person team where everyone’s agent receives the same shared context is dramatically more coherent than one where each developer is flying solo.


We Built Brunel Agent for This

Everything described in this post, from planning templates to shared context libraries to systematic verification, can be done manually. Teams are doing it today with markdown files, shared repos, and discipline.

Brunel Agent is what it looks like when you build purpose-built infrastructure for this workflow.

Brunel Agent, built by Loadsys, is a planning and verification platform for teams using AI coding agents. Shared workspaces for collaborative planning. Persistent context that carries forward across sessions. Plan export in formats any coding agent can consume. And a verification engine that compares agent output against the original specification, item by item.

Plan. Export. Execute. Verify. That’s the loop. Brunel is the platform that runs it.

Download Brunel Agent and give your team the planning infrastructure your AI coding agents have been missing.


This is Part 5 and the conclusion of our series on spec-driven development and context engineering AI for development teams. If you’re just finding this series, start with Part 1: What Is Spec-Driven Development and work forward.
