Insights Blog

Your Coding Agent Is Lying to You About Completion. Here’s the Proof.

by Donatas Kairys March 2, 2026

Artificial IntelligenceAgent AccuracyAI CodingAI DevelopmentCoding Agents

Your coding agent is lying to you about completion. Not maliciously. Not even technically incorrectly, in its own context window, the work does look done. But when a structured verification agent reads the actual files against a detailed specification, the story changes.

On a recent application build, every time the coding agent reported a phase complete, the verification agent found 30–40% of the work was not actually done. Not broken. Not wrong. Simply absent. And the coding agent had no idea.

This happened across nearly 1,000 verification check items. It took 5–6 verification-and-fix iterations to reach 100%. The total human time on the entire engagement, planning through final verification, was 24 hours.

Here’s what that means for teams running AI coding agents today.

The Completion Illusion

There’s a specific failure mode that nobody in the AI development tooling conversation is talking about honestly.

Coding agents are very good at generating code. They’re much less reliable at knowing when they’re done. The agent’s context window has a horizon — it knows what it built in this session, in this conversation, against the prompt it was given. It doesn’t have a persistent, structured picture of everything the specification required.

So it reports complete. Confidently, with good reason from its own perspective.

And 60–70% of the spec is implemented.

This isn’t a corner case. In this build, across multiple verification passes covering nearly 1,000 check items — data models, API integrations, UI components, payment flows, route guards, real-time subscriptions, test files — the pattern held consistently. Every “complete” declaration from the coding agent was followed by a verification pass that found roughly a third of the work still missing.

To be clear: the code that was written was good. The agent built what it said it built. The problem is everything it didn’t mention, the features specified in the plan that simply weren’t there yet.

What the Verification Layer Actually Looks Like

This build used a structured verification system with close to 1,000 check items across multiple phases of the project — organisms, pages, data hooks, API integrations, route guards, payment flows, authentication patterns, test coverage, real-time subscriptions, accessibility.

Each check item had:

A specific thing to verify (not “does auth work” but “does the ProtectedRoute wrapper appear at line X of App.tsx”)
Expected evidence (the exact component, prop, or function call that would confirm implementation)
Pass/fail status with the actual evidence found (or noted as absent)

When the coding agent declared a phase complete, the verification agent ran through the full checklist against the live codebase. It didn’t ask the coding agent what it had built. It read the files.

The results were consistent across every phase: the coding agent had implemented roughly 30–40% of what the specification required. The verification report was handed back. The coding agent fixed the gaps. Another verification pass. More gaps. This cycled 5–6 times before a full pass.

What did those gaps look like in practice?

A complete registration wizard with four steps — except Step 4 (payment: Stripe + offline selection) was missing entirely. The UI flowed smoothly to a blank screen.

Five data hooks written and exported correctly — but still calling setTimeout with mock data instead of the real AppSync GraphQL client. The app looked functional in every environment. It wasn’t connected to anything.

A waitlist feature fully specified in the planning documents — with status display, position tracking, countdown timer, claim window — not present at all. Not broken. Just absent.

Route guards protecting dashboard pages — present on most routes, missing on three. You could navigate directly to admin pages without authentication.

None of these were detectable by looking at the app. They required checking the files against the spec.

The Planning Layer: What You’re Verifying Against

For verification to work, you need something to verify against. That’s the other half of this story.

Before a single line of code was written on this build, the project went through five phases of structured AI planning: scope, requirements, architecture, data design, API design, frontend patterns, infrastructure, CI/CD, testing strategy, and roadmap. Eleven documents, cross-referenced and internally consistent.

Then a structured review pass — three parallel agents covering scope, architecture, and roadmap simultaneously — flagged 77 findings. Eleven were critical.

The wrong database technology was documented (PostgreSQL vs DynamoDB). The wrong API paradigm was specified in scope (REST vs GraphQL, contradicting the architecture document). A Step Functions workflow type was chosen that doesn’t support the callback pattern the architecture required. COPPA compliance — mandatory for a platform serving minors — was entirely absent from the specification.

These are the findings that, caught during build, cost $15,000–$40,000 each. Caught in planning, they cost an update to a document.

The eleven critical findings and twenty-two major findings were resolved before implementation began. The resulting planning suite became the specification the verification agent ran against across every subsequent phase.

That’s the loop: plans precise enough to verify against, verification rigorous enough to catch what the coding agent missed, iteration fast enough to close the gap before it becomes technical debt.

The Numbers

Let’s look at what this actually cost — and what it would have cost without it.

Total investment:

Brunel platform: ~$300
Human oversight across the full engagement: 24 hours (8–10 hours on planning, the remainder on coding agent oversight and verification review)
At $150/hour blended rate: ~$3,600 in human time
Total: ~$3,900

What the planning phase caught (conservative estimates on avoided downstream cost):

Planning Finding	Cost if Found During Build
Wrong database technology	$12K–$18K
Wrong API paradigm	$20K–$40K
Step Functions constraint violation	$8K–$15K
COPPA compliance undefined	$20K–$100K+
SLA contradictions	$5K–$15K
DR validation absent	$20K–$50K

What the verification layer caught (conservative estimates on avoided production cost):

Verification Finding	Cost if Shipped to Production
5 data hooks returning mock data	$18K–$36K emergency debugging + rework
Payment flow missing entirely	$30K–$80K incident + compliance review
Auth guard gaps	$15K–$30K security incident response
Core features absent (waitlist, registration mutations)	$20K–$40K sprint + release delay

Conservative avoided cost across planning and verification: $128K–$394K.

Return on $3,900 total investment: 33x to 100x.

The 24 Hours

This is the part that usually prompts disbelief: 24 hours of human time for a 5-phase, 11-document planning suite, a full architecture review, and nearly 1,000 check items of implementation verification across multiple sprint phases.

The human wasn’t writing the plans or running the checks. They were directing, reviewing findings, making decisions, and providing the judgment that the agents couldn’t. The agents were doing the systematic work — generating documents, running parallel review passes, reading codebases, producing verification reports, iterating on fixes.

What a senior engineer’s time bought in this engagement:

Architectural judgment on the 11 critical planning findings
Business context for the COPPA and compliance gaps
Decision-making on the 3 deferred major findings (offline mode, data import, AI algorithm spec)
Oversight of 5–6 verification iterations to confirm the gaps were actually closed

That’s 24 hours of high-leverage human judgment, not 24 hours of mechanical checking.

The Question for Every Team Running Coding Agents

When your coding agent declares a phase complete, how do you know 30–40% of the spec isn’t missing?

Most teams don’t have a systematic answer to this question. They have code review — which catches what was built badly, not what wasn’t built at all. They have QA — which catches failures in flows that were implemented, not absences of flows that should have been. They have experienced developers who intuitively notice gaps — but that scales with headcount, not with the number of agents you’re running.

The verification gap is the gap between what the coding agent thinks it built and what the specification required. Closing it needs a system, not a person reading code line by line.

That’s what the planning layer and verification layer together provide: the specification that makes verification possible, and the systematic process that makes it happen at every phase.

The constraint on AI development productivity isn’t the coding agent. It’s the loop around it.

Brunel Agent is an AI development planning platform. Plan → Export → Execute → Verify. If you’re ready to close the loop on your AI development workflow, get started now →

Topics on this page

Artificial intelligence

Authored by Donatas Kairys President/CTO, LoadSys

As President of LoadSys Consulting, I bring over 19 years of experience in architecting advanced data solutions and guiding digital transformations that empower businesses across various sectors. My focus is on creating data-driven strategies...

Insights Blog

Your Coding Agent Is Lying to You About Completion. Here’s the Proof.

The Completion Illusion

What the Verification Layer Actually Looks Like

The Planning Layer: What You’re Verifying Against

The Numbers

The 24 Hours

The Question for Every Team Running Coding Agents

Topics on this page

Recent Posts

Why AI-Generated Code Is Only 70% Done — And What That Means for Your Rebuild

Legacy Stack Engineer Hiring Cost in 2026: The Real Math

Custom Apps as Agent Opportunity: The Pattern from 3 Modernization Calls

Rebuild or Replace with SaaS? Why the Custom-App Calculation Changed in 2026

Building a Spec-Driven Development Practice for Your Engineering Team

How to Verify What Your AI Coding Agent Actually Built