Juniors Can Prompt. They Can’t Yet Judge. Closing That Gap Is the Senior Engineer’s Job.
TL;DR. Early-career developers are adopting AI coding tools faster than anyone else, and trusting the output more than anyone else, at exactly the moment the tools’ defining failure is code that is almost right, but not quite. That is the junior developer AI skills gap: the developers leaning hardest on AI are the least equipped to catch where it’s wrong. Catching “almost right” requires judgment, and judgment is the one thing AI cannot hand a junior developer. The conference-circuit answer (make juniors hand-write code so they suffer the way we did) gets the diagnosis right and the prescription wrong. The turmoil was never the teacher; the feedback loop was, and AI removes the loop unless you put it back. The senior engineer’s real job in 2026 is not to write more code. It’s to rebuild the feedback loops that build judgment, inside the daily work, by making juniors write the spec before the agent writes the code, predict failures before they run, and review the agent’s output as a graded skill, not a rubber stamp.
Earlier this week I was at GOTO’s Accelerate Chicago 2026, up in Willis Tower, and the same worry kept surfacing across the talks. It’s a real one. Scott Hanselman gave a session on junior developers, AI agents, and the missing context layer. Nathen Harvey, who leads DORA at Google Cloud, keynoted on AI-assisted software development. Different talks, different angles, but a common undercurrent ran through the room: the developers entering the field right now are learning to drive AI before they’ve built the fundamentals that tell them whether the AI is right.
The senior engineers who came up between the late ’90s and now learned their craft the slow way. They wrote code from scratch, shipped it, watched it break, and figured out why. The developers entering the field in 2026 are skipping most of that. They open an agent, describe what they want, and get working code back in seconds. The worry, stated plainly, is that without the turmoil the next generation never builds the base knowledge they need to orchestrate AI well or tell a good pattern from a bad one.
The diagnosis is correct. The prescription that usually follows (make them write it by hand so they earn it the hard way) is not. It mistakes the pain for the lesson.
Here’s the version I’d defend instead.
The junior developer AI skills gap shows up in the data
Start with the data, because the shape of the problem is not what the nostalgia version assumes.
The 2025 Stack Overflow Developer Survey, covering 49,000+ developers and the closest thing the field has to a census, found AI use has become a baseline rather than a novelty: 84% of developers use or plan to use these tools. But adoption runs inversely to experience. Early-career developers lead daily use at 55.5%, ahead of seasoned professionals.
Now hold that next to the survey’s single biggest frustration. Two-thirds of developers (66%) say their top problem with AI tools is dealing with code that is “almost right, but not quite,” and 45% report that debugging AI-generated code eats more time than it saves.
“Almost right, but not quite” is the most dangerous output a tool can produce, because it compiles. It looks done. It passes the quick glance. The only thing standing between “almost right” and a production incident is a person who has seen this failure mode before and knows where to look.
That person is, almost by definition, not the junior.
The survey makes the experience gap explicit: trust in AI accuracy falls as experience rises. The most senior developers are the most skeptical, with the lowest “highly trust” rate and the highest “highly distrust” rate in the entire sample. They’ve earned that skepticism one bad merge at a time. The people leaning hardest on the tool are the ones with the least of the judgment required to use it safely, and the failure mode they’re most exposed to is precisely the one that judgment exists to catch.
That’s the actual problem. Not that juniors aren’t typing enough code. That they’re shipping code they can’t yet evaluate.
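To make “almost right, but not quite” concrete, here is a small hypothetical sketch of my own; it is not drawn from the survey. The function names and the batching scenario are assumptions chosen for illustration. Both versions run, and both pass the obvious test. Only one survives an input whose length is not a multiple of the batch size.

```python
def chunk_almost_right(items, size):
    """Split items into batches of `size`.

    Hypothetical "almost right" version: it silently drops a trailing
    partial batch, because it only iterates over the full batches.
    """
    return [items[i * size:(i + 1) * size] for i in range(len(items) // size)]


def chunk(items, size):
    """Split items into batches of `size`, keeping the final partial batch."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    even = list(range(6))
    odd = list(range(7))

    # The quick-glance check: both versions agree when the input divides evenly.
    print(chunk_almost_right(even, 3) == chunk(even, 3))  # True

    # The edge case: the last record disappears without any error.
    print(chunk_almost_right(odd, 3))  # [[0, 1, 2], [3, 4, 5]]
    print(chunk(odd, 3))               # [[0, 1, 2], [3, 4, 5], [6]]
```

Nothing crashes in the broken version. A record simply goes missing, which is the kind of defect that a reviewer has to know to look for.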
The turmoil was never the teacher
The conference framing treats hand-writing code as the thing that made senior engineers good. It wasn’t.
What made us good was the loop: make a decision, watch it meet reality, understand why it failed, adjust. Writing the code by hand was just the delivery mechanism for that loop. You wrote the naive O(n²) version, watched it fall over at 100K rows, and learned something about complexity that no textbook had made stick. The lesson lived in the consequence, not in the keystrokes.
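For readers who never hit that wall, here is a minimal sketch of the kind of lesson I mean, under assumptions of my own: the task (duplicate detection), the function names, and the input sizes are all illustrative. The naive version compares every pair, so at 100K rows with no duplicates it performs roughly 5 billion comparisons. The set-based version makes one pass.

```python
import time


def has_duplicates_naive(values):
    """O(n^2): compare every pair of elements."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False


def has_duplicates_fast(values):
    """O(n) expected time: one pass, remembering what has been seen."""
    seen = set()
    for value in values:
        if value in seen:
            return True
        seen.add(value)
    return False


def timed(fn, values):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(values)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Small sizes so the script finishes quickly; timings vary by machine.
    for n in (1_000, 2_000, 4_000):
        rows = list(range(n))  # worst case: no duplicates at all
        _, slow = timed(has_duplicates_naive, rows)
        _, fast = timed(has_duplicates_fast, rows)
        print(f"n={n:>5}  naive={slow:.4f}s  set-based={fast:.4f}s")
```

Doubling the input should roughly quadruple the naive timing while the set-based timing roughly doubles. Watching that happen on your own data is the consequence doing the teaching.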
AI doesn’t remove the need for that loop. It removes the loop itself, because the agent now both writes the code and implicitly declares it done. The junior never makes the call, never watches it fail, never traces the failure back to the decision. The feedback that used to come for free with hand-coding has to be deliberately put back, or it simply doesn’t happen.
Addy Osmani, who runs developer experience for Chrome at Google, named the dynamic well. He calls it the “70% problem”: AI gets you about 70% of the way to a working solution fast, but the last 30% (the edge cases, the production hardening, the architecture, the informal business logic) is the hard mile that still needs a human. He pairs it with what he calls the knowledge paradox: senior developers use AI to accelerate what they already know how to do, while juniors try to use it to learn what to do. The result, in his words, is “house of cards code”: it looks complete and collapses under real-world pressure.
This maps cleanly onto something we’ve written about for years: the completion illusion. An AI coding agent reports a task finished when 30–40% of the real work is actually built. A senior engineer feels the gap instinctively and goes looking for the missing 60–70%. A junior who never built systems by hand has no instinct to distrust the “done.” Teaching that distrust, calibrated and specific and evidence-based, is the core of the job now.
What’s actually eroding (and what isn’t)
It would be easy to wave this away as the same hand-wringing that greeted the calculator and the IDE. The evidence says it isn’t quite the same.
A 2025 study by Michael Gerlich in the journal Societies, covering 666 participants, found a measurable negative relationship between frequent AI-tool use and critical-thinking scores, with cognitive offloading acting as the mediating factor, and the effect was strongest among the youngest users. Offload the thinking to the machine often enough and the muscle that does the thinking gets weaker. That’s not nostalgia. That’s a finding.
But here’s the part that matters most, and it’s the part the “make them suffer” crowd skips. The same body of research consistently finds that scaffolded AI use does not produce the same decline. When the tool is used inside a structure that forces the human to engage (to plan, to predict, to verify), critical thinking holds up, and in some studies improves. The damage isn’t caused by AI. It’s caused by unstructured AI. The form of the integration is the whole game.
So both extreme prescriptions are wrong. Forcing juniors to hand-code CRUD endpoints to “earn it” trains a skill the market no longer pays for. And letting them point an agent at a ticket and merge whatever comes back (vibe coding, in other words) is exactly the unstructured use the research warns about.
The answer is the third option: structure. Which, conveniently, is the thing we already do.
Mentoring is not a separate activity from the work
The mistake most teams make is treating mentorship as something you bolt onto delivery: a lunch-and-learn, a monthly 1:1, a Confluence page nobody reads. It doesn’t survive contact with a sprint deadline.
The version that works is built into how the work gets done. Our whole methodology (Spec-Driven Development, a structured Plan → Execute → Verify workflow) happens to be one of the best teaching scaffolds I’ve used, because it externalizes the exact judgments a senior makes silently. The numbers we cite for it are delivery numbers, but they’re also pedagogy: 82% of agent failures trace to poor pre-execution planning, structured plans produce a 3.2× higher first-attempt success rate, and accuracy on complex coding tasks moves from 23% to 61% when the plan exists before the prompt. Every one of those gaps is a place where judgment lives. Make the junior stand in that place.
Here’s what that looks like in practice. None of it requires a program. All of it fits inside work you’re already shipping.
Make them write the spec before the agent writes the code. The single highest-leverage move. Before anyone prompts an agent, the junior writes down what “done” means: scope, edge cases, the interfaces it touches, what could go wrong. This forces the thinking before the seductive 70% solution appears and short-circuits it. A spec the junior wrote is also a thing you can review and correct, which a prompt history is not.
Have them predict the failure before they run it. Once the agent returns code, the junior’s job isn’t to run it. It’s to write down, first, where they think it will break. Then run it. The gap between their prediction and reality is the lesson, and it’s the completion-illusion instinct being built in real time. Do this fifty times and they stop trusting “done.”
Review their review, not just their code. The skill that matters now is evaluating output, so grade the evaluation. Hand a junior a piece of AI-generated code and ask them to critique it. Then review the critique. What did they miss? What did they flag that didn’t matter? This trains the exact muscle the Stack Overflow data says senior engineers have and juniors don’t.
Make them read code they didn’t write, including the old stuff. Assign a junior a gnarly module from a legacy system and ask them to explain why it does what it does, not just what it does. Reading is how you learn patterns at scale, and AI has quietly made reading optional. The institutional knowledge buried in an eight-year-old codebase is a better teacher than any greenfield ticket. (It’s also, as we’ve written, the knowledge that walks out the door when the senior who holds it leaves.)
Narrate your own reasoning out loud. When you review an agent’s PR, say what you’re thinking as you do it. “I don’t trust this error handling. Show me what happens on a timeout.” “This works, but it’ll be unmaintainable in six months. Here’s why.” The feedback loop that used to be invisible inside your own head is the most valuable thing you own. Externalize it.
Let them ship something that breaks, on purpose and safely. Consequence is the teacher. Give a junior ownership of a real but low-blast-radius service, and let a bad decision actually reach staging and fall over. A controlled failure they had to debug teaches more than ten code reviews where you caught it for them. The goal isn’t to protect them from every mistake. It’s to make the mistakes survivable and the lessons real.
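The predict-then-run and review-the-review practices above can be made measurable with very little tooling. What follows is a minimal sketch under my own assumptions, not a description of any tool we ship: `PredictionLog`, its method names, and the example failures are all hypothetical. The junior records expected failures before running the agent’s code, then records what actually broke, and the set differences answer the two review questions: what did they miss, and what did they flag that didn’t matter?

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PredictionLog:
    """One junior's predictions for one piece of agent-written code.

    Hypothetical structure for illustration only.
    """
    task: str
    predicted: Set[str] = field(default_factory=set)
    observed: Set[str] = field(default_factory=set)

    def predict(self, failure: str) -> None:
        """Record an expected failure, written down before running anything."""
        self.predicted.add(failure)

    def observe(self, failure: str) -> None:
        """Record a failure that actually occurred when the code ran."""
        self.observed.add(failure)

    def caught(self) -> Set[str]:
        """Failures that were both predicted and observed."""
        return self.predicted & self.observed

    def missed(self) -> Set[str]:
        """Failures that happened but were not predicted."""
        return self.observed - self.predicted

    def false_alarms(self) -> Set[str]:
        """Failures that were predicted but did not happen."""
        return self.predicted - self.observed

    def hit_rate(self) -> Optional[float]:
        """Share of real failures that were predicted; None if nothing failed."""
        if not self.observed:
            return None
        return len(self.caught()) / len(self.observed)


if __name__ == "__main__":
    log = PredictionLog(task="CSV import endpoint")
    log.predict("empty file")
    log.predict("timeout on upload")
    log.observe("empty file")
    log.observe("duplicate header row")

    print("caught:", sorted(log.caught()))
    print("missed:", sorted(log.missed()))
    print("false alarms:", sorted(log.false_alarms()))
    print("hit rate:", log.hit_rate())
```

A spreadsheet would do the same job. The point of writing it down is that the misses and false alarms become something a senior can review across fifty repetitions, instead of an impression.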
For the senior who’s on the way out
The conference talks were aimed at outgoing and soon-to-be-outgoing seniors, and that framing deserves a direct answer, because it changes what “transfer” means.
When a senior engineer leaves, the org scrambles to capture the system knowledge: the runbook, the architecture diagram, the why-is-this-flag-here tribal lore. That matters, and a written spec is the right vessel for it. But the system knowledge is the replaceable part. The thing that’s genuinely scarce, the thing the talent market can’t backfill in 90 days, is the judgment: the calibrated instinct for when “done” isn’t done, for which 30% the agent skipped, for what’s going to break under load.
That judgment doesn’t transfer through documentation. It transfers through reps, under supervision, while you’re still in the room. The most valuable thing a departing senior can leave behind isn’t the runbook. It’s two or three juniors who learned to distrust the machine the way you do.
If you’re going to spend your last quarter on something, spend it on that.
FAQ
Isn’t this just the calculator argument? Every generation panics about the new tool.
Partly, and the historical skeptic is usually wrong. The difference the 2025 research surfaces is that unstructured AI use measurably erodes critical thinking via cognitive offloading, while scaffolded use doesn’t. The calculator didn’t come with a “trust me, it’s done” button that hides 30% of the work. The fix isn’t to ban the tool; it’s to put structure around how juniors use it.
Should we still make junior developers learn to code without AI?
They need to be able to read and reason about code far more than they need to type it from a blank file. The base knowledge is real: data structures, complexity, how a relational schema actually behaves under load, what a race condition feels like. But you build that through prediction, review, and debugging real failures, not through reciting boilerplate by hand. Teach judgment directly instead of hoping it emerges from typing.
How does this connect to spec-driven development?
Directly. A spec forces the junior to think through intent and edge cases before the agent produces a tempting-but-incomplete answer, and it gives the senior a concrete artifact to review and correct. The Plan → Execute → Verify loop is a delivery discipline that doubles as a teaching scaffold; it puts the junior in exactly the spots where senior judgment normally operates silently.
The job changed; the responsibility didn’t
The senior engineer’s role used to be to write the hard parts. Increasingly it’s to define what “right” looks like, to verify what the agent built against it, and to make sure the next generation can do the same after you’ve gone. The juniors are going to use AI no matter what the conference panel recommends; the only open question is whether they learn to judge its output or just merge it.
That’s a training problem, not a tooling problem. And it’s solvable, but only on purpose.
If your team is trying to figure out how to get real engineering judgment into developers who came up on AI, that’s a conversation we have often. We train engineering teams on exactly this: the spec-driven workflow that turns AI from a junior-developer crutch into a junior-developer accelerant.
Lee Forkenbrock is CEO of LoadSys, a Chicago-based AI engineering consultancy with 20+ years of shipping software and 400+ client problems solved.
Related reading: What Is Spec-Driven Development (and Why It’s Replacing Vibe Coding) · Legacy Stack Engineer Hiring Cost in 2026: The Real Math