V-MODEL · DECK
SDLC / SE · FOR AGENTIC CODING · v1.0
The V-Model

Specify &
Verify.
In lockstep.

A 1980s software engineering discipline, reborn as the missing scaffold for LLM-driven coding agents. Every left-hand specification is the definition of a right-hand test — written before the agent writes code.

Origin: Paul Rook, UK MoD · ~1986
Family: Waterfall variant · V&V model
Core idea: Each phase pairs with a test phase
Why now: Agents skip specs & tests by default
Reading time: ~9 slides · 6 min
02 — WHY · THE FAILURE MODE
Why · The problem this solves

Agents write code that looks right
and fails for reasons no one specified.

The default agentic loop is: prompt → code → "looks good" → ship. There is no spec to drift from and no test to fail. The bug surfaces three weeks later in production.

Symptoms in the wild
  • Scope drift. The agent quietly reinterprets the request mid-task.
  • Phantom requirements. Plausible features appear that nobody asked for.
  • No verification ladder. One end-to-end test, if any. No unit boundary.
  • Untraceable changes. Why does auth.py now import redis?
  • Regression amnesia. Fix in commit N reopens bug from commit N−3.
DEFAULT AGENT LOOP

"Build me a URL shortener."
→ 400 lines of code.
→ It runs. Maybe.
→ No test names a requirement.

V-MODEL LOOP

Spec → test stubs → arch → integration stubs → modules → unit tests → code → climb back up the V.

1 : 1
Every spec phase ↔ one test phase
03 — THE MODEL · THE SHAPE OF THE WORK
The Model

Decomposition down the left.
Recomposition up the right.

Horizontal dashed lines = each spec defines the test on its right.

Diagram: DECOMPOSITION ↓ on the left, RECOMPOSITION ↑ on the right; each right-hand phase VERIFIES its left-hand partner.

  • Requirements (PRD · user stories) ↔ Acceptance Testing (verifies requirements)
  • System Design (arch.md · ADRs) ↔ System Testing (verifies system design)
  • Architectural Design (modules · interfaces) ↔ Integration Testing (verifies architecture)
  • Module Design (signatures · contracts) ↔ Unit Testing (verifies module design)
  • Implementation (bottom of the V): code is written here · once
04 — THE INSIGHT · SPEC = TEST DEFINITION
The Insight

A specification you can't write a test for isn't a specification. It's a vibe.

Why this changes agentic coding

LLM agents are excellent at producing artifacts that satisfy local constraints (this function compiles) and poor at preserving global intent (does the system do what was asked?).

The V-Model forces every level of intent to be written as a falsifiable artifact before code generation. The artifact on the left literally becomes the test fixture on the right.

Tests stop being an afterthought and become the contract that the agent's code must satisfy. The agent is no longer guessing what "done" means.
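
One way to read "spec = test definition" in code: a requirement only counts once it carries a check that can fail. This is a minimal sketch, not part of any tool; the `Requirement` shape, the ids `FR-2.1` and `FR-9.9`, and the `isTestable` gate are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape: a requirement paired with the check that can falsify it.
type Requirement<T> = {
  id: string;                       // spec anchor, e.g. "FR-2.1" (illustrative)
  statement: string;                // human-readable intent
  check?: (observed: T) => boolean; // absent means: not yet a specification
};

// A requirement with a falsifiable check.
const slugLength: Requirement<string> = {
  id: "FR-2.1",
  statement: "A shortened URL has a 7-character slug.",
  check: (slug) => slug.length === 7,
};

// A "vibe": nothing in this record can fail.
const feelsFast: Requirement<string> = {
  id: "FR-9.9",
  statement: "The service should feel fast.",
};

// Gate to run before code generation: reject requirements without a check.
function isTestable<T>(req: Requirement<T>): boolean {
  return typeof req.check === "function";
}
```

A requirements session could refuse to close while any record fails `isTestable`, which pushes the vague item back to the author instead of forward to the agent.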

Spec ↔ Test pairings
  • Requirements (prd.md, user stories) ↔ Acceptance Tests (cucumber, playwright e2e)
  • System Design (arch.md, sequence diagrams) ↔ System Tests (end-to-end, contract tests)
  • Architectural Design (module map, interfaces) ↔ Integration Tests (cross-module, with fakes)
  • Module Design (function signatures, types) ↔ Unit Tests (pytest, vitest, table tests)
05 — HOW · DOWN · DECOMPOSITION PHASE
How · Phase 1 — Down the left

Each step produces a written artifact
and the test stubs it implies.

  • 1 · Requirements → prd.md with Gherkin-style user stories. Agent also generates tests/acceptance/*.feature as empty stubs. "Given a long URL, when I POST /shorten, then I receive a 7-char slug."
  • 2 · System Design → docs/arch.md: services, data flow, tech choices, ADRs. Agent generates tests/system/*.spec stubs that hit the public API surface.
  • 3 · Architectural Design → module boundaries with explicit interfaces. Agent generates tests/integration/* stubs that exercise modules together with fakes at the seams.
  • 4 · Module Design → function signatures, type definitions, error contracts — before bodies. Agent writes tests/unit/* stubs from the types.
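
Step 1 above can be sketched as a small generator that turns user stories into a Gherkin stub. This is an illustration under assumptions: the `UserStory` shape, the `renderFeatureStub` helper, the `@spec-...` tag convention, and the id `FR-1.1` are all hypothetical; only the Given/When/Then wording comes from the example story.

```typescript
// Hypothetical user-story record; field names are illustrative.
type UserStory = {
  id: string; // spec anchor in prd.md, e.g. "FR-1.1" (made up for this sketch)
  title: string;
  given: string;
  when: string;
  then: string;
};

// Render an acceptance stub in Gherkin syntax, one tagged scenario per story.
function renderFeatureStub(feature: string, stories: UserStory[]): string {
  const scenarios = stories.map((s) =>
    [
      `  @spec-${s.id}`,
      `  Scenario: ${s.title}`,
      `    Given ${s.given}`,
      `    When ${s.when}`,
      `    Then ${s.then}`,
    ].join("\n"),
  );
  return `Feature: ${feature}\n\n${scenarios.join("\n\n")}\n`;
}

// Contents the agent might write to tests/acceptance/shorten.feature.
const shortenStub = renderFeatureStub("Shorten a URL", [
  {
    id: "FR-1.1",
    title: "Shorten a long URL",
    given: "a long URL",
    when: "I POST /shorten",
    then: "I receive a 7-char slug",
  },
]);
```

The stub has no step definitions yet, so an acceptance runner would report the scenario as undefined or pending until the climb back up reaches it.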
What the agent commits, in order
# Phase 1 — Requirements
add  docs/prd.md
add  tests/acceptance/shorten.feature   # stub

# Phase 2 — System Design
add  docs/arch.md
add  docs/adr/001-storage.md
add  tests/system/api.spec.ts            # stub

# Phase 3 — Architectural Design
add  src/api/, src/store/, src/hash/      # empty
add  tests/integration/api_store.test.ts  # stub

# Phase 4 — Module Design
add  src/hash/index.ts   # signatures only
add  tests/unit/hash.test.ts             # stub

# NO IMPLEMENTATION YET.

Each commit is reviewable on its own. The full intent of the system exists as text and test stubs before a single function body is written.
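
As a rough picture of the Phase 4 commit, a signatures-only module might look like the sketch below. The `Err` variants, the stub names, and the spec anchors are assumptions for illustration; the stub list is a runner-agnostic stand-in for whatever todo mechanism the chosen test runner provides.

```typescript
// Sketch of a Phase 4 artifact: types and signatures, no logic.
// Error variants form a discriminated union on `kind` (variant names are illustrative).
type Err =
  | { kind: "bad-length"; length: number }
  | { kind: "bad-char"; char: string };

// Bodies are placeholders until implementation at the bottom of the V.
function encode(id: number): string {
  throw new Error("todo");
}

function decode(slug: string): number | Err {
  throw new Error("todo");
}

// Stand-in for unit-test stubs: a name plus the spec anchor each one verifies.
// The anchors below are hypothetical.
const unitStubs: { name: string; spec: string }[] = [
  { name: "encode returns a 7-char slug", spec: "prd.md#FR-2.1" },
  { name: "decode inverts encode", spec: "arch.md#hash-roundtrip" },
  { name: "decode reports bad-char for symbols outside the alphabet", spec: "arch.md#hash-errors" },
];
```

A reviewer can judge the contract (inputs, outputs, error cases) from this file alone, which is the point of committing it before any body exists.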

06 — HOW · UP · VERIFICATION PHASE
How · Phase 2 — Up the right

Implement at the bottom of the V.
Then climb, satisfying tests at each rung.

  • 5 · Implementation. Now — and only now — the agent fills function bodies. The unit test stubs from Phase 4 become red, then green, module by module. The agent's job is bounded by the signatures it agreed to.
  • 6 · Unit testing. Agent completes the unit-test bodies it stubbed earlier. Run them in CI on every commit.
  • 7 · Integration testing. Modules wired together; fakes at external boundaries. With unit tests already green, a failure here usually points to a wrong interface or module boundary rather than a wrong function body.
  • 8 · System testing. Real environment, real database, real HTTP. Verifies the system design from Phase 2.
  • 9 · Acceptance. The Gherkin scenarios from Phase 1 run green. Done means the requirement is satisfied, not "the agent finished talking."
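
The "fakes at external boundaries" in step 7 can be sketched as a port with an in-memory adapter. The `SlugStore` interface, its method names, and the `raceForSlug` helper are hypothetical; a real store contract would come out of the Architectural Design phase.

```typescript
// Port: the only storage surface the API layer may depend on (hypothetical shape).
interface SlugStore {
  // Resolves to false when the slug is already taken, instead of overwriting.
  insert(slug: string, url: string): Promise<boolean>;
  lookup(slug: string): Promise<string | undefined>;
}

// Adapter for integration tests: same contract, no database.
class InMemorySlugStore implements SlugStore {
  private readonly rows = new Map<string, string>();

  async insert(slug: string, url: string): Promise<boolean> {
    if (this.rows.has(slug)) return false;
    this.rows.set(slug, url);
    return true;
  }

  async lookup(slug: string): Promise<string | undefined> {
    return this.rows.get(slug);
  }
}

// Integration-style probe: two concurrent writers compete for one slug.
async function raceForSlug(store: SlugStore): Promise<boolean[]> {
  return Promise.all([
    store.insert("abc1234", "https://example.com/a"),
    store.insert("abc1234", "https://example.com/b"),
  ]);
}
```

An integration test would assert that exactly one of the two inserts wins. Because the probe is written against the port, the same assertion can later run against the real database adapter in system testing.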
Failure routes — where each red test sends you
Test fails → You return to
Unit → Module Design (Phase 4)
Integration → Architectural Design (Phase 3)
System → System Design (Phase 2)
Acceptance → Requirements (Phase 1)

This is the V-Model's quiet superpower: a failure tells you which design document was wrong, not just which line of code. For agents, that's a far more useful error signal than a stack trace.
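
The failure-routes table is small enough to encode directly. In this sketch the type and function names are hypothetical, and the rule for several red rings at once (handle the lowest rung first) is an assumption, not something the table itself states.

```typescript
type TestRing = "unit" | "integration" | "system" | "acceptance";

type DesignPhase = { phase: 1 | 2 | 3 | 4; name: string };

// Mirrors the failure-routes table: each red ring points at one design phase.
const FAILURE_ROUTES: Record<TestRing, DesignPhase> = {
  unit: { phase: 4, name: "Module Design" },
  integration: { phase: 3, name: "Architectural Design" },
  system: { phase: 2, name: "System Design" },
  acceptance: { phase: 1, name: "Requirements" },
};

// Order in which the right side of the V is climbed.
const CLIMB_ORDER: TestRing[] = ["unit", "integration", "system", "acceptance"];

// Assumption: when several rings are red, revisit the lowest rung's phase first.
function nextPhaseToRevisit(red: TestRing[]): DesignPhase | undefined {
  const lowest = CLIMB_ORDER.find((ring) => red.indexOf(ring) >= 0);
  return lowest === undefined ? undefined : FAILURE_ROUTES[lowest];
}
```

For example, `nextPhaseToRevisit(["system", "unit"])` yields the Module Design entry, so the agent reopens signatures before it touches the architecture document.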

red→green
climb the right side, one rung at a time
07 — EXAMPLE · URL SHORTENER, 9 STEPS
Example · Build a URL shortener with an agent

A two-day side project, V-Model'd.

Each step is one agent turn with one reviewable artifact. No code is written before step 5.

01 · REQ

Capture intent

"Authenticated users shorten URLs. Slugs are 7 chars. /s/<slug> 302-redirects."

→ prd.md
→ 4 .feature files
02 · SYS DESIGN

Pick the shape

Single Node service, Postgres, base62 encoding of an incrementing ID. ADR for why not nanoid.

→ arch.md
→ adr/001.md
03 · ARCH

Draw module seams

api/ · store/ · hash/ · auth/. Interfaces typed. Store is a port; Postgres is one adapter.

→ src/ skeleton
→ 3 integration stubs
04 · MODULE

Sign every function

encode(id:number):string · decode(slug:string):number|Err. Errors are types, not strings.

→ .d.ts files
→ 11 unit stubs
05 · CODE

Fill the bodies

Bottom of the V. Agent writes implementations bounded by signatures & unit tests.

→ src/**/*.ts
→ unit tests green
06 · UNIT

Verify modules

11 unit tests run on every save. Mutation testing runs nightly. Coverage is a side-effect, not a goal.

→ vitest green
→ stryker score 78
07 · INTEG

Verify seams

API↔Store↔Hash wired together with an in-memory store. Catches a race in which concurrent requests produce colliding slugs.

→ 3 tests green
→ 1 bug found
08 · SYSTEM

Verify system

Real Postgres in Docker. Real HTTP. JWT auth flow. Rate-limit headers correct.

→ playwright
→ 4 specs green
09 · ACCEPT

Verify intent

The 4 .feature files from step 01 run end-to-end. If they're green, the requirement is met. Ship.

→ cucumber green
→ deploy
RESULT

What you have

Working software + 4 design docs + 18 tests (11 unit, 3 integration, 4 system) + 4 acceptance features, each traceable to a specific line of intent.

audit trail = free
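
Steps 04 and 05 of the example can be sketched together: the signatures from the module design, then bodies that satisfy them. Several details are assumptions of this sketch rather than the deck's design: the alphabet order, left-padding with "0" to reach 7 characters, the `Err` variant names, and the `RangeError` guard on out-of-range ids.

```typescript
// Assumed alphabet order: digits, uppercase, lowercase.
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const BASE = ALPHABET.length; // 62
const SLUG_LENGTH = 7;
const MAX_ID = Math.pow(BASE, SLUG_LENGTH) - 1; // largest id that fits in 7 characters

// Errors are types, not strings: a discriminated union on `kind`.
type Err =
  | { kind: "bad-length"; length: number }
  | { kind: "bad-char"; char: string };

// encode(id:number):string, padded on the left so every slug has 7 characters.
function encode(id: number): string {
  if (!Number.isInteger(id) || id < 0 || id > MAX_ID) {
    throw new RangeError(`id out of range: ${id}`); // guard is an assumption
  }
  let slug = "";
  let n = id;
  for (let i = 0; i < SLUG_LENGTH; i++) {
    slug = ALPHABET.charAt(n % BASE) + slug;
    n = Math.floor(n / BASE);
  }
  return slug;
}

// decode(slug:string):number|Err, the inverse of encode.
function decode(slug: string): number | Err {
  if (slug.length !== SLUG_LENGTH) {
    return { kind: "bad-length", length: slug.length };
  }
  let id = 0;
  for (let i = 0; i < slug.length; i++) {
    const char = slug.charAt(i);
    const digit = ALPHABET.indexOf(char);
    if (digit < 0) return { kind: "bad-char", char };
    id = id * BASE + digit;
  }
  return id;
}
```

A round-trip property (`decode(encode(id)) === id` for every valid id) is the kind of unit test the step 04 stubs would name before either body exists.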
08 — IMPLEMENTATION · THE AGENT LOOP
Implementation · Wiring this into an agent

One phase per session.
One artifact per commit.

The mechanics

  • Phase gating. The agent's system prompt locks it to the current phase. It cannot write code in a "Requirements" session.
  • Artifact-first. Every session ends by writing or updating a file in docs/ or tests/. No artifact = no progress.
  • Test stubs travel left. Each spec session emits the empty test file for its paired right-side phase.
  • Traceability tags. Every test references the spec line it verifies: // @spec prd.md#FR-2.1.
  • Re-entry on red. A failing test in CI auto-routes the agent back to the design doc the failure maps to (see slide 06).
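
The traceability tags above lend themselves to a simple CI check. This sketch assumes the `// @spec <file>#<anchor>` comment form and treats an anchor as present if its text appears anywhere in the cited spec file; `extractSpecRefs` and `findBrokenRefs` are hypothetical helper names, and the plain substring match is a deliberate simplification.

```typescript
type SpecRef = { file: string; anchor: string };

// Matches tags of the form: // @spec prd.md#FR-2.1
const SPEC_TAG = /\/\/\s*@spec\s+([^\s#]+)#(\S+)/g;

// Collect every @spec tag found in a test file's source text.
function extractSpecRefs(testSource: string): SpecRef[] {
  const refs: SpecRef[] = [];
  SPEC_TAG.lastIndex = 0; // reset shared regex state before scanning
  let match: RegExpExecArray | null;
  while ((match = SPEC_TAG.exec(testSource)) !== null) {
    refs.push({ file: match[1], anchor: match[2] });
  }
  return refs;
}

// Return refs whose file is unknown or whose anchor text is missing from it.
// `specs` maps a spec file name to that file's contents.
function findBrokenRefs(testSource: string, specs: Map<string, string>): SpecRef[] {
  return extractSpecRefs(testSource).filter((ref) => {
    const spec = specs.get(ref.file);
    return spec === undefined || spec.indexOf(ref.anchor) < 0;
  });
}
```

A CI step could fail the build whenever `findBrokenRefs` returns a non-empty list, so a renamed or deleted requirement surfaces as a red check instead of a silently stale test.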

Tooling sketch

  • Claude Code / Aider / Cursor in agent mode, scoped per phase.
  • One repo, four dirs: docs/, docs/adr/, tests/{acceptance,system,integration,unit}/, src/.
  • CI runs the four test rings on every PR. Red ring = which phase failed.
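
With this directory layout, part of phase gating can be enforced by path alone. The allow-list below is an assumption derived from the commit order on slide 05, and `VPhase`, `ALLOWED_PREFIXES`, and `disallowedPaths` are hypothetical names. A path check cannot tell a signature from a function body inside src/, so it complements the phase-locked prompt and does not replace it.

```typescript
type VPhase =
  | "requirements"
  | "system-design"
  | "architecture"
  | "module-design"
  | "implementation";

// Path prefixes each phase may touch (assumed, following the slide 05 commit order).
const ALLOWED_PREFIXES: Record<VPhase, string[]> = {
  requirements: ["docs/prd.md", "tests/acceptance/"],
  "system-design": ["docs/arch.md", "docs/adr/", "tests/system/"],
  architecture: ["src/", "tests/integration/"],
  "module-design": ["src/", "tests/unit/"],
  implementation: ["src/", "tests/unit/"],
};

// Return the changed paths that the current phase is not allowed to touch.
function disallowedPaths(phase: VPhase, changed: string[]): string[] {
  const allowed = ALLOWED_PREFIXES[phase];
  return changed.filter(
    (path) => !allowed.some((prefix) => path.indexOf(prefix) === 0),
  );
}
```

A pre-commit or CI step could call `disallowedPaths` with the staged file list and reject the commit when the result is non-empty.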
Phase prompt skeleton (system message)
# /prompts/phase-04-module-design.md

ROLE
You are operating in PHASE 4 of the V-Model:
Module Design. You may not write function bodies.

INPUTS
- docs/prd.md
- docs/arch.md
- src/**/*.d.ts (existing signatures)

DELIVERABLES
1. Add or refine TypeScript signatures in src/
2. Define error types as discriminated unions
3. For each new signature, emit a unit-test stub
   in tests/unit/ tagged // @spec <file>#<anchor>

CONSTRAINTS
- No function bodies. throw new Error("todo") only.
- Every exported symbol must be tested.
- Every test stub must reference a spec anchor.

DONE WHEN
- pnpm typecheck passes
- pnpm test --reporter=list shows N todo tests
- git diff has no implementation lines

# end
09 — REFERENCE · PITFALLS & FURTHER READING
Reference · What goes wrong & where to dig further

The V-Model is rigid by design.
Bend it intentionally, never silently.

Pitfalls & anti-patterns

01

Big-bang specs

Writing all four design docs before any code. The V is a per-feature loop, not a project phase. Slice vertically — one user story end-to-end through the V at a time.

02

Spec rot

Code drifts; docs don't follow. Mitigation: spec anchors are referenced from tests; broken anchor = CI failure.

03

Theater testing

The agent writes tests that pass trivially. Defend with mutation testing (Stryker, mutmut) and the rule: every assertion must reference a spec anchor.

04

Phase smuggling

Agent slips implementation into a design session. Defend with phase-locked prompts (slide 08) and a pre-commit hook that rejects src/**/*.ts bodies in design-phase commits.

05

V for everything

The V suits well-scoped features and safety-critical work. For exploratory prototyping, run a stripped two-rung V (Requirements ↔ Acceptance only) and graduate to the full V when the idea survives.
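
To make pitfall 03 concrete, here is a hand-written illustration of what a mutation tool automates. The function, the mutant, and both tests are made up for this sketch; real tools such as Stryker generate and run the mutants for you.

```typescript
// Function under test (illustrative): is a slug the required length?
function isValidSlug(slug: string): boolean {
  return slug.length === 7;
}

// A mutant of the kind a mutation tool might generate: === becomes >=.
function isValidSlugMutant(slug: string): boolean {
  return slug.length >= 7;
}

type SlugCheck = (slug: string) => boolean;

// Theater test: passes for the original and for the mutant alike.
function theaterTest(fn: SlugCheck): boolean {
  return fn("abc1234") === true;
}

// Boundary test: pins both sides of the requirement, so the mutant fails it.
function boundaryTest(fn: SlugCheck): boolean {
  return (
    fn("abc1234") === true &&
    fn("abc12345") === false &&
    fn("abc123") === false
  );
}

const mutantPassesTheater = theaterTest(isValidSlugMutant);   // true: mutant survives
const mutantPassesBoundary = boundaryTest(isValidSlugMutant); // false: mutant is killed
```

A surviving mutant means the suite would not notice that change to the code, which is the signal a mutation score aggregates.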

Glossary

V&V: Verification & Validation. "Built it right" vs. "built the right thing."
ADR: Architecture Decision Record. One file per non-obvious choice, with context + consequence.
Gherkin: Given/When/Then DSL that turns a user story into an executable spec.
Port / Adapter: Interface that lets you swap an implementation (real DB ↔ in-memory fake).
Mutation testing: Mutates your code; tests should fail. Catches assertions that don't assert.
Spec anchor: Stable id (heading or comment) that a test cites, e.g. prd.md#FR-2.1.

To dig deeper

  • Rook, P. "Controlling software projects" — original V-Model framing, IEE Software Eng. Journal, 1986.
  • ISO/IEC/IEEE 29119 — software testing standards; maps cleanly to the right side of the V.
  • Cohn, M. Agile Estimating and Planning — for the vertical-slicing critique of waterfall V.
  • Fowler, M. "Test Pyramid" and "TestDouble" — refines the right-side test taxonomy.
  • Beck, K. Test-Driven Development: By Example — TDD is the V-Model compressed to one rung.
  • North, D. "Introducing BDD" — connects Gherkin acceptance specs to the top of the V.
  • Anthropic. "Claude Code: best practices" — phase-scoped agent sessions in practice.