V-MODEL · DECK

SDLC / SE FOR AGENTIC CODING · v1.0

The V-Model

Specify &
Verify.
In lockstep.

A 1980s software engineering discipline, reborn as the missing scaffold for LLM-driven coding agents. Every left-hand specification is the definition of a right-hand test — written before the agent writes code.

Origin	Paul Rook, UK MoD · ~1986
Family	Waterfall variant · V&V model
Core idea	Each phase pairs with a test phase
Why now	Agents skip specs & tests by default
Reading time	~9 slides · 6 min

02 · WHY / THE FAILURE MODE

Why · The problem this solves

Agents write code that looks right
and fails for reasons no one specified.

The default agentic loop is: prompt → code → "looks good" → ship. There is no spec to drift from and no test to fail. The bug surfaces three weeks later in production.

Symptoms in the wild

Scope drift. The agent quietly reinterprets the request mid-task.
Phantom requirements. Plausible features appear that nobody asked for.
No verification ladder. One end-to-end test, if any. No unit boundary.
Untraceable changes. Why does auth.py now import redis?
Regression amnesia. The fix in commit N reopens the bug closed in commit N−3.

DEFAULT AGENT LOOP

"Build me a URL shortener."
→ 400 lines of code.
→ It runs. Maybe.
→ No test names a requirement.

V-MODEL LOOP

Spec → test stubs → arch → integration stubs → modules → unit tests → code → climb back up the V.

1 : 1

Every spec phase ↔ one test phase

03 · THE MODEL / THE SHAPE OF THE WORK

The Model

Decomposition down the left.
Recomposition up the right.

Horizontal dashed lines in the diagram: each spec on the left defines the test directly across from it on the right.

04 · THE INSIGHT / SPEC = TEST DEFINITION

The Insight

A specification you can't write a test for isn't a specification. It's a vibe.

Why this changes agentic coding

LLM agents are excellent at producing artifacts that satisfy local constraints (this function compiles) and poor at preserving global intent (does the system do what was asked?).

The V-Model forces every level of intent to be written as a falsifiable artifact before code generation. The artifact on the left literally becomes the test fixture on the right.

Tests stop being an afterthought and become the contract that the agent's code must satisfy. The agent is no longer guessing what "done" means.
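One way to picture "the artifact on the left becomes the test fixture on the right" is to hold a requirement as data and derive the check from it. This is a minimal sketch under stated assumptions: the `Requirement` shape, the `Shorten` stand-in, and the content attached to anchor `FR-2.1` are hypothetical, not taken from a real prd.md.

```typescript
// Hypothetical shape for one requirement lifted from prd.md.
interface Requirement {
  anchor: string;     // spec anchor the test will cite
  given: string;
  when: string;
  slugLength: number; // the falsifiable part of the "then" clause
}

// Assumed content, modelled on the deck's "7-char slug" user story.
const fr21: Requirement = {
  anchor: "prd.md#FR-2.1",
  given: "a long URL",
  when: "I POST /shorten",
  slugLength: 7,
};

// Stand-in for the system under test; a real run would call the HTTP API.
type Shorten = (longUrl: string) => string;

// The acceptance check reads its expectation from the requirement itself.
function verify(req: Requirement, shorten: Shorten): { anchor: string; pass: boolean } {
  const slug = shorten("https://example.com/a/very/long/path");
  return { anchor: req.anchor, pass: slug.length === req.slugLength };
}

// Two fake implementations: one satisfies the spec, one drifts from it.
const compliant: Shorten = () => "abc1234";
const drifted: Shorten = () => "abc12345";

const okResult = verify(fr21, compliant);
const driftResult = verify(fr21, drifted);
```

Because the expected slug length lives in the requirement, changing the spec changes what "pass" means without touching the check.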

Spec ↔ Test pairings

Spec (left)	Artifacts	↔	Test (right)	Tooling
Requirements	prd.md, user stories	↔	Acceptance Tests	cucumber, playwright e2e
System Design	arch.md, sequence diagrams	↔	System Tests	end-to-end, contract tests
Architectural Design	module map, interfaces	↔	Integration Tests	cross-module, with fakes
Module Design	function signatures, types	↔	Unit Tests	pytest, vitest, table tests

05 · HOW · DOWN / DECOMPOSITION PHASE

How · Down the left (Phases 1–4)

Each step produces a written artifact
and the test stubs it implies.

1 · Requirements → prd.md with Gherkin-style user stories. Agent also generates tests/acceptance/*.feature as empty stubs. "Given a long URL, when I POST /shorten, then I receive a 7-char slug."
2 · System Design → docs/arch.md: services, data flow, tech choices, ADRs. Agent generates tests/system/*.spec.ts stubs that hit the public API surface.
3 · Architectural Design → module boundaries with explicit interfaces. Agent generates tests/integration/* stubs that exercise modules together with fakes at the seams.
4 · Module Design → function signatures, type definitions, error contracts — before bodies. Agent writes tests/unit/* stubs from the types.

What the agent commits, in order

# Phase 1 — Requirements
add  docs/prd.md
add  tests/acceptance/shorten.feature   # stub

# Phase 2 — System Design
add  docs/arch.md
add  docs/adr/001-storage.md
add  tests/system/api.spec.ts            # stub

# Phase 3 — Architectural Design
add  src/api/, src/store/, src/hash/      # empty
add  tests/integration/api_store.test.ts  # stub

# Phase 4 — Module Design
add  src/hash/index.ts   # signatures only
add  tests/unit/hash.test.ts             # stub

# NO IMPLEMENTATION YET.

Each commit is reviewable on its own. The full intent of the system exists as text and test stubs before a single function body is written.
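A Phase 4 commit for the hash module might look like the sketch below: types and signatures, with bodies left as todos. The `Err` variants, the stub names, and the `arch.md#hash` anchor are illustrative assumptions. The stub list is a plain array so the sketch runs without a test runner; in the repo these would be todo tests in tests/unit/hash.test.ts.

```typescript
// Errors are types, not strings. These variants are assumed for illustration.
type Err =
  | { kind: "empty" }
  | { kind: "invalid_char"; char: string };

// Signatures only. The body is a placeholder until the implementation step.
function encode(id: number): string {
  throw new Error("todo");
}

function decode(slug: string): number | Err {
  throw new Error("todo");
}

// Unit-test stubs: named and tagged with a spec anchor, no assertions yet.
interface TestStub {
  name: string;
  spec: string; // hypothetical anchor format: <file>#<anchor>
}

const unitStubs: TestStub[] = [
  { name: "encode returns a non-empty slug for id 0", spec: "arch.md#hash" },
  { name: "decode(encode(id)) round-trips", spec: "arch.md#hash" },
  { name: "decode rejects an empty slug", spec: "arch.md#hash" },
  { name: "decode reports the first invalid character", spec: "arch.md#hash" },
];
```

The reviewer sees the whole contract of the module (inputs, outputs, failure modes, and what will be tested) before any logic exists.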

06 · HOW · UP / VERIFICATION PHASE

How · Up the right (Steps 5–9)

Implement at the bottom of the V.
Then climb, satisfying tests at each rung.

5 · Implementation. Now — and only now — the agent fills function bodies. The unit test stubs from Phase 4 become red, then green, module by module. The agent's job is bounded by the signatures it agreed to.
6 · Unit testing. Agent completes the unit-test bodies it stubbed earlier. Run them in CI on every commit.
7 · Integration testing. Modules wired together; fakes at external boundaries. A failure here usually points at the architecture (a wrong interface or seam), not at a function body.
8 · System testing. Real environment, real database, real HTTP. Verifies the system design from Phase 2.
9 · Acceptance. The Gherkin scenarios from Phase 1 run green. Done means the requirement is satisfied, not "the agent finished talking."
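Step 5 for the hash module could then look like this: the same signatures, now with bodies. It is a sketch, not the deck's implementation. The alphabet order and the `Err` variants are assumptions, and padding slugs to the PRD's 7-character width is left out.

```typescript
// Assumed alphabet order: digits, lowercase, uppercase (62 symbols).
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

type Err =
  | { kind: "empty" }
  | { kind: "invalid_char"; char: string };

// Base62-encode a non-negative integer id.
function encode(id: number): string {
  if (Math.floor(id) !== id || id < 0) {
    throw new RangeError("id must be a non-negative integer");
  }
  if (id === 0) return ALPHABET.charAt(0);
  let n = id;
  let out = "";
  while (n > 0) {
    out = ALPHABET.charAt(n % 62) + out; // prepend the least significant digit
    n = Math.floor(n / 62);
  }
  return out;
}

// Decode a slug back to its id, or return a typed error.
function decode(slug: string): number | Err {
  if (slug.length === 0) return { kind: "empty" };
  let n = 0;
  for (const char of slug) {
    const digit = ALPHABET.indexOf(char);
    if (digit < 0) return { kind: "invalid_char", char };
    n = n * 62 + digit;
  }
  return n;
}
```

The unit tests stubbed in Phase 4 now have something to assert against: round-trips such as `decode(encode(62))`, and the two error variants.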

Failure routes — where each red test sends you

Test fails	You return to
Unit	Module Design (Phase 4)
Integration	Architectural Design (Phase 3)
System	System Design (Phase 2)
Acceptance	Requirements (Phase 1)

This is the V-Model's quiet superpower: a failure tells you which design document was wrong, not just which line of code. For agents, that's a far more useful error signal than a stack trace.
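The routing table above is small enough to encode directly. A minimal sketch: the `Ring` and `Route` names are hypothetical, and handling the lowest red ring first is an assumption that follows the "one rung at a time" climb.

```typescript
type Ring = "unit" | "integration" | "system" | "acceptance";

interface Route {
  phase: number;  // design phase to re-enter
  design: string; // name of that phase
}

// Direct transcription of the failure-routes table.
const FAILURE_ROUTES: Record<Ring, Route> = {
  unit: { phase: 4, design: "Module Design" },
  integration: { phase: 3, design: "Architectural Design" },
  system: { phase: 2, design: "System Design" },
  acceptance: { phase: 1, design: "Requirements" },
};

// Rings in climb order, bottom of the V first.
const CLIMB_ORDER: Ring[] = ["unit", "integration", "system", "acceptance"];

// Given the rings that are red in CI, pick the design phase to revisit.
// Assumption: the lowest red rung is fixed before anything above it.
function routeFailure(redRings: Ring[]): Route | null {
  for (const ring of CLIMB_ORDER) {
    if (redRings.indexOf(ring) >= 0) return FAILURE_ROUTES[ring];
  }
  return null; // all green
}
```

For example, `routeFailure(["system", "unit"])` returns the Module Design route, because the unit rung sits lowest on the climb.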

↗ red→green

climb the right side, one rung at a time

07 · EXAMPLE / URL SHORTENER, 9 STEPS

Example · Build a URL shortener with an agent

A two-day side project, V-Model'd.

Each step is one agent turn with one reviewable artifact. No code is written before step 5.

01 · REQ

Capture intent

"Authenticated users shorten URLs. Slugs are 7 chars. /s/<slug> 302-redirects."

→ prd.md
→ 4 .feature files

02 · SYS DESIGN

Pick the shape

Single Node service, Postgres, base62 encoding of an incrementing ID. ADR for why not nanoid.

→ arch.md
→ adr/001.md

03 · ARCH

Draw module seams

api/ · store/ · hash/ · auth/. Interfaces typed. Store is a port; Postgres is one adapter.

→ src/ skeleton
→ 3 integration stubs

04 · MODULE

Sign every function

encode(id:number):string · decode(slug:string):number|Err. Errors are types, not strings.

→ .d.ts files
→ 11 unit stubs

05 · CODE

Fill the bodies

Bottom of the V. Agent writes implementations bounded by signatures & unit tests.

→ src/**/*.ts
→ unit tests green

06 · UNIT

Verify modules

11 unit tests run on every save. Mutation testing runs nightly. Coverage is a side-effect, not a goal.

→ vitest green
→ stryker score 78%

07 · INTEG

Verify seams

API↔Store↔Hash wired together with an in-memory store. Catches the bug where concurrent requests produce colliding slugs.

→ 3 tests green
→ 1 bug found

08 · SYSTEM

Verify system

Real Postgres in Docker. Real HTTP. JWT auth flow. Rate-limit headers correct.

→ playwright
→ 4 specs green

09 · ACCEPT

Verify intent

The 4 .feature files from step 01 run end-to-end. If they're green, the requirement is met. Ship.

→ cucumber green
→ deploy

RESULT

What you have

Working software + 4 design docs + 22 tests (4 acceptance, 4 system, 3 integration, 11 unit), each traceable to a specific line of intent.

audit trail = free
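Step 03's "Store is a port" and step 07's in-memory store can be sketched as below. The method names on `Store` are hypothetical, and the port is synchronous only to keep the sketch short; a Postgres adapter would almost certainly be async.

```typescript
// Port: the only view of storage the api module is allowed to see.
// Method names are assumptions, not the deck's interface.
interface Store {
  nextId(): number;
  put(slug: string, url: string): boolean; // false if the slug is taken
  get(slug: string): string | undefined;
}

// Adapter for integration tests. A Postgres adapter would implement
// the same port against a real table.
class InMemoryStore implements Store {
  private rows: { [slug: string]: string } = {};
  private counter = 0;

  nextId(): number {
    this.counter += 1;
    return this.counter;
  }

  put(slug: string, url: string): boolean {
    if (Object.prototype.hasOwnProperty.call(this.rows, slug)) return false;
    this.rows[slug] = url;
    return true;
  }

  get(slug: string): string | undefined {
    return Object.prototype.hasOwnProperty.call(this.rows, slug)
      ? this.rows[slug]
      : undefined;
  }
}

// api-side function written against the port, never the adapter.
// A collision is surfaced as an error instead of overwriting silently.
function shorten(store: Store, encode: (id: number) => string, url: string): string {
  const slug = encode(store.nextId());
  if (!store.put(slug, url)) throw new Error(`slug collision: ${slug}`);
  return slug;
}
```

An integration test wires `shorten` to `InMemoryStore` and a real encoder; swapping in the Postgres adapter is what the system-test rung then verifies.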

08 · IMPLEMENTATION / THE AGENT LOOP

Implementation · Wiring this into an agent

One phase per session.
One artifact per commit.

The mechanics

Phase gating. The agent's system prompt locks it to the current phase. It cannot write code in a "Requirements" session.
Artifact-first. Every session ends by writing or updating a file in docs/ or tests/. No artifact = no progress.
Test stubs travel left. Each spec session emits the empty test file for its paired right-side phase.
Traceability tags. Every test references the spec line it verifies: // @spec prd.md#FR-2.1.
Re-entry on red. A failing test in CI auto-routes the agent back to the design doc the failure maps to (see slide 06).
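The traceability-tag rule can be checked mechanically. The sketch below pulls `// @spec file#anchor` tags out of a test file's text and reports any that cite an unknown anchor. How anchors are declared inside the spec docs is not defined here, so the known anchors are passed in as plain data; `extractSpecRefs` and `findBrokenRefs` are hypothetical names.

```typescript
interface SpecRef {
  file: string;
  anchor: string;
}

// Matches tags like: // @spec prd.md#FR-2.1
const SPEC_TAG = /\/\/\s*@spec\s+([^\s#]+)#(\S+)/g;

// Collect every spec tag found in a test file's source text.
function extractSpecRefs(testSource: string): SpecRef[] {
  const refs: SpecRef[] = [];
  SPEC_TAG.lastIndex = 0; // the regex is global, so reset it between calls
  let match: RegExpExecArray | null;
  while ((match = SPEC_TAG.exec(testSource)) !== null) {
    refs.push({ file: match[1], anchor: match[2] });
  }
  return refs;
}

// Return the refs whose file or anchor is not in the known set.
function findBrokenRefs(
  testSource: string,
  knownAnchors: { [file: string]: string[] }
): SpecRef[] {
  return extractSpecRefs(testSource).filter((ref) => {
    const anchors = Object.prototype.hasOwnProperty.call(knownAnchors, ref.file)
      ? knownAnchors[ref.file]
      : [];
    return anchors.indexOf(ref.anchor) < 0;
  });
}

// Example input: one valid tag, one that cites a removed anchor.
const sampleTestSource = [
  "// @spec prd.md#FR-2.1",
  'test.todo("returns a 7-char slug");',
  "// @spec prd.md#FR-9.9",
  'test.todo("cites an anchor that no longer exists");',
].join("\n");

const brokenRefs = findBrokenRefs(sampleTestSource, { "prd.md": ["FR-2.1", "FR-2.2"] });
```

Run in CI, a non-empty result fails the build, which is one way to make a broken anchor visible as soon as a spec heading is renamed or removed.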

Tooling sketch

Claude Code / Aider / Cursor in agent mode, scoped per phase.
One repo, four dirs: docs/ (with docs/adr/), tests/{acceptance,system,integration,unit}/, src/, prompts/.
CI runs the four test rings on every PR. Red ring = which phase failed.

Phase prompt skeleton (system message)

# /prompts/phase-04-module-design.md

ROLE
You are operating in PHASE 4 of the V-Model:
Module Design. You may not write function bodies.

INPUTS
- docs/prd.md
- docs/arch.md
- src/**/*.d.ts (existing signatures)

DELIVERABLES
1. Add or refine TypeScript signatures in src/
2. Define error types as discriminated unions
3. For each new signature, emit a unit-test stub
   in tests/unit/ tagged // @spec <file>#<anchor>

CONSTRAINTS
- No function bodies. throw new Error("todo") only.
- Every exported symbol must be tested.
- Every test stub must reference a spec anchor.

DONE WHEN
- pnpm typecheck passes
- pnpm test --reporter=verbose shows N todo tests
- git diff has no implementation lines

# end

09 · REFERENCE / PITFALLS & FURTHER READING

Reference · What goes wrong & where to dig further

The V-Model is rigid by design.
Bend it intentionally, never silently.

Pitfalls & anti-patterns

01

Big-bang specs

Writing all four design docs before any code. The V is a per-feature loop, not a project phase. Slice vertically — one user story end-to-end through the V at a time.

02

Spec rot

Code drifts; docs don't follow. Mitigation: spec anchors are referenced from tests; broken anchor = CI failure.

03

Theater testing

The agent writes tests that pass trivially. Defend with mutation testing (Stryker, mutmut) and the rule: every assertion must reference a spec anchor.

04

Phase smuggling

Agent slips implementation into a design session. Defend with phase-locked prompts (slide 08) and a pre-commit hook that rejects src/**/*.ts bodies in design-phase commits.

05

V for everything

The V suits well-scoped features and safety-critical work. For exploratory prototyping, run a stripped two-rung V (Requirements ↔ Acceptance only) and graduate to the full V when the idea survives.
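For phase smuggling (pitfall 04), a coarse first line of defence is a path check a pre-commit hook could run. This sketch is a simpler technique than rejecting function bodies: it only checks which paths a design-phase commit touches, with the allowed prefixes read off the commit listing on slide 05. It cannot see bodies inside src/, so Phases 3 and 4 would still need a separate body check. `ALLOWED` and `smuggledPaths` are hypothetical names.

```typescript
type Phase = 1 | 2 | 3 | 4;

// Path prefixes each design phase may touch (assumed from the slide 05 listing).
const ALLOWED: Record<Phase, string[]> = {
  1: ["docs/prd.md", "tests/acceptance/"],
  2: ["docs/arch.md", "docs/adr/", "tests/system/"],
  3: ["src/", "tests/integration/"],
  4: ["src/", "tests/unit/"],
};

// Return the changed paths that fall outside the current phase's allowance.
function smuggledPaths(phase: Phase, changedPaths: string[]): string[] {
  return changedPaths.filter(
    (path) => !ALLOWED[phase].some((prefix) => path.indexOf(prefix) === 0)
  );
}

// A Requirements-phase commit that also touches source code.
const violations = smuggledPaths(1, [
  "docs/prd.md",
  "tests/acceptance/shorten.feature",
  "src/hash/index.ts",
]);
```

A hook would feed this the staged file list and reject the commit when the result is non-empty.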

Glossary

V&V	Verification & Validation. "Built it right" vs. "built the right thing."
ADR	Architecture Decision Record. One file per non-obvious choice, with context + consequence.
Gherkin	Given/When/Then DSL — turns a user story into an executable spec.
Port / Adapter	Interface that lets you swap an implementation (real DB ↔ in-memory fake).
Mutation testing	Mutates your code; tests should fail. Catches assertions that don't assert.
Spec anchor	Stable id (heading or comment) that a test cites — e.g. `prd.md#FR-2.1`.

To dig deeper

Rook, P. "Controlling software projects" — original V-Model framing, IEE Software Eng. Journal, 1986.
ISO/IEC/IEEE 29119 — software testing standards; maps cleanly to the right side of the V.
Cohn, M. Agile Estimating and Planning — for the vertical-slicing critique of waterfall V.
Fowler, M. "Test Pyramid" and "TestDouble" — refines the right-side test taxonomy.
Beck, K. Test-Driven Development: By Example — TDD is the V-Model compressed to one rung.
North, D. "Introducing BDD" — connects Gherkin acceptance specs to the top of the V.
Anthropic. "Claude Code: best practices" — phase-scoped agent sessions in practice.
