Contract-Driven Development: Defining 'Done' Before Writing Code
Quick Answer
Contract-driven development means writing explicit success criteria (clauses) before coding starts. Each clause is binary: satisfied (SAT) or unsatisfied (UNSAT). Work stops when all clauses are SAT. Trade-off: Requires upfront clarity and limits flexibility for exploratory projects. Benefit: Prevents scope creep, enables deterministic verification, eliminates "almost done" syndrome, and makes regulated systems auditable. For compliance tools and high-reliability systems, it's the disciplined approach that actually ships.
Most software projects have no clear definition of done.
"Build a recordkeeping tool" sounds specific. It's not. Does it need to handle duplicates? What about partial data? Should it validate inputs? How fast should it run? What happens when requirements are ambiguous?
Without answers, the project drifts. Features get added because they "might be useful." Scope expands. Deadlines slip. "Almost done" becomes permanent status.
Contract-driven development solves this by defining success before writing code.
Not vague requirements. Not user stories. Explicit clauses that are either satisfied or unsatisfied. Binary verification. SAT or UNSAT.
When all clauses are SAT, you're done. Not "mostly done." Not "ready for review." Done.
This approach shipped an OSHA compliance tool in 3 months. Zero scope creep. Zero "almost done" features. Every clause verified before release.
It works because compliance systems need deterministic outcomes, not iterative refinement.
The Problem with Traditional Requirements
Problem 1: Ambiguity
"The system should be fast."
How fast? Milliseconds? Seconds? Compared to what baseline?
Without precision, you can't verify success.
Problem 2: Scope creep
"While we're building this, we should also add..."
Every "should" expands scope. Projects balloon. Deadlines become fiction.
Problem 3: "Almost done" syndrome
"90% complete" can last for months. The last 10% reveals edge cases, missing validation, performance problems.
Without binary verification, you can't distinguish "close to done" from "actually done."
Problem 4: Unclear ownership
Who decides if a feature is complete? Product manager? Developer? User? Different stakeholders have different standards.
Without objective criteria, completion is subjective.
Contract-Driven Development: The Core Idea
Write explicit clauses defining success before coding.
Each clause is:
- Specific: No ambiguity about what it means
- Testable: You can verify it's true or false
- Binary: Either SAT (satisfied) or UNSAT (unsatisfied)
Example clause:
```
Clause 2: Recordability Classification
- Input: Text description of workplace incident
- Output: Classification as recordable/non-recordable per 29 CFR 1904
- Verification: 100% agreement with official OSHA guidance on test cases
- Status: [SAT/UNSAT]
```
You can test this. You can verify it. You know when you're done.
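A clause like this can also be represented as data: a number, a name, and a verification function that returns a binary result. A minimal sketch, assuming illustrative names (`Clause`, `verify_auditability`, and the stubbed `result` dict are not from the OSHA tool itself):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Clause:
    number: int
    name: str
    verify: Callable[[], bool]  # returns True (SAT) or False (UNSAT)

    @property
    def status(self) -> str:
        # Binary outcome: the verification either passes or it doesn't
        return "SAT" if self.verify() else "UNSAT"

# Hypothetical verification: every classification must carry a CFR citation
def verify_auditability() -> bool:
    result = {"classification": "recordable",
              "reasoning": "Medical treatment beyond first aid, per 29 CFR 1904.7"}
    return "reasoning" in result and "29 CFR" in result["reasoning"]

clause = Clause(6, "Auditability", verify_auditability)
print(f"Clause {clause.number}: {clause.status}")  # Clause 6: SAT
```

The point of the representation is that status is computed, never asserted by hand.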
The OSHA Tool Contract: 13 Clauses
Before writing code, we defined success:
Clause 1: Input Flexibility
- Accept incident descriptions in any text format (narrative, bullet points, structured data)
- No required fields; handle missing information gracefully
- Verification: Parse 50 diverse real-world examples without errors
- Status: SAT
Clause 2: Recordability Classification
- Classify incidents as recordable/non-recordable per 29 CFR 1904
- Handle edge cases (pre-existing conditions, medical treatment, privacy cases)
- Verification: 100% agreement with OSHA guidance on 100 test cases
- Status: SAT
Clause 3: Form Generation
- Output: OSHA Form 300A (PDF) + CSV for ITA submission
- Match official format exactly (layout, fields, formatting)
- Verification: Side-by-side comparison with official blank form
- Status: SAT
Clause 4: Performance
- Generate Form 300A in < 30 seconds for up to 50 incidents
- No external API calls causing delays
- Verification: Benchmark test with 50-incident dataset
- Status: SAT
Clause 5: Accuracy
- Zero tolerance for compliance errors on core logic
- Conservative bias when rules are ambiguous
- Verification: Manual review by compliance expert on 20 edge cases
- Status: SAT
Clause 6: Auditability
- Explain reasoning for every recordability decision
- Cite specific CFR sections supporting each classification
- Verification: Every output includes "Reasoning" section with citations
- Status: SAT
Clause 7: Data Security
- No sensitive incident data stored or transmitted
- All processing happens locally
- Verification: Code review confirms no database writes, no API calls with PII
- Status: SAT
Clause 8: Repeatability
- Same input always produces same output
- Deterministic classification logic (no randomness, no timestamps affecting output)
- Verification: Run same test case 10 times, verify identical output
- Status: SAT
Clause 9: Validation
- Validate all outputs before release to user
- Catch malformed dates, missing required fields, invalid classifications
- Verification: Introduce deliberate errors, confirm detection
- Status: SAT
Clause 10: Error Handling
- Graceful handling of incomplete data (missing dates, unclear injury descriptions)
- Provide actionable error messages (not "Invalid input")
- Verification: Test with intentionally incomplete datasets
- Status: SAT
Clause 11: Testing
- 100% test coverage on core classification and form generation logic
- Edge cases explicitly tested (not just happy path)
- Verification: Coverage report shows 100% on critical modules
- Status: SAT
Clause 12: Maintainability
- Code is readable without extensive comments
- Assumptions documented where non-obvious
- Verification: External reviewer can understand logic without walkthrough
- Status: SAT
Clause 13: Scope Limitation
- Only OSHA recordkeeping (29 CFR 1904)
- Only Form 300A generation (not 300, 301, or other forms)
- Not a general compliance framework
- Verification: Feature requests outside scope are rejected
- Status: SAT
These 13 clauses were the contract. When all were SAT, the project was done.
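"Done" in this scheme is just a conjunction: every clause must be SAT. A hedged sketch of that check, with the per-clause results stubbed in as booleans for illustration:

```python
# Hypothetical verification results for the 13 clauses, keyed by clause number.
# In practice each boolean would come from running that clause's verification test.
clause_results = {n: True for n in range(1, 14)}

def contract_satisfied(results: dict[int, bool]) -> bool:
    # Done means every clause is SAT; a single UNSAT blocks completion
    return all(results.values())

if contract_satisfied(clause_results):
    print("All 13 clauses SAT: done")
else:
    unsat = [n for n, sat in clause_results.items() if not sat]
    print(f"Not done. UNSAT clauses: {unsat}")
```

There is no partial credit in `all()`, which is exactly the property the contract relies on.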
How Contracts Prevent Scope Creep
Scenario 1: Feature request arrives
Request: "Can we also generate Form 301 (detailed incident report)?"
Response: "Does this support any of the 13 clauses?"
Answer: No. Clause 3 specifies Form 300A only. Clause 13 limits scope to Form 300A.
Decision: Reject. Not in contract.
Scenario 2: Optimization idea
Idea: "We should cache intermediate results to improve performance."
Response: "Does Clause 4 require this?"
Answer: No. Clause 4 requires < 30 seconds for 50 incidents. Current implementation: 8 seconds. Already SAT.
Decision: Defer. Not needed for contract satisfaction.
Scenario 3: Ambiguous requirement
Question: "Should we support incidents from contractors?"
Response: "Does Clause 1 or Clause 2 specify this?"
Answer: Clause 2 references 29 CFR 1904, which covers employee incidents. Contractors are excluded unless they meet specific criteria.
Decision: Implement per CFR. Document assumption. Clause already specifies the standard.
The contract acts as a filter. Features not supporting clauses are out of scope.
Binary Verification: SAT or UNSAT
Each clause has a verification method. You test it. It's either SAT or UNSAT.
Example: Clause 4 (Performance)
```python
import time

# Test performance clause (Clause 4)
def test_performance_clause():
    incidents = load_test_dataset(count=50)  # 50 incidents
    start = time.time()
    form = generate_form_300a(incidents)
    elapsed = time.time() - start

    assert elapsed < 30, f"Performance clause UNSAT: {elapsed}s > 30s"
    print(f"Clause 4: SAT ({elapsed:.2f}s < 30s)")
```
Run test. If it passes, Clause 4 is SAT. If it fails, Clause 4 is UNSAT.
Example: Clause 6 (Auditability)
```python
# Test auditability clause (Clause 6)
def test_auditability_clause():
    incident = "Worker fell from ladder, hit head, received stitches"
    result = classify_incident(incident)

    assert "reasoning" in result, "Clause 6 UNSAT: No reasoning provided"
    assert "29 CFR" in result["reasoning"], "Clause 6 UNSAT: No CFR citation"
    print("Clause 6: SAT (reasoning and citation present)")
```
Binary outcome. No subjective judgment.
Example: Clause 13 (Scope Limitation)
```python
import pytest

# Test scope limitation clause (Clause 13)
def test_scope_limitation_clause():
    # Verify no support for Form 300 or Form 301
    with pytest.raises(NotImplementedError):
        generate_form_300(incidents)
    with pytest.raises(NotImplementedError):
        generate_form_301(incidents)

    print("Clause 13: SAT (only Form 300A supported)")
```
The contract explicitly limits scope. Test verifies that limitation.
How Contracts Change the Development Process
Traditional development:
- Gather requirements (vague)
- Build features (iterative)
- Test (discover missing requirements)
- Refine (scope expands)
- Ship (when deadline forces it)
Contract-driven development:
- Write contract (explicit clauses)
- Verify clauses are complete (before coding)
- Build to satisfy clauses (focused work)
- Test clause verification (binary)
- Ship when all clauses SAT (deterministic)
The difference: You know when you're done.
The Cost of Upfront Clarity
Contract-driven development isn't free. Trade-offs:
1. Requires upfront clarity
You must know what success looks like before coding. This is hard for exploratory projects where requirements emerge through iteration.
Reality check: For regulated systems (OSHA, finance, medical), requirements are defined by law. Upfront clarity is achievable.
2. Limits flexibility
Once the contract is set, changing it requires renegotiation. Can't easily pivot to a different approach mid-project.
Reality check: This is a feature, not a bug. Prevents scope creep and forces intentional decision-making.
3. No "MVP then iterate" approach
Contracts define done. You can't ship a partial solution and iterate. All clauses must be SAT.
Reality check: For compliance systems, partial solutions aren't viable. You're either compliant or you're not.
4. Hard to apply to vague problems
If you can't articulate success criteria, you can't write a contract.
Reality check: If you can't articulate success, you're not ready to code yet. Spend more time on problem definition.
When Contracts Work (and When They Don't)
Works for:
- Compliance systems (regulations define success)
- High-reliability systems (failures have real costs)
- Fixed-scope projects (not open-ended research)
- Regulated industries (finance, medical, safety)
- Systems with clear correctness criteria
Doesn't work for:
- Exploratory projects (requirements emerge through iteration)
- Creative tools (success is subjective)
- Rapidly changing markets (requirements shift constantly)
- Systems with vague "make it better" goals
The OSHA tool was in the first category. Compliance-driven, high-reliability, fixed scope, regulated domain.
Different project, different approach.
Writing Good Clauses
Bad clause: "The system should be user-friendly."
Why bad: Not testable. "User-friendly" is subjective.
Good clause: "Users can generate Form 300A with zero training by following inline help text. Verification: 5 untrained users complete the task in < 10 minutes without external help."
Why good: Specific, testable, binary.
Bad clause: "Performance should be acceptable."
Why bad: "Acceptable" is vague.
Good clause: "Form generation completes in < 30 seconds for datasets up to 50 incidents. Verification: Benchmark test on 50-incident dataset."
Why good: Specific threshold, clear verification method.
Bad clause: "Code should be maintainable."
Why bad: "Maintainable" is subjective.
Good clause: "External reviewer can understand core logic without walkthrough. Verification: Developer unfamiliar with codebase reviews and identifies purpose of each module in < 2 hours."
Why good: Testable criterion, measurable outcome.
Contracts as Communication
The contract isn't just for developers. It's for:
- Stakeholders: Know exactly what they're getting (and not getting)
- Developers: Know when to stop coding
- Testers: Know what to verify
- Users: Know what to expect
The OSHA tool contract was shared with beta testers before launch. They knew:
- What the tool does (Form 300A generation)
- What it doesn't do (Forms 300, 301, other compliance tasks)
- How it handles edge cases (conservative bias, explains reasoning)
No surprises. No "I thought it would also..." complaints.
Contracts as Documentation
The contract documents decisions:
Why isn't Form 300 supported? See Clause 13: Scope limited to Form 300A.
Why is performance capped at 50 incidents? See Clause 4: Designed for small operators (< 50 incidents/year).
Why no cloud storage? See Clause 7: Security requirement prohibits storing sensitive data.
The contract is living documentation. Answers "why" questions without archaeology.
Enforcing the Contract
Code reviews: "Does this change satisfy a clause? Which one?"
If it doesn't satisfy a clause, it's not needed.
Testing: Every clause must have a verification test. If a test doesn't exist, the clause isn't verified.
Release criteria: All clauses SAT = ready to ship. Any clause UNSAT = not ready.
Binary decision. No judgment calls.
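That release gate can be enforced mechanically: run every clause's verification function and ship only if all pass. A sketch using stand-in verifiers (in a real pipeline this could be a pytest run whose exit code gates the release; the function names here are illustrative):

```python
# Each entry pairs a clause number with its verification function.
# These verifiers are stand-ins; real ones run the tests shown earlier.
def verify_performance() -> bool:
    return True  # e.g. elapsed < 30s on the 50-incident benchmark

def verify_auditability() -> bool:
    return True  # e.g. every output carries reasoning + a CFR citation

CLAUSE_TESTS = {4: verify_performance, 6: verify_auditability}

def release_ready() -> bool:
    # Binary gate: any UNSAT clause blocks the release
    statuses = {n: fn() for n, fn in CLAUSE_TESTS.items()}
    for n, sat in statuses.items():
        print(f"Clause {n}: {'SAT' if sat else 'UNSAT'}")
    return all(statuses.values())

print("Ship" if release_ready() else "Do not ship")
```

Because the gate is code, "ready to ship" stops being an opinion and becomes a test result.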
Evolution: When Contracts Change
Contracts can change, but changes are intentional and documented.
Example: Performance clause update
Original: "Generate Form 300A in < 30 seconds for up to 50 incidents."
User feedback: Some operators have 100+ incidents/year.
Decision: Update Clause 4 or reject request?
Analysis: Expanding to 100 incidents requires architecture changes (current approach uses in-memory processing). Cost: 2 weeks development + testing. Benefit: Supports 10% more users.
Decision: Update contract. New Clause 4: "Generate Form 300A in < 60 seconds for up to 100 incidents."
Why: Explicit trade-off analysis. Clear cost/benefit. Documented decision.
Example: Scope clause update
Request: "Can we add Form 300 support?"
Analysis: Clause 13 limits scope to Form 300A. Adding Form 300 requires:
- New parsing logic (different fields)
- New validation rules
- New PDF template
- Extended testing
Cost: 4 weeks. Benefit: Supports users needing both forms.
Decision: Reject for v1. Consider for v2 with separate contract.
Why: Scope creep prevention. Finish current contract before expanding.
Lessons for Architecture
1. Clarity is a constraint that enables speed
Spending 2 days writing the contract saves weeks of scope creep and rework.
2. Binary verification eliminates ambiguity
SAT/UNSAT removes "almost done" and "good enough" debates.
3. Contracts prevent emotional attachment to features
If a feature doesn't satisfy a clause, it's cut. No arguments about "nice to haves."
4. Upfront investment pays off in regulated systems
Compliance requirements are stable. Defining them upfront is cheaper than discovering gaps in production.
5. Contracts make projects finishable
Without a contract, projects expand forever. With a contract, they have a definite end.
6. Verification is architecture
If you can't verify a clause, you can't prove it's SAT. Design for testability from day one.
Summary
Contract-driven development trades flexibility for clarity.
You define success before coding. Each clause is binary: SAT or UNSAT. Work stops when all clauses are SAT.
For compliance systems and high-reliability projects, it's the disciplined approach that ships on time, prevents scope creep, and delivers auditable outcomes.
For exploratory projects with emergent requirements, it's overkill. Different problems need different approaches.
But if you can define done before you start, you should. The contract makes the path clear and the finish line real.
Frequently Asked Questions
What is contract-driven development? Writing explicit success criteria (clauses) before coding starts. Each clause is specific (no ambiguity), testable (can verify true/false), and binary (either SAT/satisfied or UNSAT/unsatisfied). Work stops when all clauses are SAT. Not 'mostly done' - done.
How does contract-driven development prevent scope creep? The contract acts as a filter. For each feature request: 'Does this support any of the clauses?' If no, reject - not in contract. For optimizations: 'Does the clause require this?' If already SAT, defer - not needed for contract satisfaction. Features not supporting clauses are out of scope.
What's wrong with traditional requirements? Four problems: (1) Ambiguity - 'should be fast' without defining how fast. (2) Scope creep - every 'should also add' expands scope. (3) 'Almost done' syndrome - 90% complete for months, last 10% reveals edge cases. (4) Unclear ownership - who decides if a feature is complete? Without objective criteria, completion is subjective.
When does contract-driven development work and when doesn't it? Works for: compliance systems (regulations define success), high-reliability systems (failures have real costs), fixed-scope projects, regulated industries. Doesn't work for: exploratory projects (requirements emerge through iteration), creative tools (success is subjective), rapidly changing markets, systems with vague 'make it better' goals.
How do you write good contract clauses? Bad: 'The system should be user-friendly' (not testable). Good: 'Users can complete task with zero training in <10 minutes.' Bad: 'Performance should be acceptable' (vague). Good: 'Form generation completes in <30 seconds for 50 incidents.' Bad: 'Code should be maintainable' (subjective). Good: 'External reviewer can understand core logic without walkthrough in <2 hours.'