I had a moment a few months ago that stopped me mid-keyboard.
I was watching an interview with Alberto Brandolini — the creator of EventStorming, a pioneer in Domain-Driven Design, and one of the sharpest thinkers at the intersection of domain modeling and organizational complexity — where he made a point I couldn’t shake: Coding is no longer the primary bottleneck in software development. The real challenge is shaping context, boundaries, and shared understanding — now for AI systems.
That line landed differently than I expected. Because I’d been thinking about something adjacent: every time I handed a task to Claude Code, the quality of the output wasn’t determined by how cleverly I phrased the prompt. It was determined by how clearly I’d described what I wanted the system to be.
The prompt was temporary. The spec should have been permanent.
That realization kicked off an experiment I want to share.
You Already Know This Pattern
If you’ve worked with Terraform, the mental model is immediate.
You don’t log into AWS and manually provision EC2 instances. You write a main.tf. You declare the desired state. You run terraform plan to see the delta. You run terraform apply to reconcile reality with intent. The .tfstate file records what was actually applied; the why lives in your commit history.
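To make the "declare, diff, reconcile" idea concrete, here is a toy sketch in Python. It is not how Terraform computes a plan; the function name plan and the resource names are made up for illustration.

```python
# Toy illustration of "declare intent, compute the delta".
# This is NOT Terraform's algorithm; all names here are hypothetical.

def plan(desired: dict, actual: dict) -> dict:
    """Return what would have to change to make `actual` match `desired`."""
    return {
        "add": sorted(k for k in desired if k not in actual),
        "change": sorted(
            k for k in desired if k in actual and desired[k] != actual[k]
        ),
        "remove": sorted(k for k in actual if k not in desired),
    }


# Declared intent versus what currently exists.
desired = {"web": {"size": "small"}, "db": {"size": "large"}}
actual = {"web": {"size": "medium"}, "cache": {"size": "small"}}

print(plan(desired, actual))
# → {'add': ['db'], 'change': ['web'], 'remove': ['cache']}
```

The point is only the shape: intent is data, and the delta is computed from it rather than remembered.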
Now here’s the thing: that abstraction layer — declare intent, generate output, track state — is exactly what’s missing from most AI-assisted workflows.
We write a prompt. The AI generates something. We tweak it. We lose track of why. Three weeks later we can’t reproduce the same output, can’t tell what changed, can’t audit what the AI actually did versus what we asked it to do.
This is the equivalent of managing infrastructure by SSHing into machines and running commands by hand. It worked. Until it didn’t.
What Spec-Driven Ops Looks Like
I built a working example (a containerized Python hello-world, trivial on purpose) to make the pattern concrete. [GitHub repo — link coming soon]
The structure is this:
    .openspec/
      spec.md    ← source of truth (the new main.tf)
      apply.md   ← change history (the new tfstate + git log)
    src/
      hello.py   ← implementation (the generated output)
    Dockerfile
spec.md is the desired state. Not documentation. Not a README. The actual contract from which the implementation is derived. Here’s what v1.0 looks like:
    Behavior
    - When the container starts, it MUST execute hello.py
    - The script MUST print Hello from Spec-Driven Ops! to stdout
    - The container MUST exit with code 0

    Constraints
    - Base image: python:3.12-slim
    - The container MUST NOT run as root
    - No ports exposed
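For reference, a hello.py that satisfies the Behavior section of v1.0 could be as small as the sketch below. This is my guess at a minimal shape, not necessarily the file in the repo; the GREETING constant and the main function are names chosen for illustration.

```python
# src/hello.py: a minimal sketch that satisfies the Behavior section of spec v1.0.
# GREETING and main() are illustrative names, not taken from the repo.

GREETING = "Hello from Spec-Driven Ops!"


def main() -> None:
    # The spec requires this exact line on stdout.
    print(GREETING)
    # Returning normally lets the interpreter exit with code 0.


if __name__ == "__main__":
    main()
```

The Constraints section (base image, non-root user, no exposed ports) is the Dockerfile's job, so nothing in the script addresses it.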
apply.md is the audit trail. It tracks every applied change as a cycle — Propose, Changes, Verify, Status — in a format both humans and AI agents can read. This is what tfstate does for your infrastructure, but for your intent:
    Cycle 1 — 2026-06-06

    Propose
    Bootstrap from spec v1.0. Create hello.py and Dockerfile.

    Changes
    ADDED src/hello.py
    ADDED Dockerfile

    Verify
    docker run --rm python-helloworld
    # Expected: Hello from Spec-Driven Ops! (exit 0)

    Status
    ✅ APPLIED
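The Verify step is mechanical enough to script. Here is a rough sketch of turning a verify command into a cycle status. The verify function is hypothetical, "FAILED" is a status label I am assuming (only APPLIED appears in the cycle above), and a Python one-liner stands in for the Docker command so the sketch runs anywhere.

```python
import subprocess
import sys


def verify(command: list[str], expected_stdout: str) -> str:
    """Run a Verify command and map the result to a cycle status."""
    result = subprocess.run(command, capture_output=True, text=True)
    ok = result.returncode == 0 and result.stdout.strip() == expected_stdout
    # "APPLIED" comes from the cycle format; "FAILED" is an assumed label.
    return "APPLIED" if ok else "FAILED"


# Stand-in command so the sketch runs without Docker. In the real cycle this
# would be ["docker", "run", "--rm", "python-helloworld"].
stand_in = [sys.executable, "-c", "print('Hello from Spec-Driven Ops!')"]

print(verify(stand_in, "Hello from Spec-Driven Ops!"))
# → APPLIED
```

Checking both the exit code and the output mirrors the two things the expected-result comment in the cycle asks for.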
When I changed the behavior — first adding author support, then switching from an env var to a CLI argument — I didn’t touch the implementation first. I changed spec.md. Then I ran the change workflow. Then I verified. Then I let apply.md record what happened.
The /spec-change Command
This is where it gets practical for Claude Code users.
I wrote a custom slash command — /spec-change — that lives in .claude/commands/spec-change.md and encodes the entire workflow as a reusable instruction:
    Read .openspec/spec.md.
    Inspect git diff -- .openspec/spec.md.
    Summarize the delta.
    Propose the files that must change.
    Update implementation files.
    Append a new cycle to .openspec/apply.md.
    Run verification commands.
    Update the cycle status based on the result.
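In the actual workflow Claude does the "inspect the diff, summarize the delta" steps itself, but it helps to see how little is going on. Below is a sketch of those two steps over unified diff text. The function name summarize_spec_delta and the sample diff are invented for illustration.

```python
def summarize_spec_delta(diff_text: str) -> dict:
    """Collect added and removed spec lines from unified diff text."""
    added, removed = [], []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file section starts; skip its headers
        elif line.startswith("@@"):
            in_hunk = True  # hunk header; content lines follow
        elif in_hunk and line.startswith("+") and line[1:].strip():
            added.append(line[1:].strip())
        elif in_hunk and line.startswith("-") and line[1:].strip():
            removed.append(line[1:].strip())
    return {"added": added, "removed": removed}


# Hypothetical diff, shaped like the output of: git diff -- .openspec/spec.md
sample = """\
diff --git a/.openspec/spec.md b/.openspec/spec.md
--- a/.openspec/spec.md
+++ b/.openspec/spec.md
@@ -3,3 +3,3 @@
 When the container starts, it MUST execute hello.py
-The script MUST print Hello from Spec-Driven Ops! to stdout
+The script MUST print Hello, Alice from Spec-Driven Ops! for python hello.py Alice
 The container MUST exit with code 0
"""

delta = summarize_spec_delta(sample)
print(delta["removed"])
print(delta["added"])
```

Outside the sketch, the diff text could come from subprocess.run(["git", "diff", "--", ".openspec/spec.md"], capture_output=True, text=True).stdout.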
The loop becomes:
- Edit spec.md
- Run /spec-change
- Claude reads the git diff, generates a plan, applies it, verifies
- apply.md gets a new cycle appended
- Review and commit when satisfied
This is terraform plan → terraform apply, but for software behavior instead of cloud resources.
When I tested it, I got burned on the first iteration. I’d asked for author support, but the model chose to read it from an environment variable instead of a CLI argument. Correct behavior, wrong interface.
The fix wasn’t to re-prompt. It was to make the spec more precise:
    Interface
    - The script MUST read the author name from the first command-line argument
    - Command format: python hello.py <author_name>
    - If the argument is missing, the script MUST use "John" as default
    - python hello.py Alice MUST print Hello, Alice from Spec-Driven Ops!
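With the interface pinned down like this, there is very little left to interpret. One implementation that fits is sketched below; the names greeting and DEFAULT_AUTHOR are mine, and the exact output for the default case is inferred from the Alice example, since the spec does not spell it out.

```python
import sys

DEFAULT_AUTHOR = "John"  # default required by the Interface section


def greeting(args: list[str]) -> str:
    """Build the greeting from the arguments that follow the script name."""
    author = args[0] if args else DEFAULT_AUTHOR
    return f"Hello, {author} from Spec-Driven Ops!"


def main() -> None:
    # sys.argv[0] is the script path, so the author is sys.argv[1] if present.
    print(greeting(sys.argv[1:]))


if __name__ == "__main__":
    main()
```

Keeping the logic in greeting makes each MUST line checkable without a container: greeting(["Alice"]) returns "Hello, Alice from Spec-Driven Ops!" and greeting([]) falls back to John.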
More testable spec → less latitude for interpretation → better implementation. Same mental model as Terraform: the more explicit the resource definition, the less drift between plan and apply.
spec.md Is a Snapshot. apply.md Is a History.
This distinction matters more than it looks.
spec.md tells you what must be true right now. apply.md tells you how you got here.
Without apply.md, you return to a project three months later and face the classic IaC problem: you can see what the code declares, but not why those specific choices were made. Why python:3.12-slim and not 3.12? Why ENTRYPOINT ["python"] instead of CMD? The answer is in a Slack thread that no longer exists.
With apply.md, the reasoning is in the repo. Not as comments in code — as a timestamped log of intent, change, and verification. An AI agent picking this up next month can read the history and understand the context, not just the current state.
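Because the cycles follow a fixed shape, the history is also easy to read programmatically. Here is a sketch of listing cycles from apply.md text. It assumes each cycle starts with a header like the Cycle 1 line shown earlier (optionally prefixed with Markdown # marks) and that an applied cycle has a status line reading APPLIED; list_cycles is a hypothetical helper.

```python
import re

# Assumed header shape: optional "#" marks, then "Cycle <n> <dash> <YYYY-MM-DD>".
CYCLE_HEADER = re.compile(r"^#*\s*Cycle (\d+) \S (\d{4}-\d{2}-\d{2})$")

# Status lines treated as "applied"; \u2705 is the check mark used in the log.
APPLIED_LINES = {"APPLIED", "\u2705 APPLIED"}


def list_cycles(apply_text: str) -> list[dict]:
    """Return each cycle's number, date, and whether it was applied."""
    cycles = []
    for raw in apply_text.splitlines():
        line = raw.strip()
        match = CYCLE_HEADER.match(line)
        if match:
            cycles.append(
                {"cycle": int(match.group(1)), "date": match.group(2), "applied": False}
            )
        elif cycles and line in APPLIED_LINES:
            cycles[-1]["applied"] = True
    return cycles


# \u2014 is the dash character used in the cycle header.
sample = (
    "Cycle 1 \u2014 2026-06-06\n"
    "Propose\n"
    "Bootstrap from spec v1.0. Create hello.py and Dockerfile.\n"
    "Status\n"
    "\u2705 APPLIED\n"
)

print(list_cycles(sample))
# → [{'cycle': 1, 'date': '2026-06-06', 'applied': True}]
```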
This is what Brandolini was pointing at: the bottleneck isn’t generating code anymore. It’s shaping context well enough that the AI can do useful work without constant human correction. apply.md is part of that context infrastructure.
The Deeper Shift
There’s something uncomfortable here if you sit with it.
For years, the spec was the thing you wrote before you got to the real work. Documentation was a second-class citizen. Implementation was what mattered.
Spec-Driven Ops inverts this. The spec is the work. The implementation is the artifact the spec produces — reproducible, regenerable, replaceable. If your Dockerfile drifts or your script breaks, you don’t debug the output. You re-apply from the spec.
This maps cleanly onto how infrastructure teams already think. You don’t debug a corrupted EC2 instance. You terminate it and let Terraform recreate it from the configuration. Same principle, applied to code behavior.
The spec is not documentation. It’s the source of truth your AI agent executes from.
Where This Goes
I’m using this pattern on a toy example right now. But the pattern scales: a microservice spec, a Kubernetes namespace spec, an API contract spec. Any bounded context where you want the AI to regenerate the implementation from declared intent rather than from chat history.
The tooling isn’t settled yet — OpenSpec, GitHub Spec Kit, and Kiro are all competing to become the HCL of spec-driven development. But the underlying pattern is usable today in Claude Code with a CLAUDE.md file and a single custom command file.
The working example — including spec.md, apply.md, the Dockerfile, and the /spec-change command — is available here: [GitHub repo]
Try changing one line in spec.md and running /spec-change. If you’ve ever run terraform apply and watched a resource get recreated from nothing, you’ll recognize the feeling immediately.
