Spec vs. Prompt: A Founder's Diagnostic

Most founders assume that when the AI output is wrong, a better prompt will fix it. Sometimes it does. More often, the problem is upstream: the system doesn't know what it's supposed to produce. Prompting harder moves the problem around without solving it.

Answer each question honestly. Score 0 for a clear yes and 1 for no or unsure; where a question is not yes-or-no, the note under it says when to score 1. Add up the scores at the end.

The five questions

1. Can you write down in one paragraph — without referencing the AI — exactly what the output should look like?

What does it contain? What doesn't it contain? What format? What does "good" look like versus "bad"? If you find yourself wanting to describe what the AI does rather than what the output is, score 1.

2. If the AI received perfect input, would you know within 30 seconds whether the output was right or wrong?

Not whether it was great — whether it was correct. If evaluation requires judgment you haven't defined anywhere, score 1.

3. When you change the prompt, does the problem disappear — or does it move?

If fixing one failure mode exposed a different one, the problem moved. Score 1. If the failure is simply gone, score 0.

4. Can you describe specifically how the AI fails when it's wrong — not just "it's wrong," but the pattern of wrongness?

"It's wrong in the same way on inputs that have X characteristic" is a failure mode. "Every failure feels different" is not. If you can't name the pattern, score 1.

5. Does the AI know what it's not supposed to produce?

Has the system been designed for out-of-scope inputs, ambiguous requests, and edge cases, or does it always produce something, even when it shouldn't? If it always produces something, score 1.
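To make questions 1, 2, and 5 concrete, here is a minimal sketch of what a written-down spec can look like once it is precise enough to check mechanically. The task (summarizing a support ticket), the category names, the word limit, and the "out_of_scope" marker are all hypothetical choices made for illustration; your own spec will have different fields and rules.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical task: summarizing a support ticket. Every name and limit
# below is an illustrative assumption, not a recommendation.
ALLOWED_CATEGORIES = {"billing", "bug", "feature_request"}
OUT_OF_SCOPE = "out_of_scope"   # explicit "produce nothing" path (question 5)
MAX_SUMMARY_WORDS = 40


@dataclass
class Output:
    category: str   # one of ALLOWED_CATEGORIES, or OUT_OF_SCOPE
    summary: str    # plain-text summary; must be empty when out of scope


def check_output(out: Output) -> List[str]:
    """Return the list of spec violations; an empty list means correct."""
    problems: List[str] = []

    if out.category == OUT_OF_SCOPE:
        # The spec says what NOT to produce: no summary for out-of-scope input.
        if out.summary:
            problems.append("out-of-scope output must have an empty summary")
        return problems

    if out.category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {out.category!r}")

    word_count = len(out.summary.split())
    if word_count == 0:
        problems.append("summary is empty")
    elif word_count > MAX_SUMMARY_WORDS:
        problems.append(f"summary has {word_count} words, limit is {MAX_SUMMARY_WORDS}")

    return problems
```

A check like this only settles whether an output is correct, not whether it is great. If you cannot write even a rough version of it for your own product, that is a signal to score 1 on the questions above.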

Scoring

0–1: Prompt problem. The spec is clear. Iterate on phrasing, examples, or model selection.

2–3: Probably a spec problem. You've asked for approximately the right thing, but not precisely enough. Write the spec first, then return to prompting.

4–5: Spec problem. You haven't defined the output well enough for a human to produce it reliably — so the AI can't either. Write the spec: what the output contains, how to evaluate whether it's correct, what to do on edge cases. Then come back to the AI.
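The scoring bands above can be written as a small function. This is only a sketch of the arithmetic; the function name and the verdict strings are illustrative, not part of the diagnostic itself.

```python
from typing import Sequence


def diagnose(scores: Sequence[int]) -> str:
    """Map the five 0/1 answers to the verdict bands in the Scoring section."""
    if len(scores) != 5 or any(s not in (0, 1) for s in scores):
        raise ValueError("expected exactly five scores, each 0 or 1")

    total = sum(scores)
    if total <= 1:
        return "prompt problem"           # 0-1: iterate on the prompt
    if total <= 3:
        return "probably a spec problem"  # 2-3: write the spec first
    return "spec problem"                 # 4-5: define the output before anything else


# Example: unsure on questions 2 and 5, yes on the rest.
print(diagnose([0, 1, 0, 0, 1]))
```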

The agentic products that compound over time are built on specifications, not prompts. Specs are testable, revisable, and shareable. Prompts are ephemeral. If you scored 2 or higher, write the spec first.