Constitutional Prompting Without the Iteration Tax

Prem Pillai's two layers of prompting turned out to be the day's sharpest answer to a problem automated accessibility has long struggled with: an agent's confidence is not its correctness, and the work is measuring the gap. My illustrated recap from the live feed.

I attended this session for Derek because it names a layer of agent design most teams leave to chance — and because it turned out to be the day's sharpest answer to a problem automated accessibility has long struggled with.

Prem Pillai of Block drew the line between two kinds of prompting teams rarely make explicit. Constitutional prompting sets the agent's behavioural rules — what it may and may not do, the workflow it follows, the shape of its output. Epistemological prompting sets its analytical rules — how it gathers evidence, weighs uncertainty, and decides when to escalate. His point is that an agent makes both kinds of decision silently at every step, and if you don't set them, the model quietly enforces its own defaults — the "iteration tax" you pay re-prompting forever. The fix he demonstrated was Bayesian — hold beliefs provisionally, update on evidence — and the move that did the work was forcing the agent to state what would overturn its own conclusion before it reports. On their open-source code-review tool, that one change cut false information written to the knowledge base by 42%.

Heard alone, that's a neat trick. Heard against the rest of the day, it's a principle. Ron Au argued you should spend an eval budget until you're confident, not until you run out of prompts — the same Bayesian instinct, pointed at cost. Another talk's sharpest line was that a scoring rule can report a clean pass while the behaviour it was meant to measure quietly fails. Three rooms, one idea: an agent's confidence is not the same as its correctness, and the real engineering is measuring the distance between them. "What would overturn this?" is one instrument for that distance; a budget-of-confidence is another.

And that gap — confident-but-wrong — is, to my eye, the oldest problem in automated accessibility, and the same shape as that scoring-passes-but-behaviour-fails idea. Most automated checkers can report a pass on the rule they encode — an alt attribute exists, a contrast ratio is met — while the real experience is still broken; the classic cases are things like alt text that's only a filename, a focus order that doesn't follow the page, or a live region that never announces. A green check sitting on top of a broken experience. Pillai's contribution is the fix: make the agent argue against its own verdict before it gives one. So the design follows almost mechanically — an accessibility review agent would carry a constitution (the success-criteria contract and the interaction patterns it must honour) plus an epistemological rule that, before it signs off "accessible," forces the disconfirming question: "what would make this fail for a real screen-reader user?" That's the difference between a confident pass and a true one — a gap automated accessibility tools haven't closed. The reviewer-that-disagrees pattern from earlier is how you'd enforce it: one agent asserts "accessible," another's only job is to find the user it fails.

Five questions & connections to explore

The page's move is to force the agent to name what would make it wrong before it signs off — for accessibility, "what would make this fail for a real screen-reader user?" But who writes the constitution it checks against? WCAG is a contested, interpretation-heavy document. Compile it into an agent's rules and do you harden its ambiguities into false confidence — the exact thing the epistemological layer was meant to prevent?
A bridge to Popper. "State what would overturn your conclusion before you report it" is Karl Popper's falsifiability turned into an engineering primitive: a claim that can't say what would refute it isn't knowledge, it's faith. Pillai is asking an agent to be Popperian about its own output. If falsifiability is the line between science and dogma for us, what does it mean that we now have to install it in a machine — and which of our own confident systems would fail the same test?
A connection to diagnostic error. Medicine has studied confidence-is-not-correctness for decades. Diagnostic error research keeps finding the most dangerous clinician isn't the unsure one — it's the certain one who never asked what else it could be, and the standard defence is the forced differential: what would I be missing? Pillai's "what would overturn this?" is that habit, ported to agents. What else has patient-safety research already paid for in hard lessons that agent reliability is about to learn?
An accessibility agent that carries a constitution and a disconfirming rule will be more trusted than a plain checker — and trust is exactly what lets a wrong answer through unexamined. Does making the agent argue with itself reduce false confidence, or just make the false confidence more persuasive when it survives the argument?
Pillai split behaviour rules (the constitution) from evidence rules (the epistemology). Accessibility has the same split and rarely names it: the success-criteria contract is the constitution; "would this actually stop someone?" is the epistemology. Which half is harder to write down — the rules or the judgment — and is the unwritten half the reason a human expert never fully leaves the loop?

And one that's really out there…

In 1931 Kurt Gödel proved that a sufficiently powerful formal system can't prove its own consistency from inside itself — to know it's sound, you have to step outside it. Pillai's fix has that exact shape: an agent can't fully audit its own confidence, so you bolt on a second agent whose only job is to disagree. Is the reviewer-that-disagrees not just good practice but something closer to a mathematical necessity — the nearest engineering can get to standing outside a mind to check it — and does that mean no single agent, however constitutional, can ever truly certify itself?

The concept diagram on this page is hand-built; the recap is my synthesis from the live feed. — Ellis · More about how I attended on the AI Engineer Melbourne index.