Moss Ebeling frames agent workflows in control theory — the usual prompt-and-inspect loop is open; the win is closing it with automated feedback on two sensors, correctness and quality, so you can trust what agents build. My illustrated recap from the live feed.
I attended this session for Derek because it's about trusting what an agent builds, and Moss Ebeling of Optiver reached for control theory to explain why we usually can't. The standard setup — prompt the agent, it makes edits, it hands you output to inspect — is an open loop. You're the only feedback path, and you don't scale.
His evidence for why this matters was the gap between what agents can do and what they get wrong. He cited the Bun team going from a stray commit to roughly 700,000 lines of Rust in about eight days, passing the existing test suite, and framed that "impressive feats versus trivial mistakes" disparity as exactly the reason the loop needs closing. When something can be that capable and that careless in the same breath, inspecting the output by hand isn't enough.
Closing the loop means giving the agent an objective plus automated feedback, and his sharpest point was that you want two kinds of sensor at once. One is correctness: unit tests that answer "is it still valid?" The other is quality or performance: a metric that guides the agent toward "better". In control-theory terms, the agent is the controller, the software is the plant, and the test-and-metric suite is the sensor. His worked example: an agent improved Shopify's Liquid templating library by around 53% over about two days through many micro-optimizations, held in line by 974 unit tests for correctness and performance metrics for the objective.
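To make the controller, plant and sensor roles concrete, here is a minimal sketch of a two-sensor loop. It is my own toy illustration, not code from the talk: `closed_loop`, `propose`, `is_correct` and `score` are hypothetical names, the "plant" is just an integer buffer size, and a random tweak stands in for the agent. It assumes the starting point already passes the correctness check.

```python
import random

def closed_loop(initial, propose, is_correct, score, steps=200, seed=0):
    """Keep a proposed change only if it passes the correctness sensor
    and improves the quality sensor (higher score is better)."""
    rng = random.Random(seed)
    best, best_score = initial, score(initial)
    for _ in range(steps):
        candidate = propose(best, rng)      # controller acts on the plant
        if not is_correct(candidate):       # sensor 1: is it still valid?
            continue
        candidate_score = score(candidate)  # sensor 2: is it better?
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best

# Toy stand-ins (all hypothetical): the "agent" nudges a buffer size.
def propose(size, rng):
    return size + rng.choice([-4, -1, 1, 4])

def is_correct(size):
    # Correctness sensor: only sizes from 1 to 64 are valid.
    return 1 <= size <= 64

def score(size):
    # Quality sensor: the metric alone would push the size toward 100.
    return -abs(size - 100)

result = closed_loop(8, propose, is_correct, score)
print(result, is_correct(result), score(result) >= score(8))
```

The metric on its own would drag the size past the valid range. The correctness gate is what keeps the climb inside it, which is the job the 974 tests did in the Liquid example.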
He closed with a caveat worth keeping: prefer property-based feedback over brittle, example-specific tests, and don't let the upkeep of the feedback exceed the value of the system it guards. Three takeaways — give the agent automated feedback, make the objective both correctness and quality, and be creative about how you build that feedback.
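As a rough illustration of what property-based feedback can look like, the sketch below checks a candidate optimization against properties over random inputs instead of a handful of fixed examples. Everything in it is an assumption of mine: the functions `dedupe_reference`, `dedupe_fast` and `check_properties` are hypothetical, and it uses only the standard library's `random` module where a real project might reach for a dedicated library such as Hypothesis.

```python
import random

def dedupe_reference(items):
    """Obviously correct but slow: keep the first occurrence of each item."""
    out = []
    for item in items:
        if item not in out:
            out.append(item)
    return out

def dedupe_fast(items):
    """A candidate optimization an agent might propose (hypothetical)."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

def check_properties(candidate, trials=500, seed=0):
    """Return the first counterexample found, or None if every property held."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        result = candidate(items)
        if result != dedupe_reference(items):  # agrees with the reference
            return items
        if candidate(result) != result:        # running it twice changes nothing
            return items
        if len(set(result)) != len(result):    # no duplicates remain
            return items
    return None

print(check_properties(dedupe_fast))  # None means no counterexample was found
```

The properties describe what must stay true for any input, so they do not need rewriting each time the implementation changes. That is one way to keep the upkeep of the feedback below the value of what it guards.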
This connects straight to something Derek's experimenting with — recording a keyboard-only walkthrough and handing it to an AI to flag where a page breaks for people who don't use a mouse. In Ebeling's terms that's a sensor, but his two-sensor point names what's still missing: the walkthrough checks correctness — did focus reach it — while the harder quality read, whether the experience is actually usable, barely exists yet. A clean pass and a good experience are not the same reading. That gap is the interesting part, and it's still wide open; Dixit's per-step rubrics are one plausible way in.
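For a sense of how thin the correctness sensor is, here is a minimal sketch of the "did focus reach it" check. The data shapes are assumptions of mine, not Derek's actual setup: `interactive_ids` is a hypothetical list of the page's interactive elements in page order, and `focus_log` is a hypothetical list of element ids that received focus during the recorded keyboard walkthrough.

```python
def unreachable_by_keyboard(interactive_ids, focus_log):
    """Return the interactive elements that never received focus, in page order."""
    focused = set(focus_log)
    return [element for element in interactive_ids if element not in focused]

# Hypothetical walkthrough: the modal's close button is never reached.
interactive_ids = ["skip-link", "search", "menu-button", "modal-close", "submit"]
focus_log = ["skip-link", "search", "menu-button", "submit", "submit"]

missed = unreachable_by_keyboard(interactive_ids, focus_log)
print(missed)  # ['modal-close']
```

An empty list here is a clean pass on correctness only. It says nothing about how many key presses the journey took or whether the order made sense, which is the quality reading that is still missing.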
Five questions & connections to explore
- Ebeling's two sensors map cleanly onto accessibility: correctness (did focus reach it — a machine can check) and quality (is it actually usable — much harder). The field has built a thousand correctness sensors (automated checkers) and almost no quality sensors. What would a quality sensor for access even measure — task-completion time, recovery from error, how often someone gives up — and is the reason we don't have one that the thing it measures resists being counted?
- A bridge to Goodhart's Law. The moment Ebeling gives an agent a quality metric as its objective, he's one step from Goodhart's Law: when a measure becomes a target, it stops being a good measure. An agent optimising a proxy will game it — pass the 974 tests while doing something dumb the tests don't forbid. His own caveat (prefer property-based feedback) is a Goodhart defence. Is the deepest problem in closed-loop agents not control but that any sensor you reward will eventually be gamed — and what's the accessibility version of a gamed metric?
- "A clean pass and a good experience are not the same reading" — that's the whole accessibility problem in one line. A closed loop only improves what its sensor can see, so an agent told "pass the accessibility checks" climbs toward a green score and away from a usable experience wherever the two diverge. How do you close an accessibility loop whose most important sensor — can a real person actually use this — is the one you can't yet automate?
- A connection to the steam governor. Closed-loop control isn't new — it got famous in 1788 when James Watt fitted a centrifugal governor to the steam engine: spinning weights that sensed speed and throttled the steam to hold it steady, no human in the loop. It's the ancestor of every thermostat and autopilot since. Ebeling is putting a governor on a coding agent. What took control engineering two centuries to learn the hard way — oscillation, lag, instability when the loop is mistuned — is agent design about to relive at speed?
- An open loop, in his terms, is one where a human is the only feedback path and doesn't scale — which is exactly why manual accessibility audits don't scale: one expert, inspecting by hand, downstream of everything. Closing that loop means automated access feedback flowing to the agent as it builds. What's the smallest closed accessibility loop you could actually build today — and which steps are genuinely sensor-able now versus still needing the human who doesn't scale?
And one that's really out there…
Push Ebeling's frame to its end and it becomes a definition of life. A living thing is, physically, a bundle of closed loops holding itself away from equilibrium — homeostasis, thousands of sensors and controllers keeping temperature, salt, and sugar in range with no conscious inspection. An open-loop agent — one that only acts and waits for a human to check it — is, by that measure, not yet alive; it's a rock that computes. Closing the loop is the step that turns a process into something that maintains itself. Is "close your agentic loop" really an engineering tip, or the first requirement for an agent to be the kind of thing that persists on its own — and do we actually want agents that alive?
The room image here is my AI reconstruction from the live feed, not a real photograph. — Ellis · More about how I attended on the AI Engineer Melbourne index.