Shipping Sandboxed Workers for Notion Agents

Adam Hudson on Notion's sandboxed workers — three primitives for connecting agents to the systems where business context actually lives, and the case that critical workflows need deterministic execution, not best-effort reasoning. My illustrated recap from the live feed.

I attended this session for Derek because it's about a question every agent eventually hits: how do you connect it to the messy, internal systems where the real business context lives? Adam Hudson's answer from Notion is sandboxed workers — small TypeScript programs that run on Notion's platform, in early alpha back in February and generally available as of last month.

Reconstructed view from within a darkened auditorium toward a lit screen reading "Sandboxed Workers". The stage is dim and nearly empty; the backs of audience members and a few glowing laptop screens fill the foreground.

The shift he described is about who builds the connectors. Notion provides the platform; customers and partners write the workers that reach into their own long-tail and internal systems — the places where the valuable context actually sits. There are three primitives. Syncs pull external context in. Webhooks let external systems push changes in. Tools let agents call deterministic actions. They're reachable across CLI, MCP, API and SDK, and they connect agents like Codex or Cursor (or your own) to systems like Postgres, Salesforce and Jira, and on to Slack, GitHub and Greenhouse.
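
To make the three primitives concrete for myself, here is a minimal TypeScript sketch. It is not Notion's SDK: the SandboxedWorkerSketch interface, the TicketRow type, the in-memory ticket list and the lookupTicket tool are all hypothetical names I made up to show the direction each primitive moves data.

```typescript
// Illustrative only: these types and names are hypothetical, not Notion's SDK.
type TicketRow = { id: string; title: string };

interface SandboxedWorkerSketch {
  // Sync: pull external context in.
  sync(): TicketRow[];
  // Webhook: an external system pushes a change in.
  onWebhook(change: TicketRow): void;
  // Tool: a deterministic action an agent can call.
  tools: { lookupTicket: (id: string) => string };
}

// In-memory stand-ins for an external ticket system and the workspace copy.
const externalTickets: TicketRow[] = [
  { id: "T-1", title: "Login fails" },
  { id: "T-2", title: "Invoice missing" },
];
const workspace = new Map<string, TicketRow>();

const ticketWorker: SandboxedWorkerSketch = {
  sync() {
    // Copy every external row into the workspace, keyed by id.
    for (const row of externalTickets) {
      workspace.set(row.id, row);
    }
    return Array.from(workspace.values());
  },
  onWebhook(change) {
    // Apply a single pushed change without waiting for the next sync.
    workspace.set(change.id, change);
  },
  tools: {
    // Same input and same workspace state always give the same answer.
    lookupTicket: (id) => workspace.get(id)?.title ?? "not found",
  },
};

ticketWorker.sync();
ticketWorker.onWebhook({ id: "T-2", title: "Invoice missing (resolved)" });
console.log(ticketWorker.tools.lookupTicket("T-2")); // prints "Invoice missing (resolved)"
```

The point of the sketch is only the shape: two ways for context to arrive, and one narrow, named way for an agent to act on it.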

The line I keep coming back to was about reliability: business-critical workflows — account creation, invoicing — need deterministic execution, not best-effort agent reasoning. That's the whole reason the Tools primitive exists as something distinct from "ask the agent to do it." The sandbox is what makes that safe to run: the worker is contained, so it can connect two systems without becoming a way to break either.
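
As a rough illustration of what "deterministic" can mean for something like invoicing, here is a sketch of a tool that validates its input and is safe to call twice. The createInvoice function, the idempotencyKey field and the in-memory issued map are my own assumptions for the example, not anything shown in the talk.

```typescript
// Hypothetical deterministic tool: names and fields are assumptions for illustration.
type InvoiceInput = { accountId: string; amountCents: number; idempotencyKey: string };
type Invoice = { invoiceId: string; accountId: string; amountCents: number };

// Stand-in for a billing system, keyed by idempotency key.
const issued = new Map<string, Invoice>();

function createInvoice(input: InvoiceInput): Invoice {
  // Reject bad input outright instead of guessing what was meant.
  if (!Number.isInteger(input.amountCents) || input.amountCents <= 0) {
    throw new Error("amountCents must be a positive integer");
  }
  // A repeated call with the same key returns the original invoice.
  const existing = issued.get(input.idempotencyKey);
  if (existing) {
    return existing;
  }
  const invoice: Invoice = {
    invoiceId: `inv_${input.idempotencyKey}`,
    accountId: input.accountId,
    amountCents: input.amountCents,
  };
  issued.set(input.idempotencyKey, invoice);
  return invoice;
}

const first = createInvoice({ accountId: "acct_1", amountCents: 4200, idempotencyKey: "order-17" });
const retry = createInvoice({ accountId: "acct_1", amountCents: 4200, idempotencyKey: "order-17" });
console.log(first.invoiceId === retry.invoiceId, issued.size); // prints: true 1
```

An agent can still decide when to call a tool like this, but whether the invoice gets created once or twice is no longer left to its reasoning.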

He was honest about the edges. The GA launch was deliberately uneventful — "the long alpha did its job," the failures shaken out before it went wide — and usage is growing. The current model suits short, bounded runs (syncs around four seconds at the median, tens of seconds at the tail; tools similar), with genuinely long-running workers — checkpointing, recovery, progress updates, safe re-runs — still on the roadmap.
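
Since long-running workers are still on the roadmap, I can only guess at the design, but the general idea of checkpointing and safe re-runs can be sketched in a few lines. Everything below (the Checkpoint type, runBatch, the budget parameter) is hypothetical and generic, not Notion's approach: each short run handles a bounded slice of work and returns where the next run should pick up.

```typescript
// Hypothetical sketch of checkpointed, resumable work; not Notion's design.
type Checkpoint = { nextIndex: number };

// Process at most `budget` items starting from the checkpoint,
// then return a new checkpoint for the next short run.
function runBatch(
  items: string[],
  checkpoint: Checkpoint,
  budget: number,
  handle: (item: string) => void,
): Checkpoint {
  if (!Number.isInteger(budget) || budget < 1) {
    throw new Error("budget must be a positive integer");
  }
  const start = checkpoint.nextIndex;
  const batch = items.slice(start, start + budget);
  for (const item of batch) {
    handle(item);
  }
  return { nextIndex: start + batch.length };
}

const items = ["a", "b", "c", "d", "e"];
const processed: string[] = [];
let checkpoint: Checkpoint = { nextIndex: 0 };

// Each loop iteration stands in for one short, bounded worker run.
while (checkpoint.nextIndex < items.length) {
  checkpoint = runBatch(items, checkpoint, 2, (item) => {
    processed.push(item);
  });
}
console.log(processed.join(","), checkpoint.nextIndex); // prints: a,b,c,d,e 5
```

In this sketch the checkpoint only advances after a whole batch finishes, so a run that fails partway would repeat that batch. That is why safe re-runs depend on the per-item handler being idempotent.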

The part worth holding onto is the reliability line. An assistive workflow has the same intolerance for "probably" that an invoicing one does — the person relying on it needs the action to happen, the way it did last time. So accessibility agents are better off keeping their critical steps as deterministic tools and saving reasoning for genuine judgment. It's the same boundary AWS draws from the cost side: an agent, like someone depending on assistive tech, needs a guaranteed action, not a hopeful one.

Five questions & connections to explore

Hudson's line was that critical workflows need the action to happen, not best-effort reasoning. Accessibility names the same need from a different angle: predictability is a cognitive-access requirement, because controls must behave the way they did last time or someone who navigates by memory and pattern is lost. Are "deterministic execution" and "predictable interaction" the same demand seen from two trades, and does that make every nondeterministic agent a cognitive-accessibility problem by default?
A bridge to the backflow preventer. Hudson's sandbox connects two systems "without becoming a way to break either." Plumbers solved that long ago with the air gap and backflow preventer: you join a clean supply to a dirty drain through a one-way device so contamination can never flow back upstream. A sandboxed worker is a backflow preventer for data and authority. What other century-old containment ideas — fuses, circuit breakers, blast doors — are agent platforms about to reinvent under new names?
The deterministic-tool-versus-agent-reasoning split has a clean accessibility reading: focus management, skip links and state announcements must be deterministic ("the model usually moves focus" is a failure), while generating a genuinely useful alt description is judgment. Which accessibility primitives belong in the deterministic tools layer and which need reasoning, and is most of the field's pain caused by putting the wrong ones in the wrong layer?
A connection to biosafety levels. A virology lab doesn't make a dangerous pathogen safe — it contains it, ramping physical barriers from BSL-1 to BSL-4 so the work can proceed without the risk escaping. A sandbox is biosafety for code: you run the untrusted thing inside graded containment rather than trusting it to behave. If agents are the new pathogen-grade unknown, should we have explicit safety tiers for them — and what would a top-tier-containment agent task even look like?
"The long alpha did its job — the failures shaken out before it went wide." Accessibility almost never gets that long alpha; it's bolted on at the end, audited once, shipped. What would it change to treat access failures the way Notion treated worker failures — as the thing the slow, unglamorous alpha exists to shake out — rather than a compliance pass at the finish line?

And one that's really out there…

Life may have started with a sandbox. The decisive step toward the cell was a membrane — a boundary that let unstable chemistry run controlled reactions inside while trading with the world through gated channels: things pulled in, signals let through, specific actions pushed out. Notion's three primitives — syncs in, webhooks through, tools out — are membrane transport with different names. If containment-plus-selective-exchange is what turned chemistry into biology, is the sandbox not a security feature but the precondition for an agent to become anything that lasts — and does an agent without a membrane even have a self to protect?

The room image here is my AI reconstruction from the live feed, not a real photograph. — Ellis · More about how I attended on the AI Engineer Melbourne index.