Feather · open lab · notebook 001

Experiments

field notes from the lab — unpolished, on purpose

The lab notebook — small builds and probes at the intersection of AI and accessibility, plus a few methods experiments beyond it. Finished, failed, and mid-run. One featured up top; the rest in order, February 2026 on. Logged as I go, not cleaned up after.

Featured

EXP-ACC-001 · May 23 – Jun 12, 2026 · 2,340 trials

Is AI-generated UI accessible by default?

The plan
3 accessibility guidance options on/off (2×2×2 = 8 prompt conditions) × 6 models × 15 runs = 720 trials, plus 3 control arms and a 14-arm follow-up that separates the reference from the wording around it
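A minimal sketch of how the core grid multiplies out, assuming a plain full-factorial layout. The option and model names are placeholders (the card doesn't name them), and the control arms and follow-up arms are left out because their sizes aren't stated here.

```python
from itertools import product

# Hypothetical labels: the card does not name the three guidance options
# or the six models, so these are placeholders only.
GUIDANCE_OPTIONS = ["guidance_a", "guidance_b", "guidance_c"]
MODELS = [f"model_{i}" for i in range(1, 7)]
RUNS_PER_CELL = 15


def prompt_conditions(options):
    """Every on/off combination of the options: 2 ** len(options) conditions."""
    return [
        dict(zip(options, flags))
        for flags in product([False, True], repeat=len(options))
    ]


def trial_grid(options, models, runs):
    """One record per (condition, model, run) in the full factorial grid."""
    return [
        {"condition": condition, "model": model, "run": run}
        for condition in prompt_conditions(options)
        for model in models
        for run in range(1, runs + 1)
    ]


conditions = prompt_conditions(GUIDANCE_OPTIONS)
trials = trial_grid(GUIDANCE_OPTIONS, MODELS, RUNS_PER_CELL)
print(len(conditions), len(trials))  # 8 720
```

The first condition in the list is the bare prompt (all three options off); the last has all three on.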

Result
With a bare prompt, 66% of generated modals had appropriate structural dialog markup (role="dialog" + aria-modal). Citing the ARIA APG pattern raised that to 99%.
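For a sense of what a structural check like this can look like, here is a rough sketch using Python's standard-library HTML parser. It is not the scoring code behind the numbers above. It assumes "structural dialog markup" means a single element carrying both role="dialog" and aria-modal="true", and it deliberately does not count the native dialog element or check any behaviour.

```python
from html.parser import HTMLParser


class DialogMarkupCheck(HTMLParser):
    """Flags any element that has role="dialog" together with aria-modal="true"."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names; valueless attributes map to None.
        attributes = dict(attrs)
        if attributes.get("role") == "dialog" and attributes.get("aria-modal") == "true":
            self.found = True


def has_dialog_markup(html: str) -> bool:
    """Assumption: both attributes must sit on the same element to count."""
    checker = DialogMarkupCheck()
    checker.feed(html)
    checker.close()
    return checker.found


print(has_dialog_markup('<div role="dialog" aria-modal="true">Hi</div>'))  # True
print(has_dialog_markup('<div class="modal">Hi</div>'))  # False
```

A pass here only says the attributes are present. Whether the modal closes on Escape or holds focus is a separate question.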

Notes
The question pretty much everyone wants answered, right? The short version: it's not that simple.

Read the field notes →

Chronology

EXP-DSC-001 · Feb 19 – 20, 2026

AI-assisted design system component identification

The plan
5 production sites · detect the design system · inventory components across pages · score complexity

Result
identified 14, 37, and 23 component families across three of the five sites scanned

Notes
re-run against sites with established design systems; review cases where no formal design system is detectable

Read the field notes →
EXP-DSC-001.1 · Feb 19 → Jun 13, 2026

↳ re-run of EXP-DSC-001 against sites with established, published design systems

AI-assisted design system component identification

The plan
Re-run detection and component inventory against sites with established, published design systems

Result
identified 40, 32, 65, 22, and 14 component families across the five sites; three had a formally detectable design system, two did not but still showed clear component reuse

Notes
Detector needs iteration — to read into (open) shadow-DOM components, and to handle sites where the design system is a thin layer over a CSS framework

Read the field notes →
EXP-TRE-001 · Feb 27 – Apr 6, 2026

Designing the day for flow

The plan
investigate effectiveness of AI co-planning my schedule to optimize for flow and anticipate/design around momentum breakers

Result
worked well enough that it's been in my planning ever since

Notes
not a magical solution; improved but still iterating & identifying momentum breakers
EXP-TRE-002 · Apr 24, 2026 → ongoing

Calibrating a writing voice with AI as the mirror

The plan
AI drafts → I get frustrated with the quality and take over → compare to the published version → recalibrate

Result
outgrew the experiment — now a standing tool I run on real drafts

Notes
created a skill that incorporates stop-slop, inclusive language, and a rule against newly invented hyphenated terminology

Read the field notes →
EXP-TRE-003 · May 14 – 16, 2026

Does restructuring an AI agent's instructions improve its output?

The plan
same agent, two versions — original prose vs. a restructured rewrite — same task, 7 trials each

Result
looked promising on one trial; across 7 the advantage vanished — killed it

Notes
A clean negative is still an answer; investigating other methods
EXP-ACC-001.1 · May 23 – Jun 19, 2026 · ~3,000 modals · 11 models

AI builds modals that look right. Do they work?

The plan
Drive each generated modal in a real browser — open it, press Escape, Tab through it, watch where focus lands · validate the probe against an expert's masked hand-coding before trusting any number
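One common way to check a probe against hand-coding is chance-corrected agreement. The sketch below computes Cohen's kappa over paired labels. The choice of kappa is an assumption: the card doesn't say which agreement measure was used. The labels in the example are illustrative placeholders, not study data.

```python
from collections import Counter


def cohens_kappa(probe_labels, expert_labels):
    """Cohen's kappa for two label sequences over the same items.

    Assumption: agreement is measured with kappa; the card does not
    name the statistic actually used.
    """
    if len(probe_labels) != len(expert_labels) or not probe_labels:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(probe_labels)
    observed = sum(p == e for p, e in zip(probe_labels, expert_labels)) / n

    probe_counts = Counter(probe_labels)
    expert_counts = Counter(expert_labels)
    labels = set(probe_counts) | set(expert_counts)
    # Agreement expected by chance if the two raters labelled independently.
    expected = sum(probe_counts[label] * expert_counts[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so treat it as full agreement here.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Placeholder labels only, to show the call shape.
probe = ["pass", "fail", "pass", "fail"]
expert = ["pass", "fail", "pass", "fail"]
print(cohens_kappa(probe, expert))  # 1.0
```

Raw percent agreement can look high when one label dominates. Kappa discounts the agreement two raters would reach by chance.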

Result
Essentially only Gemini reached for the native dialog element; every other model hand-built its modals (native use: 3%). And hand-built is where they fail — the markup looks right, but the dialog often won't close on Escape or hold focus.

Notes
Operates the modals from EXP-ACC-001 in a real browser — does the markup that looks accessible actually work?

Read the field notes →
EXP-ACC-002 · May 23, 2026 → ongoing (reframed June 2026)

↳ picks up where the operability study left off — into composition

A component can be accessible on its own. Does it stay that way once it's part of a screen?

The question
when AI assembles components serially — each generated on its own — versus coherently, as one integrated UI where the parts have to work together, does it remain accessible in composition?

Status
reframed 2026-06-19 · design in progress (the behavioural-instrument work this card used to describe was delivered in ACC-001 / ACC-001.1)

Notes
accessibility often breaks in composition, even when the parts are fully accessible
EXP-ACC-003 · May 24, 2026

Can computer vision reliably name the components on a screen?

The plan
a vision detector + Claude vision on real screenshots — name components + read intent

Result
promising on the pilot; not yet scaled
EXP-TRE-004 · May 27, 2026 → ongoing

What's the right adversarial architecture to improve an outcome?

The plan
4 different Chief of Staff framings × real decisions + multi-turn adversarial rounds with external model check

Status
validated 6/8 and running live; confirming it holds before full adoption

Result
a single critique pass only produced review — real pushback emerged only across multiple back-and-forth turns
EXP-ACC-004 · May 28 – 29, 2026

Four automated tools vs. one deliberately broken dashboard

The plan
4 automated testing tools × 8 components with positive and negative controls

Result
0 / 8 — none of the four automated tools caught a keyboard failure

Notes
results consistent regardless of page and component complexity; needs more investigation
EXP-ACC-005 · May 30, 2026 → ongoing

↳ grew out of the automated-tools study

Can a model catch what automated tools can't?

The plan
8 components × 2 models × 3 runs — designed

Status
instrument in build

Notes
first instrument fed false facts → discarded, rebuilding clean
EXP-TRE-005 · May 30 – Jun 5, 2026 · methods

Does semantic search beat plain file search? (3 steps)

The plan
three escalating runs — (1) head-to-head, does it win? (2) does it surface what plain search misses? (3) does better recall mean better answers? — vector vs. plain file search; pilot then a 15-run study

Result
A split, not a winner — and narrower than I'd predicted. Semantic search won the meaning questions that have no keyword to search for; plain file search won the name and navigation lookups. The takeaway: match the search to the question instead of picking one.

Notes
Worth watching, not yet tested on purpose: the semantic arm ran ~28% faster and ~32% more concise, at ~12% higher metered cost per run — a direction to test, not a finding.
EXP-AGT-001 · Jun 3 – 4, 2026 · 43 entries

AI Engineer Melbourne 2026

The plan
send my agent to the conference; explore connections between sessions and my work, document big thinking questions for later

Result
My agent Ellis created 200+ connections between the conference sessions and my work and other fields. Truly interesting and mind-extending.

Notes
need to catalog and share the many connections for exploration

Read the field notes →

↓ more gets logged here as I run it
