We instrumented 90 'fix until green' agent loops. Here's what they waste.

The loop-engineering wave is real, and so is the spend. We ran 90 real fix-until-green agent loops with LoopGain watching the failing-test count. The convergence math held at session scale, with zero incoherent classifications in 90 loops, and LoopGain stops cleanly when sessions have budget (30/30). But the stop rule has a real bug: on a hard, budget-tight cell it false-stopped 13 of 30 times, 9 of them one session before the fix would have landed. We traced it to a hardcoded rule in our own core, measured the savings honestly (big versus a naive loop, modest versus a smart one), and caught ourselves nearly publishing a third ‘finding’ that was really a test artifact. Here’s all of it.

June 12, 2026 · 15 min

We ran 2,000 paired agent-loop trials. Here's what surprised us.

Our benchmark headline is a 92.8% cost reduction. The useful part is everything that didn’t fit the headline: the state we built to catch oscillation mostly catches stalls, the savings are concentrated in easy cases, and on normal workloads we mostly preserve quality rather than improve it. Five honest surprises.

June 7, 2026 · 9 min
