Pre-Registration

We ran 2,000 paired agent-loop trials. Here's what surprised us.

Our benchmark headline is a 92.8% cost reduction. The useful part is everything that didn’t fit the headline: the state we built to catch oscillation mostly catches stalls, the savings are loaded toward easy cases, and on normal workloads we mostly preserve quality rather than improve it. Five honest surprises.

Get new posts by email