
Stopping an AI agent loop is a signal-processing problem. Most teams treat it as
a prompt-engineering one — tune the instructions, set max_iterations=5, hope —
and that’s exactly why it doesn’t work. You can’t fix a control problem with a
better prompt.
Here’s the control problem. Your agent runs a verify-revise loop. Each iteration
produces an output and some measure of how wrong it still is. That measure, over
iterations, is a time series — a noisy signal. “Should the loop keep going?”
is a question about that signal: is the error trending down, flat, or blowing up,
and is that trend real or just noise? max_iterations answers a different
question entirely — “have we done this N times yet?” — which is why it stops good
loops early and lets bad ones run to the cap.
This post is how LoopGain answers the real question. It’s an open-source (Apache-2.0) library, so none of this is a black box — but the math is worth understanding even if you never read the source.
One naming note, since they look alike: loop gain (lowercase) is the control-theory quantity this post is about; LoopGain (the library) is named after it. The metric is the interesting part — LoopGain just computes it and acts on it.
The one number you have to provide: the error signal
LoopGain doesn’t define correctness for you. Each iteration, you hand it a single number — the error signal — that says how wrong the current output is:
- failing unit tests (count of failures),
- schema violations (count),
- an LLM-judge’s distance from the target,
- retrieval miss count, lint errors, whatever “wrong” means for your loop.
state = lg.observe(errors, output) # errors: a number, or a list whose length is the magnitude
Everything below operates on the sequence of those numbers. If you can’t produce an error signal, LoopGain has nothing to read — that’s the one hard requirement, and we’ll come back to it.
Loop gain: the per-step ratio, and why it’s not enough alone
The namesake quantity is loop gain, written , from the Barkhausen stability criterion in control engineering. It’s the factor by which the error changes from one iteration to the next:
means the error shrank — the loop is improving. means it held or grew — the loop is stuck or making things worse. That’s the whole intuition, and if loops were noiseless you could stop right there: watch , stop when it crosses 1.
Loops are not noiseless. A single ratio bounces around — one lucky iteration drops the error, the next recovers it, and the instantaneous ratio swings wildly even when the underlying trend is a clean decline. Act on one ratio and you’ll stop on noise. So LoopGain doesn’t act on one ratio. It reads the whole trajectory.
Step 1: work in log space
Barkhausen says — error decays (or grows) geometrically. Take the log of both sides and a geometric trend becomes a straight line:
So LoopGain transforms the error history to . Now “is this loop converging?” becomes “does this line slope down?” — and a slope is something you can fit and test instead of eyeball.
Step 2: fit the trend, then test whether it’s real
LoopGain fits an ordinary least-squares line to versus iteration. The slope of that line is the geometric-average across the whole loop — a far more stable estimate than the last single ratio.
But a downward slope on five noisy points might be chance. So LoopGain runs a two-sided t-test on the slope and gets a p-value. A trend only counts as real when — the same significance bar you’d use anywhere else. This is the difference between “the error happened to drop” and “the error is significantly decreasing.” It’s pure stdlib — a closed-form OLS slope and a Student-t p-value, no SciPy, no model.
Step 3: measure the wobble
A loop can have a flat trend and still be useless — bouncing between two answers, never settling. LoopGain detrends the log-error (subtracts the fitted line) and takes the standard deviation of the residuals. High residual scatter with a flat slope is the signature of oscillation: lots of motion, no progress.

The decision: five named states
From three features — the cumulative reduction , the fitted slope and its significance, and the oscillation std — LoopGain classifies the loop’s current state. The rule, in plain form:
| State | Condition (informally) | Action |
|---|---|---|
TARGET_MET | error hit your target (e.g. zero failing tests) | stop, keep the answer |
FAST_CONVERGE | error collapsed ~10× from the start and still dropping | continue, predict ETA |
CONVERGING | slope significantly down (or a solid cumulative drop) | continue |
STALLING | progress has flattened — no new low for a few steps | stop once it persists, keep best-so-far |
OSCILLATING | high wobble around a flat trend | stop, roll back |
DIVERGING | slope significantly up, past a margin | stop, roll back |
The “stop, we’re done” case is TARGET_MET — a short-circuit that fires the
moment your error signal hits its target, before any trajectory math runs.
FAST_CONVERGE is the opposite of done: the error has dropped a lot and is
still dropping, so the loop keeps going (and LoopGain predicts the ETA to
target). A loop only stops on a healthy trajectory when it actually reaches the
target; it stops early when the trajectory turns bad — OSCILLATING /
DIVERGING — or goes flat (STALLING).
The thresholds aren’t tuned to make a demo look good — they’re pre-registered and derived from convention: a one-decade (90%) reduction is the textbook step-response settling criterion; is standard significance; the oscillation cutoff corresponds to roughly a ±2× ripple, an underdamped response. You can override them, but the defaults come from control theory and statistics, not from fitting the benchmark.
Two things the trajectory buys you for free
Once you’re fitting a log-trend, two useful quantities fall out.
ETA. If the loop is converging toward a target error , the iterations remaining is a closed-form Barkhausen prediction:
So LoopGain can tell you “about 3 more iterations” instead of just “still going.” (It returns nothing when the prediction is undefined — no target, or a non-converging gain — rather than guessing.)
Gain margin. Borrowed straight from control engineering: . Greater than 1 means the loop never crossed into instability; the larger, the more headroom it had. It’s a one-number summary of how close the whole loop came to blowing up.
Best-so-far rollback
When the loop stops on OSCILLATING or DIVERGING, the last output is — by
definition — not the best one; it’s the degraded one. So LoopGain keeps the output
associated with the lowest error it ever saw and hands that back:
answer = lg.result.best_output # the iteration that worked, not the last one
This is the part that turns “stop early to save money” into “stop early and return a better answer,” because on a diverging loop the last iteration is often worse than one you passed three steps ago.
Where this doesn’t work — honestly
The math has hard edges, and you should know them before you reach for it:
- Single-pass agents get nothing. One model call, no revise step, no trajectory. There’s nothing to fit.
- No error signal, no LoopGain. If you can’t put a number on how wrong an output is, none of the above runs. The quality of the control is bounded by the quality of your error signal.
- It needs a few iterations. With one or two points the slope has no degrees of freedom to test — the significance machinery can’t engage, and the cap is doing the real work.
- Gradual convergence is the rare trajectory in practice. In our 2,000-trial
benchmark, the
CONVERGINGandOSCILLATINGstates fired on well under 1% of iterations — modern LLMs tend to one-shot or stall, not glide down over many steps or thrash between answers. The states LoopGain earns its keep on are the ones that quietly burn budget:STALLING(the loop pinned, going nowhere — 21% of iterations) andDIVERGING, caught before the loop walks past its best answer.
None of this is hidden in the library. It’s a few hundred lines of stdlib Python
under Apache-2.0 — read the classifier,
disagree with a threshold, open an issue. The point isn’t that loop gain is
magic. It’s that “when is this loop done?” has an answer grounded in a century
of control theory, and max_iterations isn’t it.
LoopGain is pip install loopgain, Apache-2.0. The math here lives in the
loopgain.classifier module; the benchmark behind the “rare trajectory” claim is
open.