A damped oscilloscope waveform settling onto a flat line — an agent loop converging.

Stopping an AI agent loop is a signal-processing problem. Most teams treat it as a prompt-engineering one — tune the instructions, set max_iterations=5, hope — and that’s exactly why it doesn’t work. You can’t fix a control problem with a better prompt.

Here’s the control problem. Your agent runs a verify-revise loop. Each iteration produces an output and some measure of how wrong it still is. That measure, over iterations, is a time series — a noisy signal. “Should the loop keep going?” is a question about that signal: is the error trending down, flat, or blowing up, and is that trend real or just noise? max_iterations answers a different question entirely — “have we done this N times yet?” — which is why it stops good loops early and lets bad ones run to the cap.

This post is how LoopGain answers the real question. It’s an open-source (Apache-2.0) library, so none of this is a black box — but the math is worth understanding even if you never read the source.

One naming note, since they look alike: loop gain (lowercase) is the control-theory quantity this post is about; LoopGain (the library) is named after it. The metric is the interesting part — LoopGain just computes it and acts on it.

The one number you have to provide: the error signal

LoopGain doesn’t define correctness for you. Each iteration, you hand it a single number — the error signal — that says how wrong the current output is:

  • failing unit tests (count of failures),
  • schema violations (count),
  • an LLM-judge’s distance from the target,
  • retrieval miss count, lint errors, whatever “wrong” means for your loop.
state = lg.observe(errors, output)   # errors: a number, or a list whose length is the magnitude

Everything below operates on the sequence of those numbers. If you can’t produce an error signal, LoopGain has nothing to read — that’s the one hard requirement, and we’ll come back to it.

Loop gain: the per-step ratio, and why it’s not enough alone

The namesake quantity is loop gain, written AβA\beta, from the Barkhausen stability criterion in control engineering. It’s the factor by which the error changes from one iteration to the next:

Aβ=EnEn1A\beta = \frac{E_n}{E_{n-1}}

Aβ<1A\beta < 1 means the error shrank — the loop is improving. Aβ1A\beta \geq 1 means it held or grew — the loop is stuck or making things worse. That’s the whole intuition, and if loops were noiseless you could stop right there: watch AβA\beta, stop when it crosses 1.

Loops are not noiseless. A single EnEn1\frac{E_n}{E_{n-1}} ratio bounces around — one lucky iteration drops the error, the next recovers it, and the instantaneous ratio swings wildly even when the underlying trend is a clean decline. Act on one ratio and you’ll stop on noise. So LoopGain doesn’t act on one ratio. It reads the whole trajectory.

Step 1: work in log space

Barkhausen says En=AβEn1E_n = A\beta \cdot E_{n-1} — error decays (or grows) geometrically. Take the log of both sides and a geometric trend becomes a straight line:

logEn=logE0+nlog(Aβ)\log E_n = \log E_0 + n \log(A\beta)

So LoopGain transforms the error history to log10(E)\log_{10}(E). Now “is this loop converging?” becomes “does this line slope down?” — and a slope is something you can fit and test instead of eyeball.

Step 2: fit the trend, then test whether it’s real

LoopGain fits an ordinary least-squares line to log10(E)\log_{10}(E) versus iteration. The slope of that line is the geometric-average logAβ\log A\beta across the whole loop — a far more stable estimate than the last single ratio.

But a downward slope on five noisy points might be chance. So LoopGain runs a two-sided t-test on the slope and gets a p-value. A trend only counts as real when p<0.05p < 0.05 — the same significance bar you’d use anywhere else. This is the difference between “the error happened to drop” and “the error is significantly decreasing.” It’s pure stdlib — a closed-form OLS slope and a Student-t p-value, no SciPy, no model.

Step 3: measure the wobble

A loop can have a flat trend and still be useless — bouncing between two answers, never settling. LoopGain detrends the log-error (subtracts the fitted line) and takes the standard deviation of the residuals. High residual scatter with a flat slope is the signature of oscillation: lots of motion, no progress.

Five oscilloscope traces side by side — a flat line, a smooth decay, a plateau, a steady oscillation, and a diverging blow-up — one per loop state.

The decision: five named states

From three features — the cumulative reduction Ecurrent/EfirstE_{\text{current}}/E_{\text{first}}, the fitted slope and its significance, and the oscillation std — LoopGain classifies the loop’s current state. The rule, in plain form:

StateCondition (informally)Action
TARGET_METerror hit your target (e.g. zero failing tests)stop, keep the answer
FAST_CONVERGEerror collapsed ~10× from the start and still droppingcontinue, predict ETA
CONVERGINGslope significantly down (or a solid cumulative drop)continue
STALLINGprogress has flattened — no new low for a few stepsstop once it persists, keep best-so-far
OSCILLATINGhigh wobble around a flat trendstop, roll back
DIVERGINGslope significantly up, past a marginstop, roll back

The “stop, we’re done” case is TARGET_MET — a short-circuit that fires the moment your error signal hits its target, before any trajectory math runs. FAST_CONVERGE is the opposite of done: the error has dropped a lot and is still dropping, so the loop keeps going (and LoopGain predicts the ETA to target). A loop only stops on a healthy trajectory when it actually reaches the target; it stops early when the trajectory turns bad — OSCILLATING / DIVERGING — or goes flat (STALLING).

The thresholds aren’t tuned to make a demo look good — they’re pre-registered and derived from convention: a one-decade (90%) reduction is the textbook step-response settling criterion; p<0.05p < 0.05 is standard significance; the oscillation cutoff corresponds to roughly a ±2× ripple, an underdamped Q3Q \approx 3 response. You can override them, but the defaults come from control theory and statistics, not from fitting the benchmark.

Two things the trajectory buys you for free

Once you’re fitting a log-trend, two useful quantities fall out.

ETA. If the loop is converging toward a target error EtargetE_{\text{target}}, the iterations remaining is a closed-form Barkhausen prediction:

nremaining=log(Etarget/Ecurrent)log(Aβsmooth)n_{\text{remaining}} = \frac{\log(E_{\text{target}} / E_{\text{current}})}{\log(A\beta_{\text{smooth}})}

So LoopGain can tell you “about 3 more iterations” instead of just “still going.” (It returns nothing when the prediction is undefined — no target, or a non-converging gain — rather than guessing.)

Gain margin. Borrowed straight from control engineering: GM=1/max(Aβsmooth)\text{GM} = 1 / \max(A\beta_{\text{smooth}}). Greater than 1 means the loop never crossed into instability; the larger, the more headroom it had. It’s a one-number summary of how close the whole loop came to blowing up.

Best-so-far rollback

When the loop stops on OSCILLATING or DIVERGING, the last output is — by definition — not the best one; it’s the degraded one. So LoopGain keeps the output associated with the lowest error it ever saw and hands that back:

answer = lg.result.best_output   # the iteration that worked, not the last one

This is the part that turns “stop early to save money” into “stop early and return a better answer,” because on a diverging loop the last iteration is often worse than one you passed three steps ago.

Where this doesn’t work — honestly

The math has hard edges, and you should know them before you reach for it:

  • Single-pass agents get nothing. One model call, no revise step, no trajectory. There’s nothing to fit.
  • No error signal, no LoopGain. If you can’t put a number on how wrong an output is, none of the above runs. The quality of the control is bounded by the quality of your error signal.
  • It needs a few iterations. With one or two points the slope has no degrees of freedom to test — the significance machinery can’t engage, and the cap is doing the real work.
  • Gradual convergence is the rare trajectory in practice. In our 2,000-trial benchmark, the CONVERGING and OSCILLATING states fired on well under 1% of iterations — modern LLMs tend to one-shot or stall, not glide down over many steps or thrash between answers. The states LoopGain earns its keep on are the ones that quietly burn budget: STALLING (the loop pinned, going nowhere — 21% of iterations) and DIVERGING, caught before the loop walks past its best answer.

None of this is hidden in the library. It’s a few hundred lines of stdlib Python under Apache-2.0 — read the classifier, disagree with a threshold, open an issue. The point isn’t that loop gain is magic. It’s that “when is this loop done?” has an answer grounded in a century of control theory, and max_iterations isn’t it.


LoopGain is pip install loopgain, Apache-2.0. The math here lives in the loopgain.classifier module; the benchmark behind the “rare trajectory” claim is open.