Field Note · the wrong frame

Three crashes, three patches, one cause none of them named.

Three separate crashes got three separate guards. Every guard held. None of them was the bug. Here is the commit that finally named it.

Ojan Lubis · 2026-07-02

Builds production software, and cleans up the AI-generated code that breaks it. pdflokal, his open-source PDF toolkit, is one of the repos these notes come from.

A crash report comes in. You open the stack trace, you see exactly where it blew up, you add a guard so it cannot blow up there again. The crash stops. Real users stop hitting it. That is a good day, and it feels like progress, because it is. We had three of those days in a week.

Over about eight days, three separate crashes came in from real users, all in the same corner of pdflokal, the part that manages pages. The first was an iOS tap crash: after a page was removed, the selected-annotation index outlived the annotation it pointed at. A guard shipped (d970560). Then the annotations for the selected page came back undefined after a page changed, and a thumbnail render crashed walking a page whose state had drifted. Two more guards (5bee38d). Three crashes, three stack traces, three guards, and every one of them held.

Three crashes is not three bugs

Three crashes in a week, in the same corner, all about pages and the things attached to pages. That is not three bugs. That is one bug wearing three stack traces, and the guards had made each stack trace safe without ever touching the bug.

Here is the thing they were all standing on. The editor kept several maps all keyed by page index: the annotations on each page, the render caches, the zoom scales, plus the selected page (an index) and the selected annotation (an index pair). Every one of them assumed that page number two meant the same page everywhere. So every time a code path changed the list of pages, a reorder or a delete, it had to re-key all of those maps in lockstep. Miss one, and page two in that map now pointed at a different page than page two in the real list. The state had drifted, and the next read of it crashed, or worse, quietly showed the wrong page.
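To make the drift concrete, here is a minimal sketch of the failure mode, not pdflokal's actual code. The `Page` shape, the two maps, and `deletePageForgettingZoom` are all hypothetical names invented for illustration. One map is re-keyed after a delete and the other is forgotten.

```typescript
// Hypothetical shapes for illustration; pdflokal's real state is not shown in this note.
type Page = { id: string };

const pages: Page[] = [{ id: "cover" }, { id: "intro" }, { id: "appendix" }];

// Two parallel maps, both keyed by page index.
const annotations = new Map<number, string[]>([
  [0, ["cover note"]],
  [1, ["intro highlight"]],
  [2, ["appendix stamp"]],
]);
const zoom = new Map<number, number>([[0, 1], [1, 1.5], [2, 2]]);

// Delete a page and re-key only one of the maps. The forgotten map drifts.
function deletePageForgettingZoom(index: number): void {
  pages.splice(index, 1);

  const rekeyed = new Map<number, string[]>();
  annotations.forEach((list, i) => {
    if (i === index) return;                   // drop the removed page's entry
    rekeyed.set(i > index ? i - 1 : i, list);  // shift later pages down by one
  });
  annotations.clear();
  rekeyed.forEach((list, i) => annotations.set(i, list));

  // zoom is never re-keyed: index 1 still holds the deleted page's scale.
}

deletePageForgettingZoom(1);

const pageAtOne = pages[1].id;               // "appendix"
const annotationsAtOne = annotations.get(1); // ["appendix stamp"], re-keyed correctly
const zoomAtOne = zoom.get(1);               // 1.5, which belonged to "intro"
const staleZoomEntry = zoom.has(2);          // true, though only two pages remain
```

Nothing crashes here. Index 1 simply means "appendix" in two places and "intro" in the third, which is the quiet version of the bug: the wrong page's data, served without complaint.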

“All three are instances of the same bug class.” So wrote the author in the commit that finally named it, after the third guard. Each patch had caught one place the drift surfaced. None had touched the drift.

The fix was not a fourth guard

The reframe (0441856) was a single helper, mutatePages. It snapshots the page-keyed maps, captures the selected page and annotation as object references rather than indices, lets the caller change the page list, then re-keys every map by finding each page’s new position by reference. One place that re-keys everything at once, so page number two can never again mean two different pages. The disease was “parallel state keyed by index.” The cure was to stop trusting the index and trust the object.
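The steps in that paragraph can be sketched roughly as follows. This is a reconstruction from the description, not the code in 0441856: only the name mutatePages comes from the commit, while `EditorState`, `rekey`, and the field names are assumptions. It also assumes the caller changes the page array in place.

```typescript
// A sketch only. Every name except mutatePages is hypothetical.
type Page = { id: string };
type Annotation = { text: string };

interface EditorState {
  pages: Page[];
  annotations: Map<number, Annotation[]>;      // keyed by page index
  zoom: Map<number, number>;                   // keyed by page index
  selectedPage: number | null;                 // an index
  selectedAnnotation: [number, number] | null; // [page index, annotation index]
}

// Re-key one index-keyed map by finding each surviving page's old index by reference.
function rekey<V>(old: Map<number, V>, before: Page[], after: Page[]): Map<number, V> {
  const next = new Map<number, V>();
  after.forEach((page, newIndex) => {
    const oldIndex = before.indexOf(page); // identity lookup, not an index assumption
    if (oldIndex !== -1 && old.has(oldIndex)) next.set(newIndex, old.get(oldIndex) as V);
  });
  return next;
}

function mutatePages(state: EditorState, change: (pages: Page[]) => void): void {
  // Snapshot the old order and capture selections as objects, not indices.
  const before = state.pages.slice();
  const page = state.selectedPage === null ? null : before[state.selectedPage] ?? null;
  const sel = state.selectedAnnotation;
  const annotation = sel === null ? null : state.annotations.get(sel[0])?.[sel[1]] ?? null;

  change(state.pages); // the caller reorders or deletes in place

  // Re-key every map in one place.
  state.annotations = rekey(state.annotations, before, state.pages);
  state.zoom = rekey(state.zoom, before, state.pages);

  // Restore selections by reference; clear them if their target is gone.
  const at = page === null ? -1 : state.pages.indexOf(page);
  state.selectedPage = at === -1 ? null : at;
  let found: [number, number] | null = null;
  state.annotations.forEach((list, p) => {
    const a = annotation === null ? -1 : list.indexOf(annotation);
    if (a !== -1) found = [p, a];
  });
  state.selectedAnnotation = found;
}

// Usage: delete the middle page and let the helper re-key everything.
const a: Page = { id: "a" };
const b: Page = { id: "b" };
const c: Page = { id: "c" };
const state: EditorState = {
  pages: [a, b, c],
  annotations: new Map<number, Annotation[]>([
    [1, [{ text: "on b" }]],
    [2, [{ text: "on c" }]],
  ]),
  zoom: new Map<number, number>([[0, 1], [1, 1.5], [2, 2]]),
  selectedPage: 2,
  selectedAnnotation: [2, 0],
};

mutatePages(state, (pages) => { pages.splice(1, 1); });
// state.pages is now [a, c]; c's zoom and annotations sit at index 1,
// and both selections followed c to index 1.
```

The design choice worth noticing is that no caller computes a new index by hand. Each one describes the change to the page list and nothing else, so a forgotten map is no longer something a caller can do.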

And the three guards stayed. The commit kept them on purpose, belt and suspenders: the helper prevents the drift, the guards catch anything that ever bypasses the helper. That is the honest part, and it is the whole point. The guards were never wrong. They held, they stopped real crashes for real people. They were just one floor below the cause, catching the symptom on its way down.
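For a sense of what "one floor below the cause" looks like, here is a sketch of a guard in the spirit of the ones described. It is not the code from d970560 or 5bee38d, which this note does not show. `EditorState`, `pageCount`, and `annotationsForSelectedPage` are hypothetical names.

```typescript
// Hypothetical shapes; the real guards are not reproduced in this note.
type Annotation = { text: string };

interface EditorState {
  pageCount: number;
  annotations: Map<number, Annotation[]>; // keyed by page index
  selectedPage: number | null;
}

// Guard: a missing or out-of-range entry yields an empty list instead of a crash.
function annotationsForSelectedPage(state: EditorState): Annotation[] {
  const index = state.selectedPage;
  if (index === null || index < 0 || index >= state.pageCount) return [];
  return state.annotations.get(index) ?? [];
}

const drifted: EditorState = {
  pageCount: 2,
  annotations: new Map<number, Annotation[]>([[0, [{ text: "note" }]]]),
  selectedPage: 2, // stale: points past the end after a delete
};

const safe = annotationsForSelectedPage(drifted); // empty list, no crash
```

A guard like this stops the crash and is worth keeping. What it cannot do is notice a stale index that is still in range: that read succeeds and returns another page's annotations, which is why the guard alone was never the fix.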

What this actually teaches

This is the most dangerous shape AI-built code takes, because the wrong frame is comfortable. “Each crash is its own bug” is a frame that lets you make progress. You fix one, you fix the next, the crash graph goes down, and it feels like the work is working. It works well enough to survive, which is exactly why it survives. The only thing that exposes it is stepping back and asking a question no stack trace asks: why here, again? What produced all three?

This story indicts AI twice. First the structure. A pile of maps all keyed by index is a hallmark of AI-grown code: each time a feature needs some per-page data, the model adds another map keyed the same way, because that matches the last one, and it never proposes collapsing them into a single source of truth. The maps accrete, and every one is a fresh chance to drift.
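For contrast, this is roughly what "a single source of truth" could look like. It is an illustrative alternative, not what the commit did (the commit kept the maps and added a helper). `PageRecord` and its fields are hypothetical.

```typescript
// Hypothetical: per-page data lives on the page object, so there is nothing to re-key.
type Annotation = { text: string };

interface PageRecord {
  id: string;
  annotations: Annotation[];
  zoom: number;
}

const pages: PageRecord[] = [
  { id: "cover", annotations: [], zoom: 1 },
  { id: "intro", annotations: [{ text: "highlight" }], zoom: 1.5 },
  { id: "appendix", annotations: [{ text: "stamp" }], zoom: 2 },
];

// The selection is a reference to the page, not an index into the list.
const selected: PageRecord = pages[2];

// Deleting a page moves every remaining page's data with it.
pages.splice(1, 1);

const selectedIndex = pages.indexOf(selected); // 1, derived on demand
const zoomAtOne = pages[1].zoom;               // 2, still the appendix's own scale
```

With this shape the index is something you compute when you need it, never something you store, so there is no second copy of it to fall out of step.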

Then the fix. Paste an AI a stack trace and it will add a guard for that exact trace, fluently, with a reasonable comment, and stop. It patches the symptom because the symptom is what you showed it. It does not step back and name the cause, because naming the cause means questioning the shape of the whole thing, and fluent reasoning aimed at the wrong frame just produces more convincing guards, faster. You can whack the moles for a long time before anything makes you ask where they come from.

So when the same area crashes three times, stop fixing crashes. The repetition is the signal. Three symptoms at one address is almost never three bugs. It is one structure generating them, and the fix is not another guard. It is naming the thing the guards were catching.

The receipts, public

Band-aid one: a guard for the iOS tap crash, JS-4 (2026-06-01) d970560
Band-aids two and three: guards for JS-7 and JS-8 (2026-06-04) 5bee38d
The reframe: mutatePages(), one place that re-keys every map by reference (2026-06-09) 0441856

Why do I keep patching the same crash?

Usually because the patches are at the wrong layer. If several crashes share one root cause, a guard for each will stop its own stack trace and none will fix the cause, so the next crash of that family is only a matter of time. The fix is to name the shared cause, not to guard each trace.

What is state drift?

When two pieces of state that are supposed to agree fall out of sync. In our case several maps were all keyed by page index; any code path that changed the list of pages but forgot to re-key one map left that map pointing at the wrong page, and the next read crashed or showed the wrong thing.

Why does AI-generated code keep breaking in the same place?

Because AI patches symptoms. Paste it a stack trace and it adds a guard for that exact trace, confidently and plausibly, then stops. It rarely steps back to ask what structure produced the crash, so the structure stays and produces the next one. More reasoning aimed at the wrong frame just generates more convincing wrong fixes, faster.