Field Notes

When AI-built software breaks.

AI-built software does not break randomly. It breaks in a small number of recurring shapes. The code an AI writes is fluent, and fluency reads as correctness without guaranteeing it, so each failure stays hidden until something forces it into the open. These are the shapes we keep finding, in our own code and in the projects people hand us to fix. Each one is a note: a real bug, the investigation, and the public commit that closed it.

The shapes it breaks in

01
The silent break
It passes every test and is still wrong.
The code does what the test imagined, not what the user needs. If the test drives a mock entry point and checks that the parts exist rather than that the right thing happened, it stays green while the product is broken. Green is a fact about your test, not your software.
Read the full note

02
The confident misconfiguration
It runs, so it looks configured. It runs wrong.
Setup code that reads as correct and quietly defeats itself: a worker that never actually threads, a cache that never actually clears. It does not error, so nobody looks twice. The expensive bugs are the ones that run.
Read the full note

03
The wrong frame
Fluent reasoning, aimed at the wrong problem.
When the mental model underneath is wrong, more AI reasoning does not help. It produces more convincing wrong answers, faster, fixing symptom after symptom of a cause it never names. This is the most dangerous property of AI coding: it is persuasive in the wrong direction.
A full note is coming.

04
The debt at typing speed
AI writes code at the speed of typing, and debt at the same speed.
The thousand-line file, the dead code regenerated on every prompt, the abstraction nobody chose. It ships fast and reads fine until someone has to change it. The bill comes later, and it is paid in refactors.
A full note is coming.

05
The hole in the boring place
The risk hides where people stop reviewing.
Not in the auth flow everyone scrutinizes, but in the toast, the loader, the helper that just shows a message. That is exactly where an unescaped string or an unsafe default survives, because it looked too small to check.
A full note is coming.
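
The toast case in shape 05 is easy to picture in code. What follows is a minimal sketch using only the Python standard library, not code from any project described here; `render_toast_unsafe` and `render_toast` are hypothetical helper names chosen for illustration. It shows how small the unsafe version looks, and that the fix is a single call at the point of output.

```python
from html import escape


def render_toast_unsafe(message: str) -> str:
    # Hypothetical helper. It looks too small to review, but it
    # interpolates the message straight into markup.
    return f'<div class="toast">{message}</div>'


def render_toast(message: str) -> str:
    # Escape at the point of output so user-controlled text
    # cannot turn into markup.
    return f'<div class="toast">{escape(message)}</div>'


if __name__ == "__main__":
    payload = "<img src=x onerror=alert(1)>"
    print(render_toast_unsafe(payload))  # the tag survives intact
    print(render_toast(payload))         # the tag becomes inert text
```

Both helpers return the same output for an ordinary message such as "Saved", which is why a demo alone would not tell them apart.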

The notes

Why does AI-generated code work in the demo but break in production?

The demo is the path the AI optimized for. AI-written code and its AI-written tests both describe the happy path, so they agree with each other and can be wrong in the same direction. The break lives in the path nobody exercised: the real entry point, the second file, the input the demo never sent.
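
To make the "same direction" problem concrete, here is a small sketch under stated assumptions: `parse_amounts`, `summarize`, and the two check functions are hypothetical names invented for this illustration, and the sample inputs are made up. The first check only confirms that the parts exist on the demo input; the second sends an input the demo never sent and checks the result.

```python
def parse_amounts(text):
    """Parse one amount per line (hypothetical example code)."""
    amounts = []
    for line in text.splitlines():
        try:
            amounts.append(float(line))
        except ValueError:
            # Silently skips "1,200.50": no error, just a wrong total.
            pass
    return amounts


def summarize(text):
    amounts = parse_amounts(text)
    return {"count": len(amounts), "total": sum(amounts)}


def shape_check_passes():
    # Checks that the parts exist, on the happy-path input.
    report = summarize("10\n20\n30")
    return set(report) == {"count", "total"}


def behavior_check_passes():
    # Checks that the right thing happened, on a realistic input.
    report = summarize("10\n1,200.50\n30")
    return report["total"] == 1240.50


if __name__ == "__main__":
    print("shape check:", shape_check_passes())
    print("behavior check:", behavior_check_passes())
```

The shape check stays green while the behavior check fails, because the line with a thousands separator is dropped without any error.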

What are the most common ways AI-built software fails?

A small set of recurring shapes: silent breaks that pass every test, confident misconfigurations that run but run wrong, wrong-frame bugs where fluent reasoning fixes the wrong problem, debt from code written at typing speed, and security holes hidden in the boring helper functions nobody reviews.

Is AI-generated code reliable?

It is fluent, which is not the same as reliable. It reliably produces something that looks right and usually runs. Whether it does the right thing depends on whether anyone tested the real path and read the parts that appear to just work. Reliability is a property you add by reviewing, not one that fluency hands you.

How is fixing AI-generated code different from ordinary debugging?

The failures are quieter. There is usually no crash and no red test. The skill is knowing which green lights to distrust, and recognizing the recurring shapes, instead of chasing each symptom as though it were new.
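
One way to distrust a green light is to measure the property the setup claims to provide. The sketch below is an illustration only, assuming a thread pool built with Python's standard `concurrent.futures`; `ConcurrencyProbe`, `run_misconfigured`, and `run_concurrent` are hypothetical names, and the sleep stands in for real work. Both runners finish without error, but only one ever has more than one task in flight.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyProbe:
    """Records the highest number of tasks running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def task(self, _item):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.05)  # stand-in for real I/O
        with self._lock:
            self._active -= 1


def run_misconfigured(items, probe):
    # Reads as parallel, but .result() blocks inside the loop, so each
    # task finishes before the next one is submitted.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for item in items:
            pool.submit(probe.task, item).result()


def run_concurrent(items, probe):
    # Submit everything first, then wait for the results.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(probe.task, item) for item in items]
        for future in futures:
            future.result()


if __name__ == "__main__":
    slow, fast = ConcurrencyProbe(), ConcurrencyProbe()
    run_misconfigured(range(4), slow)
    run_concurrent(range(4), fast)
    print("misconfigured peak:", slow.peak)
    print("concurrent peak:", fast.peak)
```

Neither version crashes or fails a test that only checks the work was done; the difference shows up only when the peak concurrency is observed directly.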

The notes

The tool that passed every test and was quietly broken

Your tool isn't slow. It's misconfigured.
