Lifecycle drift: how pre-AI compensation breaks

Part of QA in the Age of AI-Accelerated Development. Pre-requisite: With agents.

On the previous page, the two debts (comprehension debt and intent debt) emerged wherever agents replace humans in code generation. The pre-AI compensation layer (docs, ADRs, tests, runbooks) was supposed to absorb that loss. When those artifacts themselves start being derived from agent-generated code, the layer fails in a specific way.

Pre-AI, comprehension and intent debt accumulated whenever people quit, transferred, or simply forgot. The industry response evolved over decades into a layered set of compensation mechanisms:

Product Requirements Documents (PRDs), feature specifications, BRDs, roadmaps, OKRs, strategy decks, and customer research synthesis captured the why at the business and product level: what problem the product solves, what tradeoffs were accepted, what success looks like, which user segments matter.
Confluence / Notion / wiki pages, decision logs, and meeting notes captured the why of operational and cross-team decisions: who decided what, when, on what evidence.
Architecture Decision Records (ADRs) captured the why behind significant architectural choices, persisting intent across team changes.
Code documentation and inline comments captured the how (and sometimes the why) in the same place as the code.
Automated tests captured behavioural intent in executable form: tests are specifications that fail when the system stops doing what was intended.
Onboarding material and runbooks captured operational knowledge for new joiners.

These mechanisms had a critical property in common: they were human-authored. The artifacts originated from people who had done the work of deciding, building, or operating the system, so each one was a product of human learning: intent, reasoning, operational know-how, whatever the author had come to understand by doing. When the code drifted from what humans knew, people could catch the drift by comparing it against these artifacts. When new joiners read them, they received the team’s accumulated understanding first, and the code second.

In the AI era, this property is reversed. When AI-generated code becomes the source from which artifacts are subsequently generated, each artifact is a derivation of the code’s surface, not a product of human learning. If the implementation is wrong, the documentation describes the wrong behaviour as intentional, the tests pass on the wrong behaviour, and the next iteration extends the wrong behaviour. People reading these artifacts cannot catch the drift, because the artifacts and the implementation are now derived from the same source and agree by construction. The compensation mechanism does not merely fail; it ratifies the drift.

We name this the generative ratification loop:

AI implements wrong → AI-generated tests pass on the wrong → AI-generated docs describe the wrong as intentional → next iteration builds on the wrong as if it were correct.

Each step ratifies the previous step’s errors because all steps share the same source. The signal that would normally catch the wrong implementation (a human reviewer noticing “this isn’t what we wanted”) is absent at every step, because there is no human-authored anchor of intent against which to compare.
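
The loop's first two steps can be illustrated with a small sketch. Everything in it is hypothetical and chosen only for illustration: the discount rule, the function names, and the helper that stands in for an agent deriving tests from the code. It assumes the intended rule is "orders of 100.00 or more get a 10% discount" and that the agent's implementation got the boundary wrong. The derived tests record whatever the code currently does, so they pass by construction; only the check written from the requirement can disagree with the code.

```python
# Hypothetical intended rule (the human-authored intent):
# orders of 100.00 or more get a 10% discount. Amounts are integer cents.

def apply_discount(total_cents: int) -> int:
    """Stand-in for agent-written code with a boundary bug: '>' instead of '>='."""
    if total_cents > 10_000:
        return total_cents * 90 // 100
    return total_cents


def derive_tests_from_code(func, inputs):
    """Stand-in for an agent generating tests from the code itself:
    whatever the function returns today is recorded as the expected value."""
    return {x: func(x) for x in inputs}


def run_derived_tests(func, expectations) -> bool:
    """Derived tests compare the code against its own recorded behaviour."""
    return all(func(x) == expected for x, expected in expectations.items())


def intent_test(func) -> bool:
    """Human-authored anchor: expectations come from the requirement, not the code."""
    return (
        func(9_999) == 9_999        # just under the threshold: no discount
        and func(10_000) == 9_000   # exactly at the threshold: discount applies
        and func(20_000) == 18_000  # well above the threshold: discount applies
    )


derived = derive_tests_from_code(apply_discount, [5_000, 10_000, 20_000])
derived_passes = run_derived_tests(apply_discount, derived)
intent_passes = intent_test(apply_discount)

print("derived tests pass:", derived_passes)  # the bug is recorded as expected behaviour
print("intent test passes:", intent_passes)   # the requirement exposes the bug
```

In this sketch the derived tests pass and the intent test fails. Nothing about the derived suite is sloppy; it simply shares a source with the implementation, which is the sense in which the artifacts "agree by construction".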

The generative ratification loop sits in a broader research pattern: self-referential AI loops without an external corrective signal compound errors. The most prominent instance is model collapse (Shumailov et al., 2024, Nature), where training LLMs on data generated by previous LLMs causes “irreversible defects in the resulting models, in which tails of the original content distribution disappear”, degrading the model’s ability to generate diverse high-quality output. Our loop is mechanistically different (error propagation through artifact-generation chains rather than distribution drift in training data), but the meta-pattern is the same: when AI feeds AI without a human-authored anchor of intent, quality degrades. The ratification loop is the artifact-generation member of this family.

The SlopCodeBench paper (Orlanski et al., 2026) provides empirical support for the longitudinal symptom of this loop: agent code measured across 93 iterative checkpoints grows monotonically more verbose (89.8% of trajectories) and structurally eroded (80% of trajectories), while human code on the same kind of work stays flat over time. Prompt interventions improve initial quality but do not halt degradation. The paper studies code-level degradation rather than the artifact-generation loop we name here, but the pattern (autonomous iteration produces compounding decay; human authorship does not) is consistent with the loop’s prediction.

If the lifecycle bridge is to be rebuilt in the AI era, the artifacts that span it cannot be derived from agent-generated code alone. They need an anchor of human-authored intent. Where that anchor lives, and what discipline produces it, is the design space the research question opens.
