<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>William Lubelski</title><description>Essays and writing by William Lubelski.</description><link>https://william.lubel.ski/</link><item><title>On Compaction</title><link>https://william.lubel.ski/writing/2026-03-10-on-compaction/</link><guid isPermaLink="true">https://william.lubel.ski/writing/2026-03-10-on-compaction/</guid><description>How quickly we forget</description><pubDate>Tue, 10 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Forgetting&lt;/h2&gt;
&lt;p&gt;Forty minutes into a complex task, an agent starts repeating itself,&lt;br&gt;re-derives old conclusions, proposes an approach it already tried and abandoned.&lt;/p&gt;
&lt;p&gt;Old findings get compressed into a summary, and the summary loses the subtle
reasoning. The sharp edges get sanded off. A limit was reached, and so an
algorithm guessed at what was important to retain. The agent carries on with a
lossy copy of its own thinking, but the work continues.&lt;/p&gt;
&lt;p&gt;As users on the outside, sometimes we power through it, sometimes we throw our
hands up and start a new session. Until that session degrades too.&lt;/p&gt;
&lt;p&gt;The standard read on this: context windows will get bigger. Models will get
better at long-range attention. Compaction algorithms will get smarter. The
implicit assumption: the architecture is fine, the scarce resource is context.
Make that resource less scarce and the problem goes away.&lt;/p&gt;
&lt;p&gt;Maybe.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Context preservation techniques&lt;/h2&gt;
&lt;p&gt;The community continues to evolve lots of clever techniques for managing context.&lt;/p&gt;
&lt;h3&gt;Sub-agents&lt;/h3&gt;
&lt;p&gt;Instead of burning the main agent&amp;#39;s context on every sub-task, you spin up a
smaller agent with its own context, let it do the work, and return a conclusion.
The main agent&amp;#39;s context burns slower because it only absorbs summaries, not the
full reasoning process.&lt;/p&gt;
&lt;p&gt;This helps. It&amp;#39;s a real improvement. But the main agent is still degrading. You
slowed the burn rate, but you didn&amp;#39;t change what&amp;#39;s burning.&lt;/p&gt;
&lt;h3&gt;The Checklist&lt;/h3&gt;
&lt;p&gt;The second fix is the checklist loop. Write the plan to a file. Give the agent
one task at a time. Reload context fresh for each task. Externalize everything
to disk so there&amp;#39;s nothing in the context window that needs to survive.&lt;/p&gt;
&lt;p&gt;This is genuinely good engineering. It treats the context window as volatile
scratch space — you keep emptying it, so rot can&amp;#39;t accumulate. It&amp;#39;s thermostatic
control: read the state, compare to the goal, take the next action, repeat. A
thermostat doesn&amp;#39;t understand heat transfer. It reads a number and flips a
switch. And for a surprising range of real work, that&amp;#39;s sufficient.&lt;/p&gt;
&lt;p&gt;But someone has to write the checklist. Hard engineering problems don&amp;#39;t arrive
pre-decomposed. When the work reveals that the plan was wrong — when you
discover mid-task that your decomposition was about the wrong thing — the
checklist can&amp;#39;t adapt. It can change what the agent does next. It can&amp;#39;t change
what the agent understands.&lt;/p&gt;
&lt;h3&gt;Preemptive compaction&lt;/h3&gt;
&lt;p&gt;As the agent approaches its context limit, have it write up its current state
and pick-up instructions. Then reboot fresh and have the new agent pick up where
the old one left off. More thoughtful than algorithmic compaction. More adaptive
than a fixed checklist.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;The necessary coordinator&lt;/h2&gt;
&lt;p&gt;Each of these is a genuine improvement on the one before. Sub-agents burn the
context slower. The checklist avoids burning it at all for simple tasks.
Preemptive compaction manages the burn more gracefully.&lt;/p&gt;
&lt;p&gt;Each still assumes there is a prime agent, and that notion still holds the
agent&amp;#39;s context in special, vaunted status. The necessary coordinator. Without
that coordination, the work cannot continue. And so everything else is a
strategy for keeping that agent alive and functioning as long as possible.&lt;/p&gt;
&lt;p&gt;The context window is finite, so you manage the finite resource.&lt;/p&gt;
&lt;p&gt;The prime agent is the thread of reasoning, so you protect the thread.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Fire, uncontained and contained&lt;/h2&gt;
&lt;p&gt;Fire burning in the open is useful. You can warm yourself by it, cook over it,
see by it. But it&amp;#39;s ambient. The heat goes everywhere. You manage it by tending
it. More fuel, less fuel, a ring of stones, a cleared area so the sparks don&amp;#39;t
catch. Every improvement is a better way of tending the same combustion.&lt;/p&gt;
&lt;p&gt;But fire sculpted by a mechanism becomes something else. It is, in one of the
oldest senses of the word, &amp;#39;an engine&amp;#39;.&lt;/p&gt;
&lt;p&gt;The heat goes where the machine directs it. The mechanism doesn&amp;#39;t make the fire
hotter or more efficient — it makes the fire&amp;#39;s output &lt;em&gt;structural&lt;/em&gt;. The fire
does the same thing it always did. The machine is what changed.&lt;/p&gt;
&lt;p&gt;In the early 1700s, Thomas Newcomen built one of the first machines to do this
with steam. His atmospheric engine pumped water out of coal mines by injecting
steam into a cylinder, then injecting cold water to condense it — the
condensation created a vacuum, the atmosphere pushed the piston down, water got
pumped. It worked for sixty years. But it was roughly 1% thermally efficient,
because the cold water cooled the cylinder on every stroke, and most of the fuel
went to reheating what had just been cooled. The mechanism that did the work
also destroyed the conditions for doing more work.&lt;/p&gt;
&lt;p&gt;In the 1760s, James Watt was repairing a model Newcomen engine at the University
of Glasgow. He wasn&amp;#39;t trying to build a new kind of engine. He was trying to
understand why the model used so much steam. And he noticed where the waste was
going: into reheating the cylinder.&lt;/p&gt;
&lt;p&gt;His fix was not &amp;quot;make a better cylinder.&amp;quot; It was: stop condensing in the
cylinder. Move the condensation to a separate vessel — a condenser — that stays
cold while the cylinder stays hot. Each component does one job, in the
conditions suited to that job. Efficiency roughly tripled. Not from a better
version of Newcomen&amp;#39;s engine, but from a different machine that happened to use
the same steam.&lt;/p&gt;
&lt;p&gt;Heating and condensation were fighting over the same vessel.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Coordination&lt;/h2&gt;
&lt;p&gt;Inference and memory are fighting over the same vessel.&lt;/p&gt;
&lt;p&gt;The context window is good at inference: hot, expensive, high-bandwidth, the
place where reasoning actually happens. It is bad at memory: finite, degrading,
lossy under compression. We keep trying to make it do both.&lt;/p&gt;
&lt;p&gt;Compaction is the cold water injection. It preserves a lossy version of the
agent&amp;#39;s state so inference can continue — but it degrades the context that makes
inference valuable. The agent spends tokens re-deriving conclusions,
re-orienting in territory it already mapped, reconstructing judgment from
summaries of judgment. Every compaction cycle means reheating a cooled cylinder.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sub-agents coordinate multiple cylinders in parallel.&lt;/li&gt;
&lt;li&gt;Checklists coordinate multiple cylinders in series.&lt;/li&gt;
&lt;li&gt;Preemptive compaction more elegantly times the cooling.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;What would this look like?&lt;/h2&gt;
&lt;p&gt;If you stop trying to preserve the cylinder — if you accept that the context
window is scratch space, not storage — the architecture changes shape on its
own.&lt;/p&gt;
&lt;p&gt;There&amp;#39;s no prime agent. There&amp;#39;s no single thread of reasoning to protect.&lt;/p&gt;
&lt;p&gt;Instead there&amp;#39;s a back-and-forth:&lt;/p&gt;
&lt;p&gt;An assessment step reads the original prompt, reads the current state of the
work, reads the log of what&amp;#39;s been done. Makes a judgment: what needs to happen
next? Dispatches work.&lt;/p&gt;
&lt;p&gt;A work step receives a task, does the work, writes its findings somewhere
persistent, logs what it did, and yields its final notes to exit.&lt;/p&gt;
&lt;p&gt;It doesn&amp;#39;t need to explain its entire reasoning back to the assessor. That&amp;#39;s in
the report. The assessor can double-check the agent&amp;#39;s homework if that seems
prudent; otherwise it can proceed with more of the high-level task.&lt;/p&gt;
&lt;p&gt;This next assessment &lt;em&gt;could&lt;/em&gt; be a resumption of the previous assessor&amp;#39;s context.
But it doesn&amp;#39;t have to be. It could be a fresh agent that reads the prompt, the
current working state, and the log as needed. Fresh context, full fidelity,
reading the current state of the world.&lt;/p&gt;
&lt;p&gt;Like a shift change on a navy boat:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You know the standing orders.&lt;/li&gt;
&lt;li&gt;You check the notes from the last shift.&lt;/li&gt;
&lt;li&gt;You check the state of the world.&lt;/li&gt;
&lt;li&gt;Then you get to planning what needs to be done next.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &amp;quot;thread of reasoning&amp;quot; isn&amp;#39;t in any agent&amp;#39;s head. It&amp;#39;s in the files.&lt;/p&gt;
&lt;p&gt;This is not recursive work burning down a single finite resource. It&amp;#39;s a
trampoline — a back and forth where each participant starts fresh and reads the
current state of the work. Context isn&amp;#39;t precious. It&amp;#39;s a fresh cylinder, heated
and ready. What&amp;#39;s scarce is something else entirely: good judgment about what to
do next. Discipline. And good judgment comes from clear state and full history,
not from a degrading memory of having been there.&lt;/p&gt;
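&lt;p&gt;For concreteness, the trampoline above can be sketched in a few lines of Python. This is a shape, not an implementation: &lt;code&gt;run_assessor&lt;/code&gt; and &lt;code&gt;run_worker&lt;/code&gt; are hypothetical stand-ins for fresh LLM sessions, and the file layout is invented for illustration.&lt;/p&gt;

```python
# Sketch of the trampoline: fresh assessors and fresh workers bounce
# back and forth, and everything durable lives on disk, not in any one
# agent's context. run_assessor/run_worker are hypothetical callables.
import json
from pathlib import Path

STATE = Path("work")          # partitioned working state
LOG = STATE / "log.jsonl"     # append-only record of what happened

def read_log():
    """Full-fidelity history: every entry ever appended, in order."""
    if not LOG.exists():
        return []
    return [json.loads(line) for line in LOG.read_text().splitlines()]

def append_log(entry):
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def trampoline(prompt, run_assessor, run_worker, max_steps=50):
    """Bounce between fresh contexts until the assessor declares done."""
    STATE.mkdir(exist_ok=True)
    for _ in range(max_steps):
        # Fresh context every bounce: the assessor reads the standing
        # orders (prompt), the state of the world, and the shift notes.
        decision = run_assessor(prompt, STATE, read_log())
        if decision["done"]:
            return decision["summary"]
        # The worker also starts fresh, does one task, and externalizes
        # its findings before exiting.
        report = run_worker(decision["task"], STATE)
        append_log({"task": decision["task"], "report": report})
    raise RuntimeError("step budget exhausted")
```

&lt;p&gt;Nothing in the loop survives between bounces except the files: kill the process mid-task and a new one resumes from the same prompt, state, and log.&lt;/p&gt;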
&lt;hr&gt;
&lt;h2&gt;The agents are the steam&lt;/h2&gt;
&lt;p&gt;Watt didn&amp;#39;t just optimize the Newcomen design, but neither did he reinvent fire
or steam. He changed which parts did what. Which parts stay hot, which parts
stay cold, and where the work accumulates.&lt;/p&gt;
&lt;p&gt;Agents are the steam. You don&amp;#39;t waste them; you choose the right kind for the
job. But you don&amp;#39;t design the whole machine around keeping one batch of steam
alive. You design the machine so the steam can do its work and be replaced by
fresh steam, and the work persists in the parts that were built to hold it.&lt;/p&gt;
&lt;p&gt;I don&amp;#39;t know that any of this is right. I&amp;#39;m messing around with a Claude Max subscription
just like lots of other people. I am not an AI researcher and I don&amp;#39;t have
benchmarks or even a working proof of concept yet.&lt;/p&gt;
&lt;p&gt;It&amp;#39;s a premise, a feeling even. It&amp;#39;s a shape I keep noticing where the current
approaches all seem to be optimizing within a design that might have the seam in
the wrong place.&lt;/p&gt;
&lt;p&gt;But maybe the move now is the same as it was in 1765.&lt;/p&gt;
&lt;p&gt;Separate the condenser.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Appendix&lt;/h2&gt;
&lt;details&gt;

&lt;summary&gt;A Condenser Shape&lt;/summary&gt;

&lt;br/&gt;

&lt;p&gt;My hunch about what the condenser looks like:&lt;/p&gt;
&lt;p&gt;[A] partitioned shared working state&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Partitioned meaning multiple agents can access and work on different parts without needing to read in the entire corpus&lt;/li&gt;
&lt;li&gt;Shared meaning any agent can write to any portion (efficient sharing is policy, not mechanism)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;[B] a queryable append-only log of what happened&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An immutable log lets the agents and the humans see if someone isn&amp;#39;t playing well with others&lt;/li&gt;
&lt;li&gt;Queryable means that if a new finding challenges an old assumption, old work can be reexamined at full fidelity (like a detective reading cold case files)&lt;/li&gt;
&lt;/ul&gt;
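&lt;p&gt;As a sketch only (all names and layout invented), [A] and [B] together might be as small as a directory of partition files plus one append-only JSONL log:&lt;/p&gt;

```python
# Hypothetical condenser: partitioned shared state [A] plus a
# queryable append-only log [B]. Layout and names are invented.
import json
from pathlib import Path

class Workspace:
    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.log = self.root / "log.jsonl"

    # [A] Partitioned and shared: any agent may write any partition.
    # Staying in your lane is policy, not mechanism.
    def write(self, partition, text):
        (self.root / partition).write_text(text)
        self.append({"event": "write", "partition": partition})

    def read(self, partition):
        # Agents read only the partition they need, not the whole corpus.
        return (self.root / partition).read_text()

    # [B] Append-only: events are only ever added, never rewritten.
    def append(self, entry):
        with self.log.open("a") as f:
            f.write(json.dumps(entry) + "\n")

    def query(self, **match):
        """Reopen the cold case files: every event matching the filter."""
        if not self.log.exists():
            return []
        events = [json.loads(line) for line in self.log.read_text().splitlines()]
        return [e for e in events if all(e.get(k) == v for k, v in match.items())]
```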
&lt;/details&gt;

&lt;br/&gt;</content:encoded></item><item><title>Addressable Agents</title><link>https://william.lubel.ski/writing/2026-03-02-addressable-agents/</link><guid isPermaLink="true">https://william.lubel.ski/writing/2026-03-02-addressable-agents/</guid><description>What will it take for agents to feel like coworkers and not power tools?</description><pubDate>Mon, 02 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;OpenClaw had a moment and other companies are releasing similar features (Notion scheduled agents, Perplexity computer use, etc).&lt;/p&gt;
&lt;p&gt;But they&amp;#39;re all missing the same thing (it&amp;#39;s very McLuhan).&lt;/p&gt;
&lt;p&gt;They&amp;#39;re all trying to invent &amp;quot;the next computer&amp;quot; — they want AI to be an interface paradigm shift like terminal → GUI, desktop → laptop → mobile. (After mobile, VR/AR/XR was the theoretical successor, with middling to bad results.)&lt;/p&gt;
&lt;p&gt;It feels like the thing that&amp;#39;s going to make AI actually useful is giving it an &lt;em&gt;addressable identity&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;And maybe, at least to start, that just means &amp;quot;an email account&amp;quot;.&lt;/p&gt;
&lt;p&gt;The truly useful agent can&amp;#39;t just be in one app — it has to be in any app. It can&amp;#39;t just act &lt;em&gt;as&lt;/em&gt; me, it has to &lt;em&gt;be&lt;/em&gt; something else.&lt;/p&gt;
&lt;p&gt;In every traditional UI shift, it&amp;#39;s still me doing the action, using a new surface.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I sit at my desktop and buy a plane ticket.&lt;/li&gt;
&lt;li&gt;Then I sit on my couch and buy a plane ticket.&lt;/li&gt;
&lt;li&gt;Then I&amp;#39;m on the train and I buy a plane ticket.&lt;/li&gt;
&lt;li&gt;Now everyone is obsessed with... my browser magically buys a plane ticket for me?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;No. &amp;quot;My agent buys a plane ticket &lt;em&gt;on behalf of&lt;/em&gt; me&amp;quot;. Say it out loud to yourself. This is a 40 year old solved problem. When a human travel agent in 1995 buys a plane ticket for me, they do so not by impersonating my voice. They do so &amp;quot;on behalf&amp;quot; of me. The system knows the provenance of the fact that they did this. If I call up and buy a ticket myself, the final result is the same, but the records reflect the difference.&lt;/p&gt;
&lt;p&gt;Every one of those shifts gave &lt;em&gt;us&lt;/em&gt; a better tool (and specifically, expanded &lt;em&gt;where&lt;/em&gt; we could use them). A laptop is a portable desktop. A phone is a pocket laptop. Each one widens the surface over which we can coordinate whatever it is we need to get done.&lt;/p&gt;
&lt;p&gt;Each of these were &lt;strong&gt;power tools&lt;/strong&gt;. The gas chainsaw was powerful. Then the electric hedge clipper let us garden more often and with less fuss. We didn&amp;#39;t throw away the chainsaw — we just had more tools in the shed. Each new paradigm is additive: it captures some use cases and opens new ones, but the old surface sticks around. But what if the next shift isn&amp;#39;t another tool for the shed?&lt;/p&gt;
&lt;h2&gt;The Jetsons&amp;#39; Robot Gardener&lt;/h2&gt;
&lt;p&gt;We don&amp;#39;t interact with it in specific new AI channels. We just use the existing plumbing for how we coordinate with human actors. The difference isn&amp;#39;t just capability, it&amp;#39;s also &lt;em&gt;relationship&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Notion and Perplexity and everyone can&amp;#39;t crack the nut of the next big thing in their apps because the next big thing isn&amp;#39;t going to be in &lt;em&gt;one&lt;/em&gt; app. OpenClaw scratched the surface of this but got sidetracked with signal as a &amp;quot;control scheme&amp;quot;.&lt;/p&gt;
&lt;p&gt;These are all going to need one thing: an email address. Or, more generally, a &lt;strong&gt;&lt;em&gt;unique addressable identity&lt;/em&gt;&lt;/strong&gt;. Once we give these things stable addressable identities, I think the floodgates are going to rip open.  &lt;/p&gt;
&lt;p&gt;(Email addresses aren&amp;#39;t cool on their own.  The agent doesn&amp;#39;t even need write access to the email account.  It exists to enable accepting the invite email from Linear, GitHub, Slack, etc: to participate in the systems humans already use, without those systems needing to be rearchitected as &amp;quot;AI Native&amp;quot;)&lt;/p&gt;
&lt;p&gt;Your &amp;quot;coworker,&amp;quot; your &amp;quot;assistant,&amp;quot; whatever its scope and mandate — it&amp;#39;s an email address, a set of memory files, an event bus (message received → run prompt), and a cron job (every 10 min, run prompt — usually go right back to sleep).&lt;/p&gt;
&lt;p&gt;That&amp;#39;s it. That&amp;#39;s the baseline: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;[A] Persistent unique addressable identity (&amp;quot;an email address&amp;quot;)&lt;/li&gt;
&lt;li&gt;[B] Thoughts can span interactions (memory files)&lt;/li&gt;
&lt;li&gt;[C] Can respond to your queries (event bus)&lt;/li&gt;
&lt;li&gt;[D] Can take actions proactively (&amp;quot;cron job&amp;quot;)&lt;/li&gt;
&lt;/ul&gt;
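&lt;p&gt;A toy rendering of that baseline, to show how little machinery it actually demands. Nothing here is a real product API: the address format, memory layout, and wake logic are all invented for illustration.&lt;/p&gt;

```python
# Toy ABCD agent: an address, memory files, a message handler, a tick.
# All names here are hypothetical; none map to a real product API.
import json
from pathlib import Path

class Agent:
    def __init__(self, address, memory_dir):
        self.address = address              # [A] addressable identity
        self.memory = Path(memory_dir)      # [B] memory spans interactions
        self.memory.mkdir(parents=True, exist_ok=True)

    def remember(self, key, value):
        (self.memory / key).write_text(json.dumps(value))

    def recall(self, key, default=None):
        path = self.memory / key
        return json.loads(path.read_text()) if path.exists() else default

    # [C] event bus: message received, run prompt, reply
    def on_message(self, sender, body):
        seen = self.recall("seen", 0) + 1
        self.remember("seen", seen)
        return {"to": sender, "body": "ack %d: %s" % (seen, body)}

    # [D] cron: wake on a schedule, usually go right back to sleep
    def on_tick(self):
        pending = self.recall("pending", [])
        if not pending:
            return None
        self.remember("pending", pending[1:])
        return pending[0]
```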
&lt;p&gt;Every current AI product gets some but not all of these (and executes each with varying quality):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Siri, Alexa, etc.: C only, nothing else&lt;/li&gt;
&lt;li&gt;Claude Code: B, C, a little D, no A&lt;/li&gt;
&lt;li&gt;Notion Agents: D, C, kind of B, no A&lt;/li&gt;
&lt;li&gt;OpenClaw: B, C, D, flickers of A&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is textbook Clay Christensen disruption. Painfully so. Apple, Google, etc. desperately want AI to be a feature that fits into their existing platforms.&lt;/p&gt;
&lt;p&gt;They&amp;#39;re all trying to put the power of Lt. Commander Data into a text input, but none of them wants to ship Lt. Commander Data. (Claude &lt;em&gt;coworker&lt;/em&gt;?  Come on, it&amp;#39;s right there)&lt;/p&gt;
&lt;p&gt;(Now obviously on the show, Data can stand in the ready room and give his status report. Today that part is a robotics problem. But almost any meeting in the ready room on TNG could have been a space-Zoom call, or honestly just a space-email. The point is that &lt;em&gt;the agent can converse, task, and be tasked — same as any other participant.&lt;/em&gt;)&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;A Clay Christensen style analysis&lt;/h2&gt;
&lt;p&gt;Disruption theory says incumbents fail not because they&amp;#39;re stupid but because they&amp;#39;re &lt;strong&gt;rational&lt;/strong&gt;. They listen to their best customers, invest in sustaining innovations, and rationally ignore the low-end or new-market footholds where disruptors start. The disruption happens when the disruptor improves along a trajectory that eventually meets mainstream needs — at which point the incumbent can&amp;#39;t respond.&lt;/p&gt;
&lt;p&gt;Map the ABCD framework onto the competitive landscape:&lt;/p&gt;
&lt;h3&gt;Google&lt;/h3&gt;
&lt;p&gt;Best structural position of any incumbent. They already give identities to everything (Workspace accounts, service accounts). They already have the event bus (Pub/Sub, Cloud Functions). They already have the cron (Cloud Scheduler). Gmail is literally THE identity layer of the internet for a billion people. A Google AI agent with its own &lt;code&gt;agent-for-bob@workspace.google.com&lt;/code&gt; that can send email, read calendar, book meetings, file expenses — they have every piece.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Why they might blow it:&lt;/em&gt; Co-option. Google&amp;#39;s business model is ads, and ads require &lt;em&gt;you&lt;/em&gt; looking at screens. An agent that acts on your behalf means fewer eyeballs. Every incentive pushes toward making the agent a feature of Gmail/Docs/Search (sustaining innovation) rather than an independent entity that reduces screen time. They can&amp;#39;t rationally cannibalize their own attention economy.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Microsoft/OpenAI&lt;/h3&gt;
&lt;p&gt;Second best structural position. Azure AD already has identity and delegation primitives. Exchange has had delegate mailboxes for 25 years. They understand &amp;quot;on behalf of.&amp;quot;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Why they might blow it:&lt;/em&gt; Same co-option, different flavor. &amp;quot;Copilot in Teams,&amp;quot; &amp;quot;Copilot in Excel,&amp;quot; &amp;quot;Copilot in Word.&amp;quot; The agent transcends any single product; Microsoft wants it trapped in M365. Their enterprise customers are &lt;em&gt;asking&lt;/em&gt; for &amp;quot;Copilot in my app&amp;quot; — and Christensen says listening to your best customers is exactly how you miss the disruption.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Apple&lt;/h3&gt;
&lt;p&gt;They have the identity (Apple ID), the device graph, and the most intimate user relationship. They could give an agent an iCloud email address tomorrow.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Why they&amp;#39;ll almost certainly blow it:&lt;/em&gt; Apple&amp;#39;s entire philosophy is that the human is in control and nothing happens without explicit user action. An autonomous agent with its own identity that sends emails you didn&amp;#39;t individually approve violates Apple&amp;#39;s DNA. It&amp;#39;s not a technical limitation, it&amp;#39;s a &lt;em&gt;constitutional&lt;/em&gt; one. And their best customers — privacy-conscious consumers — are explicitly asking them NOT to do this.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Anthropic/Claude&lt;/h3&gt;
&lt;p&gt;B and C, a little D, no A. Claude Code is the closest thing to the &amp;quot;coworker&amp;quot; framing, but it has no persistent identity in the world. It can&amp;#39;t receive an email. It can&amp;#39;t be addressed by other systems. Each session is an amnesiac (modulo memory files, which are B but fragile B).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Where Anthropic could win:&lt;/em&gt; They&amp;#39;re the least encumbered by a business model that requires eyeballs or lock-in. No ad business (Google), no enterprise suite (Microsoft), no hardware-control philosophy (Apple). Their business model is API calls, and an agent with an email address that autonomously interacts with the world makes &lt;em&gt;more&lt;/em&gt; API calls, not fewer.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Why they might blow it:&lt;/em&gt; Safety culture could become resistance. &amp;quot;We can&amp;#39;t give the agent an email address because it might send something harmful&amp;quot; could calcify into a constitutional objection to A and D. Also, they don&amp;#39;t own any of the surfaces — they&amp;#39;d need partnerships with the incumbents who have incentives to block them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Newcomers / Disruptors&lt;/h3&gt;
&lt;p&gt;The over-served market: knowledge workers who want AI to help them be more productive &lt;strong&gt;in their existing tools&lt;/strong&gt;. Every incumbent is fighting over this.&lt;/p&gt;
&lt;p&gt;The underserved market: small businesses and solo operators who need &lt;em&gt;delegation&lt;/em&gt;, not assistance. A freelancer doesn&amp;#39;t want &amp;quot;Copilot in Excel.&amp;quot; They want someone to handle their invoicing. A 5-person startup doesn&amp;#39;t want &amp;quot;AI in Notion.&amp;quot; They want a back-office person who doesn&amp;#39;t exist yet because they can&amp;#39;t afford one. That&amp;#39;s a &lt;strong&gt;delegation&lt;/strong&gt; relationship, not a &lt;strong&gt;tool&lt;/strong&gt; relationship.&lt;/p&gt;
&lt;p&gt;The disruptor is probably someone currently building a &amp;quot;toy&amp;quot; that incumbents dismiss. It&amp;#39;ll look like &amp;quot;an email address connected to an LLM with a cron job&amp;quot; and the first reaction from Google/Microsoft will be &amp;quot;that&amp;#39;s cute, but it doesn&amp;#39;t have enterprise security features.&amp;quot; By the time it does, it&amp;#39;ll be too late.&lt;/p&gt;
&lt;p&gt;Interestingly, customer-facing agent companies (Intercom, Zendesk) are already building agents with their own identities — email address, memory, actions. They just haven&amp;#39;t generalized beyond customer support. If one of them realizes they&amp;#39;ve already built ABCD for one domain and generalizes it...&lt;/p&gt;
&lt;p&gt;The dark horse: someone who builds the &amp;quot;agent identity provider.&amp;quot; Not the agent itself, but the identity layer. The way Okta/Auth0 became the identity layer for SaaS apps, someone could become the identity layer for AI agents. Issue the agent an identity, manage its permissions, handle the &amp;quot;on behalf of&amp;quot; delegation chain. Every agent builder would use them because building identity is a distraction from building the agent.&lt;/p&gt;
&lt;p&gt;The winners will be whoever recognizes that the set of things worth doing has expanded, rather than doing the old things but fancier:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Google, Microsoft, Apple will make agents a feature of their existing products. They&amp;#39;ll bolt an agent onto &amp;quot;user interacts with our app.&amp;quot; This is using the word processor to grade the same test.&lt;/li&gt;
&lt;li&gt;Anthropic has the best shot among established AI companies, but only if they make a bet that feels irresponsible by current safety norms — giving agents persistent identity and autonomy.&lt;/li&gt;
&lt;li&gt;The actual winner probably starts with the identity primitive, not the intelligence primitive. Everyone is competing on who has the smartest model. The disruption will come from whoever figures out that &lt;em&gt;intelligence is becoming commoditized&lt;/em&gt; and the scarce resource is the identity and delegation infrastructure that lets intelligence act in the world.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Not &amp;quot;smarter AI.&amp;quot; Not &amp;quot;AI in every app.&amp;quot; Addressable AI entities with provenance.&lt;/p&gt;
</content:encoded></item><item><title>All Bets Are Off</title><link>https://william.lubel.ski/writing/2026-02-03-all-bets-are-off/</link><guid isPermaLink="true">https://william.lubel.ski/writing/2026-02-03-all-bets-are-off/</guid><description>Code got real cheap. What does good software still cost?</description><pubDate>Tue, 03 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;details&gt;
&lt;summary&gt;Outline&lt;/summary&gt;

&lt;br/&gt;

&lt;p&gt;&lt;strong&gt;§1&lt;/strong&gt; — Zero marginal cost of production.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The cost of production is trending towards zero. Old measurements were bets; the odds moved.
They&amp;#39;ve decoupled from what they tracked. So what now?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;§2&lt;/strong&gt; — Coherence.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Therefore:&lt;/em&gt; Coherence is the new scarce resource. The value of externalizing it went up; the cost
went down. Now we can write it all down.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;§3&lt;/strong&gt; — Leverage.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;But:&lt;/em&gt; Documentation is the foundation. What possibilities does that open up? Automating taste,
the divergent-convergent loop, frequency vs. amplitude. The process is recursive: a ratchet.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;§4&lt;/strong&gt; — &amp;quot;Too impractical.&amp;quot;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Therefore:&lt;/em&gt; Here&amp;#39;s what amplitude looks like in practice. Not faster — more thorough. The things
that were never worth doing are now worth doing. These gains compound. Your competitors have the
same tools.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;§5&lt;/strong&gt; — Path dependence.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;But:&lt;/em&gt; Most orgs will get this wrong. Resistance, co-option, thrash: three ways of refusing to
place new bets.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;§6&lt;/strong&gt; — New bets.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Therefore:&lt;/em&gt; The old bets were real, and now they&amp;#39;re off. Choose wisely.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/details&gt;

&lt;hr&gt;
&lt;h2&gt;Zero marginal cost of production (§1)&lt;/h2&gt;
&lt;p&gt;The naive read of AI coding tools is that we&amp;#39;ll take what we used to do in a month and now we&amp;#39;ll do
it in two weeks, or eventually a week, and eventually a day. Maybe, but the implicit assumption here
is that typing speed was already your limiting factor.&lt;/p&gt;
&lt;p&gt;But AI isn&amp;#39;t changing every aspect of a business in the same way and the same amount. Most of your
business still runs at the speed of business. (Regulatory and compliance still run at the speed of
government.)&lt;/p&gt;
&lt;h3&gt;What the numbers used to mean&lt;/h3&gt;
&lt;p&gt;A codebase with a million lines used to be worth something. Not because a million lines is
inherently valuable, but because someone had to write them. Who would have spent all that time if
the thing didn&amp;#39;t work? A million lines was evidence of a million decisions.&lt;/p&gt;
&lt;p&gt;The million-line codebase is the most dramatic example, but the same thing happened everywhere:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lines of code used to mean effort.&lt;/li&gt;
&lt;li&gt;Coverage used to mean diligence.&lt;/li&gt;
&lt;li&gt;Velocity used to mean capacity.&lt;/li&gt;
&lt;li&gt;A 5K-line PR used to mean something had gone wrong.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;None of these mean the opposite now. A codebase with high coverage might still reflect genuine
diligence. A team with steady velocity might still be well-coordinated. But you can&amp;#39;t tell by
looking at the number anymore. The number doesn&amp;#39;t confirm and it doesn&amp;#39;t deny. It just stopped being
evidence.&lt;/p&gt;
&lt;p&gt;If we haven&amp;#39;t already, we&amp;#39;ll see some series A &amp;quot;scam&amp;quot; acquisitions where a flashy startup gets
acquired and once the purchaser does due diligence, they find that most of the repo is the
scrawlings of a madman. A million lines, generated in weeks, signifying nothing. Worth less than
nothing.&lt;/p&gt;
&lt;h3&gt;Why they all broke at the same time&lt;/h3&gt;
&lt;p&gt;Every one of those was an indirect measure — a bet that the thing you could measure would track the
thing you couldn&amp;#39;t. Lines of code tracked effort. Coverage tracked diligence. Velocity tracked
capacity. These were never the real thing. They were stand-ins.&lt;/p&gt;
&lt;p&gt;They worked because you couldn&amp;#39;t hit the number without doing the work. Writing a thousand lines of
coherent code required understanding the problem. Achieving 85% coverage required thinking about
edge cases. Shipping consistently required genuine team coordination.&lt;/p&gt;
&lt;p&gt;The indirect measures and the real things were linked by production cost. The cost was the
authenticator.&lt;/p&gt;
&lt;p&gt;When production cost drops, every indirect measure authenticated by that cost breaks at the same
time. Not because anyone is gaming the system — in a healthy org, nobody is. But the numbers that
used to require the underlying work no longer do. You can hit every metric on the dashboard and have
done none of the thinking.&lt;/p&gt;
&lt;p&gt;&amp;quot;LOC tells us something useful&amp;quot; was a bet. &amp;quot;Coverage means the code is solid&amp;quot; was a bet. The cost
structure made those safe bets. The cost structure has changed.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Coherence (§2)&lt;/h2&gt;
&lt;p&gt;The goal was and remains: a high-quality product you can efficiently maintain and change over time.&lt;/p&gt;
&lt;p&gt;So what correlates with that now?&lt;/p&gt;
&lt;p&gt;Coherence.&lt;/p&gt;
&lt;p&gt;By coherence I mean the structural property that makes the next change obvious. Not easy,
necessarily — but obvious. You look at the existing patterns and you know where the new code goes,
what it should be called, how it should behave.&lt;/p&gt;
&lt;p&gt;That property exists at every level of the system. At the top it&amp;#39;s an architecture that maps cleanly
to the business domain. In the middle it&amp;#39;s consistent patterns — one way to handle errors, one way
to structure a service. At the bottom it&amp;#39;s naming conventions and file structure that don&amp;#39;t make you
guess.&lt;/p&gt;
&lt;p&gt;If &lt;em&gt;code&lt;/em&gt; trends to zero marginal cost, then well-defined &lt;em&gt;features&lt;/em&gt; start to trend to zero marginal
cost as well. Coherence is what makes a feature well-defined. We&amp;#39;ll come back to why that matters.&lt;/p&gt;
&lt;h3&gt;Incoherence for humans&lt;/h3&gt;
&lt;p&gt;When humans do all the work, coherence lives in two places: the artifacts and the people. The code,
the docs, the tests — and then everything the team just knows.&lt;/p&gt;
&lt;p&gt;That second category is bigger than most teams realize. It&amp;#39;s not just &amp;quot;who understands the billing
service.&amp;quot; It&amp;#39;s the shared scar tissue. &amp;quot;We tried event sourcing in payments and it was a nightmare,
so we use simple CRUD everywhere now.&amp;quot; Nobody documented that as an architectural decision record.
It&amp;#39;s just something the right people know, and they steer new work away from it instinctively. You
don&amp;#39;t document flinches. The space of things you decided not to do is infinite.&lt;/p&gt;
&lt;p&gt;Externalizing all of this — writing it down, keeping it current, making sure it reaches every
engineer who needs it — was a real cost that competed with building. Teams made rational tradeoffs
about how much to externalize.&lt;/p&gt;
&lt;h3&gt;Incoherence for machines&lt;/h3&gt;
&lt;p&gt;LLMs do not have 1:1s with your coworkers. LLMs do not even have memory. Their long-term memory is
the artifacts.&lt;/p&gt;
&lt;p&gt;So now two tradeoffs have fundamentally changed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the &lt;em&gt;value from&lt;/em&gt; encoding coherent business thinking into the artifacts goes up.&lt;/li&gt;
&lt;li&gt;the &lt;em&gt;cost of&lt;/em&gt; encoding coherent business thinking into the artifacts goes down.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Take the event sourcing example. When the humans wrote all the code, the three people who remembered
the payments disaster would steer new work away from it. An LLM has no scar tissue. If nothing in
the codebase or the docs says &amp;quot;we don&amp;#39;t do event sourcing in payments,&amp;quot; the LLM will cheerfully
propose it — and generate a clean, well-structured, completely institutionally incorrect
implementation. The value of having that decision written down went from &amp;quot;nice to have&amp;quot; to &amp;quot;the
difference between useful output and output you throw away.&amp;quot;&lt;/p&gt;
&lt;p&gt;And the cost of writing it down dropped. The same tool that can&amp;#39;t intuit the flinch can help you
externalize it. Point the LLM at the payments service, tell it the history, and ask for an
architecture decision record. Five minutes. The document that nobody was going to spend an afternoon
writing now costs almost nothing to produce.&lt;/p&gt;
&lt;p&gt;Now we can write it all down. It&amp;#39;s cheaper than it&amp;#39;s ever been, and it matters more than it ever
has.&lt;/p&gt;
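&lt;p&gt;A minimal sketch of that externalization step. The prompt wording and section names below are illustrative assumptions, not a standard; in practice you hand this, plus the code, to whatever model you use:&lt;/p&gt;

```python
# Sketch: turn the undocumented flinch into an Architecture Decision Record.
# The section names and wording are illustrative assumptions, not a standard.
def adr_prompt(service_path, oral_history):
    """Assemble a prompt asking a model to draft an ADR that records a
    negative decision: the thing the team decided NOT to do, and why."""
    return (
        "Read the service at " + service_path + ".\n"
        "Background from the team: " + oral_history + "\n"
        "Draft an Architecture Decision Record with sections Context, "
        "Decision, and Consequences. Record what we decided NOT to do, "
        "and why, so future work is steered away from it."
    )

prompt = adr_prompt(
    "services/payments/",
    "We tried event sourcing here and it was a nightmare; simple CRUD only.",
)
print(prompt)
```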
&lt;hr&gt;
&lt;h2&gt;Leverage (§3)&lt;/h2&gt;
&lt;p&gt;Documentation is the foundation. What possibilities does that open up?&lt;/p&gt;
&lt;h3&gt;Automating taste&lt;/h3&gt;
&lt;p&gt;Once coherence is externalized, the next question is whether you can measure it. Nobody has a
coherence score for a codebase today. But we&amp;#39;re close.&lt;/p&gt;
&lt;p&gt;Static analysis was the first generation of automated judgment. It could measure what&amp;#39;s mechanically
computable, like cyclomatic complexity. It was a blunt instrument, but it was an honest attempt to
automate taste.&lt;/p&gt;
&lt;p&gt;The next generation is already visible: an LLM running on every CI pipeline, assessing the fuzzier
qualities that previously required a senior engineer&amp;#39;s eye. Does this PR introduce a new pattern
where an existing one would do? Is the naming consistent with the rest of the module? How far has
the actual code drifted from the documented architecture?&lt;/p&gt;
&lt;p&gt;These assessments get scored per-PR, trended over time, and used as guardrails. The coherence score
becomes correctness infrastructure.&lt;/p&gt;
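&lt;p&gt;As a rough sketch of the plumbing such a check needs. The rubric, the prompt wording, and the JSON reply shape here are all assumptions for illustration, not any vendor&amp;#39;s API:&lt;/p&gt;

```python
import json

# Rubric dimensions a reviewer-model is asked to score, 0-10 each.
# The dimensions and the JSON reply shape are illustrative assumptions.
RUBRIC = [
    "pattern_reuse",       # does the PR reuse an existing pattern where one would do?
    "naming_consistency",  # is naming consistent with the rest of the module?
    "doc_drift",           # how far has the code drifted from the documented architecture?
]

def build_prompt(diff_text):
    """Assemble the review prompt sent to the model alongside the diff."""
    dims = ", ".join(RUBRIC)
    return (
        "Score this diff 0-10 on each of: " + dims + ". "
        "Reply with a single JSON object, keys as named, integer values.\n\n"
        + diff_text
    )

def parse_scores(reply):
    """Parse the model's JSON reply, keeping only the rubric keys we asked for."""
    raw = json.loads(reply)
    return {k: int(raw[k]) for k in RUBRIC if k in raw}

def coherence_score(scores):
    """Collapse per-dimension scores into one trendable number (the mean)."""
    return sum(scores.values()) / len(scores)

# In CI you would send build_prompt(diff) to your model of choice;
# here a canned reply stands in to show the plumbing.
canned = '{"pattern_reuse": 8, "naming_consistency": 9, "doc_drift": 6}'
scores = parse_scores(canned)
print(coherence_score(scores))  # mean of the three rubric scores
```

&lt;p&gt;The mean gets logged per PR and trended; the rubric grows as more conventions get written down.&lt;/p&gt;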
&lt;h3&gt;Frequency vs Amplitude&lt;/h3&gt;
&lt;p&gt;In systems design there&amp;#39;s a pattern called divergent-convergent thinking.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Divergent thinking is spitballing. &amp;quot;No bad ideas.&amp;quot;&lt;/li&gt;
&lt;li&gt;Convergent thinking is analysis and verification. &amp;quot;Doing the homework.&amp;quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The design process is framed as a repeating pattern of divergent → convergent → divergent →
convergent.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   /----------\      /----------\      /----------\
  /            \    /            \    /            \
-*              *--*              *--*              *--&amp;gt;
  \            /    \            /    \            /
   \----------/      \----------/      \----------/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&amp;#39;s often visualized as a diamond: the scope widens, winnows to the practical, then repeats.&lt;/p&gt;
&lt;p&gt;LLMs are comically good spitballers. But they&amp;#39;re mediocre verifiers, almost by definition. Coherence
and correctness infrastructure are investments in guiding the divergent phase and implementing the
convergent phase.&lt;/p&gt;
&lt;p&gt;One option is to use LLMs to run this process at a higher frequency. But doing the same thing you were doing before, only faster... that&amp;#39;s only so interesting, and kind of exhausting.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   /\  /\  /\  /\  /\  /\  /\  /\
  /  \/  \/  \/  \/  \/  \/  \/  \
──                                ──&amp;gt;
  \  /\  /\  /\  /\  /\  /\  /\  /
   \/  \/  \/  \/  \/  \/  \/  \/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first thing we all do with a new tool is replicate what we were already doing. That&amp;#39;s not wrong; it&amp;#39;s the natural first step. The task is to not mistake it for the whole of the unforeseeable new options that will open up.&lt;/p&gt;
&lt;p&gt;But frequency isn&amp;#39;t the only dial.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Amplitude&lt;/em&gt;: Could we do drastically more in one cycle than we used to, because doing so
in the old world would have been cost prohibitive or downright impossible?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;         /----\           /----\           /----\
        /      \         /      \         /      \
       /        \       /        \       /        \
      /          \     /          \     /          \
     |            |   |            |   |            |
     |            |   |            |   |            |
     |            |   |            |   |            |
     |            |   |            |   |            |
    /              \ /              \ /              \
---*                *                *                *---&amp;gt;
    \              / \              / \              /
     |            |   |            |   |            |
     |            |   |            |   |            |
     |            |   |            |   |            |
     |            |   |            |   |            |
      \          /     \          /     \          /
       \        /       \        /       \        /
        \      /         \      /         \      /
         \----/           \----/           \----/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Wider divergent phase: more options explored per cycle, more approaches prototyped, more ideas
tested against reality before committing to one. Wider convergent phase: more thorough verification,
denser correctness infrastructure, the kind of rigor that was always valuable but never budgeted
for.&lt;/p&gt;
&lt;p&gt;And here&amp;#39;s the thing that makes this more than a one-time trick: the process is recursive. The
machine helps you build the convergent infrastructure — the tests, the lint rules, the architecture
docs — and that infrastructure constrains and improves the machine&amp;#39;s next round of divergent output.
Better generation means better infrastructure gets built on top of it. The widening isn&amp;#39;t a single
gesture. It compounds.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Don&amp;#39;t do what you were doing before, but faster. Do the things that were always valuable but
never justifiable under the old cost assumptions.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The engineers and teams that get the most out of this shift won&amp;#39;t be the ones shipping the same
roadmap at higher velocity. They&amp;#39;ll be the ones who recognize that the entire set of things &amp;quot;worth
doing&amp;quot; has expanded, and are systematically exploiting the new tradeoffs.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;&amp;quot;Too impractical&amp;quot; (§4)&lt;/h2&gt;
&lt;p&gt;&amp;quot;That refactor isn&amp;#39;t worth it right now.&amp;quot; &amp;quot;Good enough for v1.&amp;quot; &amp;quot;Nobody&amp;#39;s going to write 200 test
cases for that edge case.&amp;quot;&lt;/p&gt;
&lt;p&gt;Every engineering team has a version of these sentences. Here&amp;#39;s what happens when they stop being
true.&lt;/p&gt;
&lt;h3&gt;Not faster. More thorough.&lt;/h3&gt;
&lt;p&gt;I needed to build mock APIs with realistic backing data. In the old world, an engineer spends a day,
writes maybe 2000 lines, covers the happy path and a few known edge cases. That&amp;#39;s a 90% solution,
and everyone agrees it&amp;#39;s good enough, because going further means another day of tedious
hand-written data and there&amp;#39;s other work to do.&lt;/p&gt;
&lt;p&gt;With an LLM, the 90% solution takes an hour. But the other bottlenecks haven&amp;#39;t moved. Code review
still takes the time it takes. Integration still takes the time it takes. Alignment with the team
still takes the time it takes. So raw speed on the implementation isn&amp;#39;t the constraint worth
optimizing.&lt;/p&gt;
&lt;p&gt;The real move is to spend the same half-day you would have spent before, but instead of a 90% mock,
you produce a mock with comprehensive test scenarios, realistic edge cases, failure modes, varied
data shapes — the kind of thoroughness that nobody would have budgeted for previously. Not 90%
faster. 500% more thorough in the same time envelope.&lt;/p&gt;
&lt;p&gt;And that thoroughness isn&amp;#39;t just nice to have. That mock data becomes correctness infrastructure.
Every feature built on top of it now has a richer, more realistic environment to be tested against.&lt;/p&gt;
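&lt;p&gt;A sketch of what that thoroughness looks like in practice: enumerate the awkward data shapes instead of hand-writing the happy path. Every field and value choice below is illustrative:&lt;/p&gt;

```python
import itertools

# Hypothetical mock-API fixture generator: instead of hand-writing the happy
# path, take the cross product of deliberately awkward values so every
# consumer gets exercised against them. All field choices are illustrative.
NAMES = ["Ada", "", "O'Brien", "名前", "x" * 255]           # empty, quote, unicode, max-length
BALANCES = [0, -1, 10_000_000_000, 0.1 + 0.2]               # zero, negative, huge, float noise
STATUSES = ["active", "suspended", "UNKNOWN_FUTURE_VALUE"]  # includes a forward-compat surprise

def make_fixtures():
    """One mock user per combination of awkward values."""
    fixtures = []
    for i, (name, balance, status) in enumerate(
        itertools.product(NAMES, BALANCES, STATUSES)
    ):
        fixtures.append(
            {"id": i, "name": name, "balance": balance, "status": status}
        )
    return fixtures

fixtures = make_fixtures()
print(len(fixtures))  # 5 * 4 * 3 = 60 scenarios from a couple dozen lines
```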
&lt;h3&gt;Exploration collapses into proof&lt;/h3&gt;
&lt;p&gt;Exploration used to have to be budgeted and managed in steps. A proposal, a time allocation, a
research spike, a partial implementation, a review — a multi-week process before anyone sees
concrete results.&lt;/p&gt;
&lt;p&gt;Instead, a proposal doc can arrive with a reference PR attached, quite possibly a full working implementation.&lt;/p&gt;
&lt;p&gt;If the proposal gets refined based on feedback... regenerate the PR. The exploration and the proof
collapse into one artifact.&lt;/p&gt;
&lt;p&gt;This workflow has no analogue in the old world. It&amp;#39;s not a faster version of the old process. It&amp;#39;s a
different process.&lt;/p&gt;
&lt;p&gt;The old world separated &amp;quot;should we do this?&amp;quot; from &amp;quot;can we do this?&amp;quot; because answering the second
question was expensive. &lt;em&gt;When it&amp;#39;s cheap, you just answer both at once.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;Let&amp;#39;s get weird&lt;/h3&gt;
&lt;p&gt;Those examples are conservative. Where could the &amp;quot;too impractical&amp;quot; calculus go from there?&lt;/p&gt;
&lt;p&gt;Some things that wouldn&amp;#39;t have survived a planning conversation six months ago:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Self-healing CLAUDE.md.&lt;/strong&gt; Use the LLM to write a CI job that uses the LLM to analyze a PR for
divergences between the new code and existing CLAUDE.md files. When a PR changes a pattern that
contradicts a CLAUDE.md, generate a proposed CLAUDE.md update and a proposed code revert. Let the
reviewer pick: did we change the convention, or did we violate it? Forces the decision to be
explicit either way.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Convention extraction from code review comments.&lt;/strong&gt; Mine your team&amp;#39;s PR review history for
recurring feedback patterns. &amp;quot;We always ask people to use the error wrapper.&amp;quot; &amp;quot;We always flag
direct database access outside the repository layer.&amp;quot; Generate lint rules from the things humans
keep repeating. Your reviewers have been writing a spec for years — it&amp;#39;s just trapped in GitHub
comments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Invariant mining.&lt;/strong&gt; Point the LLM at your test suite and ask it to infer implicit invariants —
things that are true across every test but never stated as a rule. Then generate lint rules or
property tests that enforce them explicitly. The tests knew something the codebase didn&amp;#39;t say out
loud.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test generation from prod incidents.&lt;/strong&gt; When a bug hits production, have the LLM write a
regression test, but also have it scan for structurally similar code paths and generate
speculative tests for those too. The incident becomes a pattern detector, not just a point fix.
Every bug you find makes the next bug harder to ship.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PR-to-PR pattern drift.&lt;/strong&gt; Track the patterns introduced across the last N merged PRs. Flag when
the same problem is being solved three different ways across three PRs by three people (or three
LLM sessions). Nobody sees drift in real time. An LLM reading across PRs can.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Architecture doc staleness detector.&lt;/strong&gt; LLM reads the actual code, reads the architecture docs,
flags divergence. &amp;quot;The docs say payments uses REST, but there are three gRPC endpoints now.&amp;quot;
Reverse the usual flow — instead of updating docs from decisions, update docs from reality.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mutation testing on steroids.&lt;/strong&gt; Have the LLM generate semantically meaningful mutations — not
random bit flips, but plausible mistakes an LLM might actually make. &amp;quot;What if someone used
optimistic locking here instead of pessimistic?&amp;quot; If the test suite doesn&amp;#39;t catch it, that&amp;#39;s a real
gap, not a synthetic one.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dependency impact simulation.&lt;/strong&gt; Before upgrading a dependency, have the LLM read the changelog
and your usage of the library, then generate a set of &amp;quot;things that might break&amp;quot; as test cases. Run
them before you upgrade. Turn the changelog into a pre-flight checklist.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
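&lt;p&gt;The convention-extraction idea is the easiest to sketch. A real version would pull comments from the GitHub API and cluster them with an LLM; here, crude bigram counting stands in to show the shape:&lt;/p&gt;

```python
import re
from collections import Counter

# Filler words dropped before fingerprinting; the list is illustrative.
STOP = {"please", "nit", "typo", "the", "a", "an", "of", "to", "see", "here", "instead"}

def fingerprints(comment):
    """Content-word bigrams of a review comment, with inline code spans
    and filler words stripped out."""
    text = re.sub(r"`[^`]*`", " ", comment.lower())
    words = [w for w in re.findall(r"[a-z]+", text) if w not in STOP]
    return list(zip(words, words[1:]))

def recurring_feedback(comments, top=3):
    """The most-repeated bigrams across review comments: the candidates
    worth promoting into lint rules."""
    counts = Counter()
    for c in comments:
        counts.update(fingerprints(c))
    return counts.most_common(top)

comments = [
    "Please use the error wrapper here",
    "use the error wrapper, see `errors.Wrap`",
    "Use the error wrapper instead of raw raise",
    "nit: typo",
]
print(recurring_feedback(comments))
```

&lt;p&gt;Three differently-worded comments collapse onto the same fingerprint. The spec was in the comments all along; counting just surfaces it.&lt;/p&gt;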
&lt;p&gt;Every one of these was a non-starter under the old cost structure. Not because they were bad ideas, but because the implementation hours dwarfed the payoff.&lt;/p&gt;
&lt;p&gt;And none of them is an isolated win. Each one produces infrastructure that the others consume. The system&amp;#39;s fabric gets stronger with every piece you add.&lt;/p&gt;
&lt;p&gt;These gains compound. Each piece of infrastructure improves the next rounds of generation and
verification, which means the next piece of infrastructure lands better too.&lt;/p&gt;
&lt;p&gt;Your competitors have the same tools. The question is whether they&amp;#39;re investing in this coherence infrastructure or just trying to turn the crank faster.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Path dependence (§5)&lt;/h2&gt;
&lt;h3&gt;The honest response&lt;/h3&gt;
&lt;p&gt;Everything in §1-§4 requires changing how you work. A natural response to that, when you&amp;#39;ve spent
years getting good at the old way, is: no. That&amp;#39;s not irrational. It&amp;#39;s protective.&lt;/p&gt;
&lt;p&gt;When React arrived in 2013, it violated every established best practice in frontend engineering.
Some developers called it a fad. They were wrong, but their skepticism wasn&amp;#39;t stupid — it was
calibrated to a world where those best practices had been genuinely correct. What changed wasn&amp;#39;t the
quality of their judgment. What changed were the constraints their judgment was calibrated to.&lt;/p&gt;
&lt;p&gt;That&amp;#39;s resistance. It says: this threatens something real, and I&amp;#39;m not ready to let go of it.&lt;/p&gt;
&lt;h3&gt;The worse failure mode&lt;/h3&gt;
&lt;p&gt;There&amp;#39;s another response, and it&amp;#39;s more dangerous. Call it co-option. This is where you technically
adopt the new thing but use it to preserve every existing structure. Same org chart, same process,
same estimation methods, same job descriptions — now with a subscription.&lt;/p&gt;
&lt;p&gt;You can already see it. Jira integrations that auto-generate status updates. Sprint retrospective
summarizers. AI-powered ticket estimation. Same work, same structure, same assumptions — now with a
chatbot bolted on. And co-option is self-reinforcing: the tooling creates jobs, the jobs create
advocates, the advocates entrench the tooling. This will happen at massive scale.&lt;/p&gt;
&lt;h3&gt;The worst failure mode&lt;/h3&gt;
&lt;p&gt;There&amp;#39;s a third response, and it&amp;#39;s the most destructive. Call it thrash. This is where someone fully
embraces the new tool, points it at everything, and generates at full speed with no spec, no
architecture, no convergence infrastructure — just output. PRs pile up. Code ships. Activity is
visible on every dashboard. And the codebase gets worse on every merge, because volume without
direction isn&amp;#39;t progress. It&amp;#39;s the politician&amp;#39;s syllogism applied to engineering: AI is
transformative; I am using AI; therefore I am transforming.&lt;/p&gt;
&lt;p&gt;Resistance preserves the old structure by refusing the new tool. Co-option preserves the old
structure by absorbing the new tool. Thrash destroys the old structure and replaces it with nothing.
All three end up in the same place: no coherence, no compounding, no infrastructure that makes the
next cycle better. These are three ways of refusing to place new bets.&lt;/p&gt;
&lt;h3&gt;Banning the word calculator vs. using GPT to grade the same old assignment&lt;/h3&gt;
&lt;p&gt;An education parallel captures the first two failure modes cleanly. Banning GPT essays is resistance
— honest, protective, ultimately a losing move because the word calculator isn&amp;#39;t going away. Using
GPT to auto-grade the same five-paragraph essays is co-option: technically it&amp;#39;s &amp;quot;adopting AI&amp;quot;, but it preserves the exact measurement that stopped measuring what it was supposed to measure.&lt;/p&gt;
&lt;p&gt;The hand-written essay was an indirect measure of critical thinking. If the measure is dead, the right move is neither banning the tool nor automating the old measure. It&amp;#39;s raising the bar: teaching critical thinking with the tools students will actually encounter in the world.&lt;/p&gt;
&lt;h3&gt;The distinction&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Resistance is at least honest about the stakes.&lt;/li&gt;
&lt;li&gt;Co-option pretends the stakes don&amp;#39;t exist.&lt;/li&gt;
&lt;li&gt;Thrash pretends the work is the stakes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of the three, co-option and thrash are harder to fight, because both look like progress.&lt;/p&gt;
&lt;p&gt;All three are cultural problems wearing technical clothes.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Resistance is an identity problem.&lt;/li&gt;
&lt;li&gt;Co-option is a bureaucratic self-preservation problem.&lt;/li&gt;
&lt;li&gt;Thrash is a leadership problem.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The technical prescription — coherence and correctness infrastructure, compounding — is necessary but not sufficient. The organizational self-reflection required to actually adopt it is a different essay.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;New bets (§6)&lt;/h2&gt;
&lt;p&gt;Everyone views the new thing in the lens of the old. It&amp;#39;s the only lens we have to start. The
question is whether you&amp;#39;re going to get stuck there, or whether you can start to acquire new lenses.&lt;/p&gt;
&lt;p&gt;The engineers who called React a fad weren&amp;#39;t wrong about their craft. The teachers banning AI essays
aren&amp;#39;t wrong about critical thinking.&lt;/p&gt;
&lt;p&gt;Every heuristic, every indirect measure, every definition of &amp;quot;worth doing&amp;quot; was a bet placed against
a specific cost structure. Lines of code measured effort because effort was expensive. &amp;quot;Good enough
for v1&amp;quot; was rational because thoroughness cost more than it saved. Estimation worked because
implementation was the bottleneck. These &lt;em&gt;were&lt;/em&gt; all good bets. They paid off for years.&lt;/p&gt;
&lt;p&gt;But the cost structure is being rewritten, and now they might not be.&lt;/p&gt;
&lt;p&gt;The indirect measures decoupled from what they measured. The set of things worth doing expanded past
what the old calculus can see. And the most dangerous response isn&amp;#39;t refusing to adapt — it&amp;#39;s
adopting the new tools to preserve the old assumptions.&lt;/p&gt;
&lt;p&gt;All bets are off. New table. No limit. Choose wisely.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Appendix&lt;/h2&gt;
&lt;details&gt;
&lt;summary&gt;Related thoughts&lt;/summary&gt;

&lt;h3&gt;Craftsmanship&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/The_Woodwright%27s_Shop&quot;&gt;The Woodwright&amp;#39;s Shop&lt;/a&gt; (1979-2017)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/The_New_Yankee_Workshop&quot;&gt;The New Yankee Workshop&lt;/a&gt; (1989-2009)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We have all been writing code for 50 years like Roy Underhill. Those who decide to retain their
artisanal path but adopt the new tools will go the way of Norm Abram. Those who choose to follow the
path of scale will have to learn some patterns that may feel a lot like the advent of the modern
factory. It&amp;#39;s bittersweet to see the twilight of the golden age.&lt;/p&gt;
&lt;h3&gt;Sea Change&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=fij_ixfjiZE&amp;t=329s&quot;&gt;Margin Call&lt;/a&gt; (2011)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When a sea change comes, it is not obvious to most until the time for meaningful action has long passed.&lt;/p&gt;
&lt;/details&gt;

&lt;br/&gt;</content:encoded></item></channel></rss>