2 min read

Your AI bill isn't the real bill

When an AI silently starts missing things, the invoice is the last place you'll notice. The cost lands somewhere else.

By James Dodd

A client asked us last year to help convert a large archive of documents, hundreds of pages each, from one format to another. The obvious approach was to hand the whole lot to a model and ask it for the converted version. We tried it. It looked like it was working.

The trouble showed up on the longer documents, and on the awkward ones (odd layouts, tricky tables, long appendices). On those, the output would quietly drop chunks of the source, or compress several pages into a line or two. No error. No warning. You only caught it by opening a file and reading what came back.

When you feed a model more than it can comfortably hold in mind, the quality of its answers starts to slip. It misses bits. It contradicts itself. It hedges where it shouldn't. People who work with these tools have taken to calling this context rot: the quiet decay of output quality as the input grows.

The name is well chosen. Rust builds up wherever metal sits exposed to moisture and air, hidden under paint, eating into the surface long before anything looks wrong. You can prevent it with care. Once it has set in, you don't get the metal back: you cut out what it has eaten and replace it.

Context rot behaves the same way. It accumulates with exposure (long sessions, heavy inputs, prompts that have grown over time). You can prevent it with the same kind of care. But by the time it shows up in the output, the damage has already been done.

For a business, the interesting question is what that damage costs. Not the bill from the AI provider, which barely registers it. The places that don't show up on an invoice at all.

The intuitive response is to cut and optimise. Smaller prompt, cheaper model, shorter conversation, lower bill. We tried that first. It made things worse.

The same documents had to be re-run two or three times before the output was usable. Each retry cost money. It also cost time nobody had budgeted for. Somebody on the client side was still hand-fixing the gaps after every pass, which was real hours from a real person who should have been doing something else. And because the failures were silent, the work ran on hit-and-hope: you'd ship a batch, then find out later whether the documents in it had actually come through whole.

That was the real bill. The charge from the AI provider was the smallest line on it.

So we did the opposite. We broke each document into smaller pieces, ran them one at a time, and stitched the results back together. The per-run cost went up, because we were doing more processing per file. But there were no retries. The output was clean enough that nobody had to fix it afterwards. The client could forecast the cost of processing a new document within a narrow range, because the variance had collapsed.
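The chunk-and-stitch approach can be sketched in a few lines. This is a minimal illustration, not our production pipeline: `convert_chunk` is a hypothetical stand-in for whatever model call does the actual conversion, and the character limit is an arbitrary example value.

```python
def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Split on paragraph breaks, then pack paragraphs into chunks
    that stay under max_chars, so each model call gets an input it
    can comfortably hold in mind."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single oversize paragraph still becomes its own chunk.
            current = para
    if current:
        chunks.append(current)
    return chunks


def convert_document(text: str, convert_chunk, max_chars: int = 2000) -> str:
    """Run the per-chunk conversion (convert_chunk is a placeholder
    for the model call) and stitch the results back in order."""
    pieces = [convert_chunk(chunk) for chunk in chunk_paragraphs(text, max_chars)]
    return "\n\n".join(pieces)
```

The point of the structure is that nothing can be silently dropped: every paragraph of the source lands in exactly one chunk, and the stitched output preserves their order.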

Predictable beat cheap, by a wide margin.

The bill is the last place this kind of decay shows up. You see it first in the retries, and in the hours your people spend cleaning output that should have been right the first time. If your AI spend is moving and the invoice doesn't explain it, look there first. By the time it shows on the bill, the rust has been at work for weeks.

Written by

James Dodd

Founder of moralai. Spent the last decade building software for people who don't describe themselves as technical.

Have a question this raised?

Talk to us, not a sales deck.

A short call, no prep needed. We'll level with you on whether there's anything worth doing here.