Free is never quite free
A free document tool we found inside a client had two problems. One was hallucination. The other was where the files went when you clicked the button.
By James Dodd
We've written elsewhere about a free document comparison tool a client had been using for a year. That piece was about the tool's accuracy. It was quietly wrong about a third of the time, and nobody had spotted it. That is one kind of problem. This piece is about the other.
The second problem with the tool had nothing to do with hallucination. A perfectly reliable version would have had the same issue. It came down to a question nobody had asked. Where does the file go when you click the button?
We found the answer the same afternoon we found the accuracy problem.
The tool was a web page. You pasted two documents in, you hit Compare, and a moment later the differences appeared on screen. What wasn't visible, anywhere on the page, was that your documents had been uploaded to a server in another country, processed by a company nobody at the client had ever heard of, and a copy had been kept.
The client had a written policy against that. It was in the board pack and the onboarding deck, signed by someone who meant it at the time. The kind of policy a regulator asks for, or a customer wants to see, or a board member cares about. Not the kind anyone thought to check on a Tuesday morning, with a document waiting to be compared.
Data sovereignty sounds technical until you need it. The idea is simple: the country where your data physically lives is the country whose laws apply to it. For some industries (health, finance, public sector, and the businesses that sell to them) keeping data in a chosen jurisdiction is not optional. A file on a server abroad is subject to the laws abroad. A file with an unknown vendor is subject to whichever jurisdiction they picked, which may be one they will not tell you about.
Our client was not in one of the regulated industries. They still cared, because their customers did, because their board did, and because once you have told the world you keep your data in a particular place, putting it anywhere else is a quiet breach of trust.
There was a quieter clause too, buried in the vendor's terms of service, where clauses like this always are. It said, in the usual careful wording, that anything submitted to the tool could be used to train the vendor's model. That is common for free services. It is also exactly what makes free services expensive for a company whose work has value.
The documents being pasted in were internal policies. Their sentences, over time, could surface as completions in somebody else's output. Not verbatim, usually. Close enough, sometimes.
You don't find out when your work leaks. You find out, if you find out at all, when a competitor's new document sounds strangely familiar.
Free tools are paid for in data. The only question is whose.
The replacement we built for this client (described in the companion piece) took care of the data problem incidentally. It ran inside the client's own infrastructure. Files never left. There was no third-party processor in the path, no terms of service permitting training on inputs, no cross-border transfer of anything.
Those weren't features we added because the accuracy problem demanded them. They were the consequences of building the tool ourselves instead of pointing the client at a free web page. One piece of work. Two risks gone.
What to check before you paste
Three questions to ask before you feed a document into a tool that showed up for free on the open internet. They take longer to answer than they sound.
Where is the file going? If the answer isn't obvious from the landing page, the file is somewhere you haven't checked.
Who is running the server? If the vendor's name is new to you, the contract you do not have with them will also be new to you when something goes wrong.
What do the terms say about training? The clause is almost always there. The opt-out, almost never.
If any of those answers lands badly, the tool is not free. It has been paid for in the risk you took without noticing.
Written by
James Dodd
Founder of moralai. Spent the last decade building software for people who don't describe themselves as technical.