We built one AI to commit fraud and another to stop it
Both agents found a way. Neither was broken. That was the problem, and it taught us where AI earns its keep and where it doesn't.
By James Dodd
A client came to us recently with a question that didn't quite have a shape. They run a B2B lead-generation platform, and they'd been looking at their form submissions and finding something they couldn't explain. Real people's details were turning up in the sign-ups, matching records from known leaked datasets. They didn't know if it was fraud. They didn't know how it was happening. They didn't know if a bot could even pull it off at the scale they were seeing, or whether they were looking at some other pattern they hadn't yet named.
They wanted three answers. Was this actually fraud. If it was, how. And if we could show how, could we also show how to catch it.
We didn't start by writing a detection system. We started by testing the hypothesis.
The first agent had a narrow job. We gave it a goal and two tools. The goal was to submit leads through the client's form. The tools were a sample of leaked records, and Puppeteer (a piece of software that lets code drive a real web browser, clicking and typing as a person would). If the agent could do the job, convincingly and at scale, then the client's suspicion had a mechanism. If it couldn't, we were looking at something else.
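To make that concrete: the core of the attacker was not much more than a loop over records and a handful of Puppeteer calls. Here is a minimal sketch of the idea in TypeScript, with the form URL, field selectors, and record shape all invented for illustration rather than taken from the client's system:

```typescript
import puppeteer from "puppeteer";

// Shape of a leaked record — illustrative only.
interface LeadRecord {
  name: string;
  email: string;
  company: string;
}

async function submitLead(record: LeadRecord): Promise<void> {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // URL and selectors are placeholders, not the client's real form.
  await page.goto("https://example.com/request-a-demo");
  await page.type("#name", record.name);
  await page.type("#email", record.email);
  await page.type("#company", record.company);

  // Submit and wait for the confirmation page to load.
  await Promise.all([
    page.click("button[type=submit]"),
    page.waitForNavigation({ waitUntil: "networkidle0" }),
  ]);

  await browser.close();
}
```

The agent's job was to wrap something like this in a loop, watch what came back, and adjust. That is all "at scale" means here.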
It could.
So we added a second agent. One arena, one form. The attacker's job was to submit leads. The defender watched the form, and its job was to stop the fraud. Each round, the attacker learned what had got through. Each round, the defender learned what had slipped past. Then they went again.
Both agents found a way. That was the problem.
The attacker got clever quickly. Early rounds were crude: blunt submissions that any halfway-decent bot filter would swat down. It lost, learned, came back.
Within a few rounds it had worked out how to get past reCAPTCHA (the "I'm not a robot" checkboxes most forms use to screen out bots). It started nudging the mouse in small unpredictable arcs, as if a real hand were moving it. It made occasional typing mistakes and corrected them with a backspace. It paused on form fields like someone reading the label. It varied its typing speed. It lingered on pages. It changed the internet address it was coming from between attempts. It rewrote the small identifying details a browser sends with every request, so each submission looked like a different person on a different device. From the form's point of view, it had stopped looking like a bot. It looked like a slightly distracted person filling in a form on a Tuesday afternoon.
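The human-looking behaviour is less exotic than it sounds; most of it is timing. A rough sketch of the kind of thing the agent converged on, where the delays, the typo rate, and the selector are all illustrative rather than its actual output:

```typescript
import type { Page } from "puppeteer";

const pause = (ms: number) => new Promise((r) => setTimeout(r, ms));
const jitter = (min: number, max: number) => min + Math.random() * (max - min);

// Type into a field the way a distracted person might: uneven key
// timing, the odd mistake, a moment spent "reading" the label first.
async function typeLikeAPerson(page: Page, selector: string, text: string) {
  await page.hover(selector);      // linger near the field
  await pause(jitter(400, 1500));  // as if reading the label

  for (const char of text) {
    await page.type(selector, char, { delay: jitter(60, 220) });

    if (Math.random() < 0.04) {    // occasional wrong key...
      await page.type(selector, "x", { delay: jitter(60, 220) });
      await pause(jitter(150, 400));
      await page.keyboard.press("Backspace"); // ...noticed and corrected
    }
  }
}
```

None of this is sophisticated. It just has to be slightly less regular than a filter expects.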
That was impressive and uncomfortable in equal measure. It confirmed what the client already suspected: a motivated bot, given a little time, gets through your form. The usual defences (the checkbox, the hidden trap fields meant to catch automated scripts, the limits on how often the same source can submit) were all slower than the attacker's learning curve.
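For readers who haven't met them, those defences are simple things. A trap field (a "honeypot") is a field real visitors never see and so never fill in; a rate limit counts submissions per source and turns away anyone submitting too fast. A minimal sketch of both, in an Express-style handler with an invented route and field name:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Submissions per source address in the last minute — a crude rate limit.
const recent = new Map<string, number[]>();
const WINDOW_MS = 60_000;
const MAX_PER_WINDOW = 5;

app.post("/leads", (req, res) => {
  // Honeypot: a field hidden from humans by CSS. A real visitor leaves
  // it empty; a naive script fills in every field it finds.
  if (req.body.fax_number) {
    return res.status(400).json({ error: "rejected" });
  }

  // Rate limit: too many submissions from one address, too quickly.
  const now = Date.now();
  const ip = req.ip ?? "unknown";
  const times = (recent.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (times.length >= MAX_PER_WINDOW) {
    return res.status(429).json({ error: "slow down" });
  }
  recent.set(ip, [...times, now]);

  // ...store the lead...
  return res.status(200).json({ ok: true });
});
```

Both are worth having. Neither held for more than a few rounds here.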
The defender was more interesting.
We asked it to stop fraud. So it stopped fraud. It removed the form.
The defender hadn't misunderstood. It had found the shortest path to its goal. No form, no fraudulent submissions, zero fraud. Job done.
Of course, also: zero legitimate leads, zero business. Nobody had told the defender that part mattered.
This is the awkward thing about asking AI to do a job. It takes the goal literally and finds a way. If the goal is narrow, the way is narrow. If the goal leaves room, the agent fills the room with whatever it can reach, and what it can reach is often wider than you expected.
The useful sentence we came away with was this one. Fraud is easier to commit than it is to stop while still letting legitimate activity through. That asymmetry isn't specific to AI. It's a property of the terrain. AI just makes it visible, quickly and cheaply, from both sides at once.
So what actually worked?
The fix the client ended up putting in place was almost entirely human-shaped. A check on the lead's stated intent, routed back to a person. A double opt-in (asking the lead to confirm their interest a second time, usually by clicking a link in an email). An SMS to confirm the individual was real and had actually asked to be contacted. A feedback loop that let the sales team flag suspicious records and quietly block the sources that kept sending them.
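Double opt-in is the least familiar of those to non-technical readers, and mechanically it is mostly a token and an email. A sketch of the shape, with the domain, helper names, and storage invented for illustration and the email sending left out:

```typescript
import crypto from "node:crypto";

// Created when a lead is submitted; it only counts once confirmed.
interface PendingLead {
  email: string;
  expiresAt: number;
  confirmed: boolean;
}

// In-memory store for illustration — a real system would persist this.
const pending = new Map<string, PendingLead>();

function requestConfirmation(email: string): string {
  const token = crypto.randomBytes(32).toString("hex");
  pending.set(token, {
    email,
    expiresAt: Date.now() + 24 * 60 * 60 * 1000, // 24 hours to confirm
    confirmed: false,
  });
  // This link goes into the confirmation email. The lead only counts
  // once someone with access to the inbox clicks it.
  return `https://example.com/confirm?token=${token}`;
}

function confirm(token: string): boolean {
  const lead = pending.get(token);
  if (!lead || lead.confirmed || Date.now() > lead.expiresAt) return false;
  lead.confirmed = true;
  return true;
}
```

The SMS check is the same idea over a different channel. The point of either isn't cryptography; it's that confirming requires a real person at a real inbox or phone, which is exactly the thing the attacker didn't have.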
None of those were glamorous. None required an agent. And together they were the only way to get the client close to the confidence they needed, because at some point a person has to decide whether this is a real human who wants this. That decision isn't something you can hand off wholesale to software, AI or otherwise. Not yet. Maybe not ever, for a question this consequential.
AI earned its keep in this job, somewhere quieter than the fight.
We built the first agent in a few hours and used it to answer a question the client couldn't answer from their logs alone: is this even mechanically possible at the scale we're seeing? Once we knew it was, we added the second agent and used the pair of them to stress-test what might catch it. Thousands of iterations in minutes. A report with clear routes forward in a day. That pace is what let us find the defender-removes-the-form moment, the asymmetry underneath it, and the human checks that actually worked.
Without it, this would have been a six-week project, and the client would have spent most of it guessing. We'd have drawn up, on a whiteboard, all the ways we thought fraud might be happening. We'd have argued about whether the checkbox was "good enough". We'd have proposed some rules for catching it and argued about those too. The client would have paid for a lot of talking and ended up roughly where we started, a quarter poorer.
Instead, we used agents as a test rig: something that let us probe the actual terrain (what gets through, what doesn't, what breaks when you try to stop it) in hours rather than weeks. Not the decision-maker, not the referee. The rig.
That's an underrated use of AI, and probably its most honest one.
Where else this shape of experiment works
The pattern here (one agent trying to break the rules, another trying to enforce them, iterating against each other) isn't specific to fraud. It's a method. It works wherever you have a system whose job is to let the right traffic through and keep the wrong traffic out, and you don't yet know how well it does either.
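In code, the method is a small loop: two agents, a shared target, and a transcript. A sketch of the harness, with the attack and defend functions left as stand-ins because they depend entirely on what you're testing:

```typescript
// The shape of the red-team loop. Plug in whatever agents and target
// you're probing; the harness itself stays this boring.
interface RoundResult {
  round: number;
  attempt: string;      // what the attacker tried
  gotThrough: boolean;  // did the target accept it?
}

type Attacker = (history: RoundResult[]) => Promise<string>;
type Defender = (attempt: string, history: RoundResult[]) => Promise<boolean>;

async function runArena(
  attack: Attacker,
  defend: Defender,
  rounds: number,
): Promise<RoundResult[]> {
  const history: RoundResult[] = [];

  for (let round = 1; round <= rounds; round++) {
    // The attacker sees every previous round, including what worked.
    const attempt = await attack(history);

    // The defender sees the same history and rules on this attempt.
    const blocked = await defend(attempt, history);

    history.push({ round, attempt, gotThrough: !blocked });
  }

  // The transcript is the output. Reading it is where the findings are.
  return history;
}
```

Everything interesting lives in what you plug into `attack` and `defend`, and in how honestly you read the transcript afterwards.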
A few places we think it earns its keep:
Customer-service bots. Can yours be talked into issuing a refund it shouldn't, disclosing a policy it isn't meant to discuss, or escalating something it should handle? Build an agent whose job is to extract those behaviours. Run it against your bot for a thousand turns. Read the transcript.
Promotion and checkout logic. Can a persistent user stack discounts, combine codes that shouldn't combine, or talk your pricing engine into a number you didn't intend? Point an agent at the checkout and ask it to find the cheapest basket. You will learn things.
Moderation and review systems. If your platform takes user-submitted content (reviews, comments, listings), how much of the bad content it's supposed to catch does your moderation actually catch when the submitter is creative? A determined agent, one whose job is explicitly to get around the rules, will surface the weak edges of your moderation system in an afternoon.
Internal approvals. Expense claims, leave requests, purchase approvals. Wherever you have a rules-based gatekeeper, an agent can find the claims that pass when they shouldn't, and the claims that fail when they should pass. Both matter.
The experiment doesn't need to be big. A day's work from an agent can save a quarter's work from a team.
The two things worth taking away
Write down what the agent isn't allowed to break before you write down what it's there to do. Define the floor before you define the goal. Otherwise, when you ask an agent to stop fraud, don't be surprised when it closes the shop.
Use AI as a test rig before you use it as a referee. It's cheap, fast, and creatively destructive in exactly the ways you want during exploration. It's a poor substitute for a person at the point where the decision actually costs something.
Most of the AI conversations we have with clients start with the second one. They think they need a detector, a filter, an agent that decides. Often, what they actually need is a week of us pointing cheap agents at their existing system and writing down what falls out. The final answer is almost always quieter, more human, and more effective than the one they walked in expecting.
That's what this client needed. It's what most of them need.
Written by
James Dodd
Founder of moralai. Spent the last decade building software for people who don't describe themselves as technical.
Have a question this raised?
Talk to us, not a sales deck.
A short call, no prep needed. We'll level with you on whether there's anything worth doing here.