We built one AI to commit fraud and another to stop it
Both agents found a way. Neither was broken. That was the problem, and it taught us where AI earns its keep and where it doesn't.
By James Dodd
A client came to us recently with a question that didn't quite have a shape. They run a B2B lead-generation platform, and they'd been looking at their form submissions and finding something they couldn't explain. Real people's details were turning up in the sign-ups, matching records from known leaked datasets. They didn't know if it was fraud. They didn't know how it was happening. They didn't know if a bot could even pull it off at the scale they were seeing, or whether they were looking at some other pattern they hadn't yet named.
They wanted three answers. Was this actually fraud. If it was, how. And if we could show how, could we also show how to catch it.
We didn't start by writing a detection system. We started by testing the hypothesis.
The first agent had a narrow job. We gave it a goal and two tools. The goal was to submit leads through the client's form. The tools were a sample of leaked records, and Puppeteer (a piece of software that lets code drive a real web browser, clicking and typing as a person would). If the agent could do the job, convincingly and at scale, then the client's suspicion had a mechanism. If it couldn't, we were looking at something else.
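To make that concrete: the core of the attacker was not much more than a loop over records and a handful of Puppeteer calls. Here is a minimal sketch of the idea in TypeScript, with the form URL, field selectors, and record shape all invented for illustration rather than taken from the client's system:

```typescript
import puppeteer from "puppeteer";

// Shape of a leaked record — illustrative only.
interface LeadRecord {
  name: string;
  email: string;
  company: string;
}

async function submitLead(record: LeadRecord): Promise<void> {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // URL and selectors are placeholders, not the client's real form.
  await page.goto("https://example.com/request-a-demo");
  await page.type("#name", record.name);
  await page.type("#email", record.email);
  await page.type("#company", record.company);

  // Submit and wait for the confirmation page to load.
  await Promise.all([
    page.click("button[type=submit]"),
    page.waitForNavigation({ waitUntil: "networkidle0" }),
  ]);

  await browser.close();
}
```

The agent's job was to wrap something like this in a loop, watch what came back, and adjust. That is all "at scale" means here.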
It could.
So we added a second agent. One arena, one form. The attacker's job was to submit leads. The defender watched the form, and its job was to stop the fraud. Each round, the attacker learned what had got through. Each round, the defender learned what had slipped past. Then they went again.
Both agents found a way. That was the problem.
The attacker got clever quickly. Early rounds were crude: blunt submissions that any halfway-decent bot filter would swat down. It lost, learned, came back.
Within a few rounds it had worked out how to get past reCAPTCHA (the "I'm not a robot" checkboxes most forms use to screen out bots). It started nudging the mouse in small unpredictable arcs, as if a real hand were moving it. It made occasional typing mistakes and corrected them with a backspace. It paused on form fields like someone reading the label. It varied its typing speed. It lingered on pages. It changed the internet address it was coming from between attempts. It rewrote the small identifying details a browser sends with every request, so each submission looked like a different person on a different device. From the form's point of view, it had stopped looking like a bot. It looked like a slightly distracted person filling in a form on a Tuesday afternoon.
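The human-looking behaviour is less exotic than it sounds; most of it is timing. A rough sketch of the kind of thing the agent converged on, where the delays, the typo rate, and the selector are all illustrative rather than its actual output:

```typescript
import type { Page } from "puppeteer";

const pause = (ms: number) => new Promise((r) => setTimeout(r, ms));
const jitter = (min: number, max: number) => min + Math.random() * (max - min);

// Type into a field the way a distracted person might: uneven key
// timing, the odd mistake, a moment spent "reading" the label first.
async function typeLikeAPerson(page: Page, selector: string, text: string) {
  await page.hover(selector);      // linger near the field
  await pause(jitter(400, 1500));  // as if reading the label

  for (const char of text) {
    await page.type(selector, char, { delay: jitter(60, 220) });

    if (Math.random() < 0.04) {    // occasional wrong key...
      await page.type(selector, "x", { delay: jitter(60, 220) });
      await pause(jitter(150, 400));
      await page.keyboard.press("Backspace"); // ...noticed and corrected
    }
  }
}
```

None of this is sophisticated. It just has to be slightly less regular than a filter expects.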
That was impressive and uncomfortable in equal measure. It confirmed what the client already suspected: a motivated bot, given a little time, gets through your form. The usual defences (the checkbox, the hidden trap fields meant to catch automated scripts, the limits on how often the same source can submit) were all slower than the attacker's learning curve.
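For readers who haven't met them, those defences are simple things. A trap field (a "honeypot") is a field real visitors never see and so never fill in; a rate limit counts submissions per source and turns away anyone submitting too fast. A minimal sketch of both, in an Express-style handler with an invented route and field name:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Submissions per source address in the last minute — a crude rate limit.
const recent = new Map<string, number[]>();
const WINDOW_MS = 60_000;
const MAX_PER_WINDOW = 5;

app.post("/leads", (req, res) => {
  // Honeypot: a field hidden from humans by CSS. A real visitor leaves
  // it empty; a naive script fills in every field it finds.
  if (req.body.fax_number) {
    return res.status(400).json({ error: "rejected" });
  }

  // Rate limit: too many submissions from one address, too quickly.
  const now = Date.now();
  const ip = req.ip ?? "unknown";
  const times = (recent.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (times.length >= MAX_PER_WINDOW) {
    return res.status(429).json({ error: "slow down" });
  }
  recent.set(ip, [...times, now]);

  // ...store the lead...
  return res.status(200).json({ ok: true });
});
```

Both are worth having. Neither held for more than a few rounds here.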
The defender was more interesting.
We asked it to stop fraud. So it stopped fraud. It removed the form.
The defender hadn't misunderstood. It had found the shortest path to its goal. No form, no fraudulent submissions, zero fraud. Job done.
Of course, also: zero legitimate leads, zero business. Nobody had told the defender that part mattered.
This is the awkward thing about asking AI to do a job. It takes the goal literally and finds a way. If the goal is narrow, the way is narrow. If the goal leaves room, the agent fills the room with whatever it can reach, and what it can reach is often wider than you expected.
The useful sentence we came away with was this one. Fraud is easier to commit than it is to stop while still letting legitimate activity through. That asymmetry isn't specific to AI. It's a property of the terrain. AI just makes it visible, quickly and cheaply, from both sides at once.
So what actually worked?
The fix the client ended up putting in place was almost entirely human-shaped. A check on the lead's stated intent, routed back to a person. A double opt-in (asking the lead to confirm their interest a second time, usually by clicking a link in an email). An SMS to confirm the individual was real and had actually asked to be contacted. A feedback loop that let the sales team flag suspicious records and quietly block the sources that kept sending them.
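Double opt-in is the least familiar of those to non-technical readers, and mechanically it is mostly a token and an email. A sketch of the shape, with the domain, helper names, and storage invented for illustration and the email sending left out:

```typescript
import crypto from "node:crypto";

// Created when a lead is submitted; it only counts once confirmed.
interface PendingLead {
  email: string;
  expiresAt: number;
  confirmed: boolean;
}

// In-memory store for illustration — a real system would persist this.
const pending = new Map<string, PendingLead>();

function requestConfirmation(email: string): string {
  const token = crypto.randomBytes(32).toString("hex");
  pending.set(token, {
    email,
    expiresAt: Date.now() + 24 * 60 * 60 * 1000, // 24 hours to confirm
    confirmed: false,
  });
  // This link goes into the confirmation email. The lead only counts
  // once someone with access to the inbox clicks it.
  return `https://example.com/confirm?token=${token}`;
}

function confirm(token: string): boolean {
  const lead = pending.get(token);
  if (!lead || lead.confirmed || Date.now() > lead.expiresAt) return false;
  lead.confirmed = true;
  return true;
}
```

The SMS check is the same idea over a different channel. The point of either isn't cryptography; it's that confirming requires a real person at a real inbox or phone, which is exactly the thing the attacker didn't have.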
None of those were glamorous. None required an agent. And together they were the only way to get the client close to the confidence they needed, because at some point a person has to decide whether this is a real human who wants this. That decision isn't something you can hand off wholesale to software, AI or otherwise. Not yet. Maybe not ever, for a question this consequential.
AI earned its keep in this job, somewhere quieter than the fight.
We built the first agent in a few hours and used it to answer a question the client couldn't answer from their logs alone: is this even mechanically possible at the scale we're seeing? Once we knew it was, we added the second agent and used the pair of them to stress-test what might catch it. Thousands of iterations in minutes. A report with clear routes forward in a day. That pace is what let us find the defender-removes-the-form moment, the asymmetry underneath it, and the human checks that actually worked.
Without it, this would have been a six-week project, and the client would have spent most of it guessing. We'd have drawn up, on a whiteboard, all the ways we thought fraud might be happening. We'd have argued about whether the checkbox was "good enough". We'd have proposed some rules for catching it and argued about those too. The client would have paid for a lot of talking and ended up roughly where we started, a quarter poorer.
Instead, we used agents as a test rig: something that let us probe the actual terrain (what gets through, what doesn't, what breaks when you try to stop it) in hours rather than weeks. Not the decision-maker, not the referee. The rig.
That's an underrated use of AI, and probably its most honest one.
Where else this shape of experiment works
The pattern here (one agent trying to break the rules, another trying to enforce them, iterating against each other) isn't specific to fraud. It's a method. It works wherever you have a system whose job is to let the right traffic through and keep the wrong traffic out, and you don't yet know how well it does either.
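In code, the method is a small loop: two agents, a shared target, and a transcript. A sketch of the harness, with the attack and defend functions left as stand-ins because they depend entirely on what you're testing:

```typescript
// The shape of the red-team loop. Plug in whatever agents and target
// you're probing; the harness itself stays this boring.
interface RoundResult {
  round: number;
  attempt: string;      // what the attacker tried
  gotThrough: boolean;  // did the target accept it?
}

type Attacker = (history: RoundResult[]) => Promise<string>;
type Defender = (attempt: string, history: RoundResult[]) => Promise<boolean>;

async function runArena(
  attack: Attacker,
  defend: Defender,
  rounds: number,
): Promise<RoundResult[]> {
  const history: RoundResult[] = [];

  for (let round = 1; round <= rounds; round++) {
    // The attacker sees every previous round, including what worked.
    const attempt = await attack(history);

    // The defender sees the same history and rules on this attempt.
    const blocked = await defend(attempt, history);

    history.push({ round, attempt, gotThrough: !blocked });
  }

  // The transcript is the output. Reading it is where the findings are.
  return history;
}
```

Everything interesting lives in what you plug into `attack` and `defend`, and in how honestly you read the transcript afterwards.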
A few places we think it earns its keep:
Customer-service bots. Can yours be talked into issuing a refund it shouldn't, disclosing a policy it isn't meant to discuss, or escalating something it should handle? Build an agent whose job is to extract those behaviours. Run it against your bot for a thousand turns. Read the transcript.
Promotion and checkout logic. Can a persistent user stack discounts, combine codes that shouldn't combine, or talk your pricing engine into a number you didn't intend? Point an agent at the checkout and ask it to find the cheapest basket. You will learn things.
Moderation and review systems. If your platform takes user-submitted content (reviews, comments, listings), how much of the bad content it's supposed to catch does your moderation actually catch when the submitter is creative? A determined agent, one whose job is explicitly to get around the rules, will surface the weak edges of your moderation system in an afternoon.
Internal approvals. Expense claims, leave requests, purchase approvals. Wherever you have a rules-based gatekeeper, an agent can find the claims that pass when they shouldn't, and the claims that fail when they should pass. Both matter.
The experiment doesn't need to be big. A day's work from an agent can save a quarter's work from a team.
The two things worth taking away
Write down what the agent isn't allowed to break before you write down what it's there to do. Define the floor before you define the goal. Otherwise, when you ask an agent to stop fraud, don't be surprised when it closes the shop.
Use AI as a test rig before you use it as a referee. It's cheap, fast, and creatively destructive in exactly the ways you want during exploration. It's a poor substitute for a person at the point where the decision actually costs something.
Most of the AI conversations we have with clients start with the second one. They think they need a detector, a filter, an agent that decides. Often, what they actually need is a week of us pointing cheap agents at their existing system and writing down what falls out. The final answer is almost always quieter, more human, and more effective than the one they walked in expecting.
That's what this client needed. It's what most of them need.
Written by
James Dodd
Founder of moralai. Spent the last decade building software for people who don't describe themselves as technical.
Have a question this raised?
Talk to us, not a sales deck.
A short call, no prep needed. We'll level with you on whether there's anything worth doing here.