
The rise of simulation testing: How AI is rewriting the rules of security

All of cybersecurity is just classification. That’s the problem.

In 1936, Alan Turing proved something profound: that there are limits to what computation can ever do. His now-famous Halting Problem shows that there’s no general algorithm that can determine whether an arbitrary program will eventually halt or run forever.

This wasn’t an abstract musing. It was a formal, mathematical boundary — a line you can’t cross. And in practice, it means you can’t write a perfect program that inspects any other program and always correctly predicts what it will do.
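
The argument behind that boundary can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Turing's original construction: `refute` and `contrary` are hypothetical names, and the candidate oracle is assumed to be deterministic and to always return an answer. Whatever the oracle predicts about `contrary`, the function is built to do the opposite.

```python
from typing import Callable

Program = Callable[[], None]
Oracle = Callable[[Program], bool]  # hypothetical: True means "this program halts"


def refute(claims_to_halt: Oracle) -> str:
    """Build a program that the given oracle must misjudge."""

    def contrary() -> None:
        # Do the opposite of whatever the oracle predicts about this function.
        if claims_to_halt(contrary):
            while True:  # oracle said "halts", so loop forever
                pass
        # oracle said "runs forever", so fall through and return

    if claims_to_halt(contrary):
        # Calling contrary() here would spin forever, so we only report it.
        return "oracle said 'halts', but contrary() would loop forever"
    contrary()  # returns at once, contradicting the oracle
    return "oracle said 'runs forever', but contrary() halted"


print(refute(lambda program: True))
print(refute(lambda program: False))
```

Both trivial oracles are wrong about `contrary`, and a cleverer oracle would fail the same way, because `contrary` consults the oracle before deciding what to do.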

Fast forward to 2005. In a paper titled “Guns and Butter: Towards Formal Axioms of Input Validation,” Meredith L. Patterson and Robert J. Hansen took this theoretical boundary and nailed it to the front door of the software security church.

Their thesis:

Input validation — the cornerstone of software security — is also an instance of the Halting Problem.

No validator can be both complete and correct for all possible inputs once the input language is expressive enough to encode arbitrary computation. In that general case, you simply can’t determine whether a given input will trigger safe or unsafe behavior in all contexts. The problem is undecidable. Turing said so.


Cybersecurity is still pretending this isn’t true

Despite this, most of cybersecurity continues to rely on classification: breaking the infinite world of input into finite, labeled buckets like “safe,” “dangerous,” “benign,” “malicious,” “normal,” “anomalous,” etc.

Everything from:

  • Web Application Firewalls (WAFs)
  • Endpoint Detection and Response (EDR)
  • SIEMs
  • Static and dynamic scanners
  • Machine learning anomaly detectors

…to modern LLM-based defenses ultimately relies on drawing boxes around categories of behavior, content, or code.

But here’s the problem:

You are applying finite categories to an infinite input space.

Attackers don’t care about your categories. They don’t even need to know your exact rules. They just need to know how to live outside the edges — and they always will. Classifiers, no matter how good, will:

  • Miss things (false negatives)
  • Block things they shouldn’t (false positives)
  • And completely ignore categories they didn’t anticipate (unknown unknowns).

That’s not just a bug in the system — it’s a feature of computability itself.
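
A toy example makes the first two failure modes concrete. The rule below is a hypothetical blocklist pattern, not taken from any real WAF, and the three sample inputs are invented for illustration.

```python
import re

# Hypothetical blocklist rule of the kind a simple filter might ship with.
SCRIPT_TAG = re.compile(r"<\s*script\b", re.IGNORECASE)


def classify(request_body: str) -> str:
    """Label input as 'malicious' or 'benign' using one fixed pattern."""
    return "malicious" if SCRIPT_TAG.search(request_body) else "benign"


samples = {
    "known attack": "<script>alert(1)</script>",
    "same intent, other tag": "<img src=x onerror=alert(1)>",
    "harmless question": "How do I escape a <script> tag in Markdown?",
}

results = {name: classify(body) for name, body in samples.items()}
for name, label in results.items():
    print(f"{name:24} -> {label}")
```

The second sample slips through as a false negative and the third is blocked as a false positive. Tightening the pattern moves the boundary, but some input will always sit on the wrong side of it.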


More rules ≠ more safety

So what do defenders do?

They keep adding more tools and rules.

More regexes. More heuristics. More exception handling. More detections. More YAML. More code.

But there’s a problem with this too — one that nobody wants to talk about. Every new detection rule, parser, or logic branch adds attack surface.

Security detection logic is code. It can:

  • Be bypassed
  • Be misunderstood
  • Be buggy
  • Be exploited

In trying to close one loophole, defenders often introduce two more. And since very few teams audit their detection logic with the same rigor as application code, these bugs linger for years.
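
One common shape of such a bug is an ordering mistake: the rule inspects one representation of the input while the application acts on another. The sketch below is hypothetical (`is_allowed`, `handle_buggy`, and `handle_fixed` are illustrative names) and uses only the standard library's `urllib.parse.unquote`.

```python
from urllib.parse import unquote


def is_allowed(path: str) -> bool:
    """Hypothetical detection rule: reject directory traversal."""
    return "../" not in path


def handle_buggy(raw_path: str) -> str:
    # Bug: the rule inspects the raw string, but the decoded string is used.
    if not is_allowed(raw_path):
        return "blocked"
    return "serving " + unquote(raw_path)


def handle_fixed(raw_path: str) -> str:
    # Decode first, so the rule sees what the application will actually use.
    decoded = unquote(raw_path)
    if not is_allowed(decoded):
        return "blocked"
    return "serving " + decoded


print(handle_buggy("%2e%2e%2fetc/passwd"))
print(handle_fixed("%2e%2e%2fetc/passwd"))
```

Even the fixed version is only correct if nothing downstream decodes the path a second time, which is exactly the kind of assumption that goes unaudited.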

We know this. Complexity always introduces risk. But the security industry is addicted to the idea that more rules = more protection, even though the math and reality say otherwise.


Enter generative AI: The great equalizer (or destroyer)

Then along comes generative AI.

LLMs and generative models don’t classify; they simulate. They can generate a practically unbounded stream of variants. They explore possibility space.

This changes the game — not just for defenders, but especially for attackers.

With gen AI, an attacker can:

  • Automatically generate thousands of input variations to test your WAF or validation logic
  • Use context-aware transformations to retain intent while dodging detection
  • Simulate and probe your detection surface faster than any human could.

And here’s the kicker:

The cost of doing this is approaching zero.

What used to take a skilled red teamer weeks now takes a laptop and an API key. Generative AI is built to do exactly what your classifiers are worst at handling: infinite variation, subtle transformation, and fuzzing the boundaries of semantics.

Your growing library of detection logic? It’s now an oracle for attackers to game — with AI as their simulator.
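
The probing loop described above can be sketched without any AI at all. In this minimal illustration, three fixed rewrites stand in for what a generative model would propose, and `RULE` is a hypothetical detection pattern. Whether a surviving variant still carries its original intent depends on how the target decodes and parses it, which this sketch does not model.

```python
import itertools
import re
from urllib.parse import quote

# Hypothetical detection rule under test.
RULE = re.compile(r"union\s+select", re.IGNORECASE)


def blocked(payload: str) -> bool:
    return bool(RULE.search(payload))


# Simple rewrites standing in for what a generative model would propose.
def identity(s: str) -> str:
    return s


def mixed_case(s: str) -> str:
    return "".join(c.upper() if i % 2 else c.lower() for i, c in enumerate(s))


def comment_spaces(s: str) -> str:
    return s.replace(" ", "/**/")  # many SQL dialects treat /**/ like a space


def url_encode(s: str) -> str:
    return quote(s, safe="")


def variants(seed: str):
    """Yield every combination of case, spacing, and encoding rewrites."""
    for case_fn, space_fn, enc_fn in itertools.product(
        (identity, mixed_case), (identity, comment_spaces), (identity, url_encode)
    ):
        yield enc_fn(space_fn(case_fn(seed)))


seed = "1 UNION SELECT password FROM users"
candidates = list(dict.fromkeys(variants(seed)))  # de-duplicate, keep order
survivors = [v for v in candidates if not blocked(v)]
print(f"{len(survivors)} of {len(candidates)} variants pass the rule")
```

Three hand-written rewrites already get most variants past the rule. A model that proposes new rewrites on demand, and keeps the ones that work, scales the same loop with almost no extra effort.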


Simulation, not classification

So where does that leave us?

The core idea is this:

Security based on classification is collapsing.

Security based on simulation is what comes next.

You can’t keep out attackers by trying to label everything.

You must model behavior. Understand intent. Simulate consequences.

This is a fundamental shift: from pattern-matching to prediction, from filtering to modeling. It requires:

  • Reasoning over input rather than matching strings
  • Dynamically simulating execution paths and intent
  • Recognizing that any static classification system is already obsolete the moment it’s deployed.
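
As a rough illustration of judging input by its consequences rather than its spelling, the sketch below runs a lookup inside a throwaway in-memory SQLite database and inspects what actually happens. The table, the rows, and the function name `simulate_lookup` are assumptions made for the example, and it covers a single query shape only.

```python
import sqlite3


def simulate_lookup(user_input: str) -> str:
    """Run the query in a throwaway database and judge it by its effect."""
    db = sqlite3.connect(":memory:")  # sandbox, discarded after the call
    db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    db.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret"), ("canary", "c-secret")],
    )
    # Deliberately unsafe string building: this mirrors the hypothetical
    # application code whose behavior we want to observe.
    query = f"SELECT name FROM users WHERE name = '{user_input}'"
    try:
        rows = db.execute(query).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return "suspicious: input changed the query's syntax"
    finally:
        db.close()
    leaked = [name for (name,) in rows if name != user_input]
    if leaked:
        return "suspicious: query returned rows it should not reach"
    return "ok"


for candidate in ("alice", "' OR '1'='1", "O'Brien"):
    print(repr(candidate), "->", simulate_lookup(candidate))
```

No pattern for `OR` or quote characters appears anywhere in the check. The verdict comes from the observed effect, so a rephrased injection that reaches the same rows is caught the same way. The last case also shows the cost: a legitimate name with an apostrophe is flagged too, because it really does break the unsafe query.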

We knew this would happen — we just ignored it

Turing told us.
Patterson and Hansen warned us.
We chose to build cathedrals of detection logic on sand anyway.

And now, generative AI has arrived to blow the roof off.

This isn’t the end of cybersecurity. But it is the end of pretending that classification alone is enough.

It’s time to stop building bigger walls, and start learning to model the battlefield itself.


Application Security Testing must evolve too

And here’s the part that many still haven’t grappled with: Application Security Testing (AST) must change too.

For decades, AST has focused on deeply testing software for flaws and “logic shadows” — hidden behaviors and assumptions. But this approach assumed your adversary was human: clever, persistent, but ultimately limited in time and creativity.

That assumption is now wrong.

Your adversary is no longer a tired human tester — it’s a tireless machine that:

  • Generates inputs
  • Learns from classification cues
  • Adjusts to business logic changes
  • Navigates deception networks
  • Circumvents identity challenges
  • And adapts in milliseconds.

In short: bots attack you now. And they’re learning faster than you’re updating your defenses.

To survive this shift, testing must go beyond finding known patterns or misconfigurations. You must simulate intelligent adversaries, stress business logic flows, and model adaptive, learning behavior.

What’s needed is not more code coverage — it’s adversarial simulation that mirrors real machine-led attacks. The goal isn’t just to break things — it’s to understand how an adaptive machine would break you tomorrow.

And that begins by changing what you test.
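
A small sketch of what stressing a business logic flow can look like: instead of checking inputs against patterns, the test explores sequences of legitimate actions and looks for one that breaks an invariant. Everything here is hypothetical, including the cart model and its planted flaw, and a plain breadth-first search stands in for the adaptive adversary.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

ITEM_PRICE = 50
COUPON_VALUE = 20


@dataclass(frozen=True)
class Cart:
    items: int = 0
    discount: int = 0

    @property
    def total(self) -> int:
        return self.items * ITEM_PRICE - self.discount


# Hypothetical checkout flow with a planted flaw: a discount, once granted,
# is never re-validated against the cart contents.
def add_item(cart: Cart) -> Cart:
    return Cart(cart.items + 1, cart.discount)


def remove_item(cart: Cart) -> Cart:
    return Cart(max(cart.items - 1, 0), cart.discount)


def apply_coupon(cart: Cart) -> Optional[Cart]:
    if cart.items == 0:
        return None  # the flow's only guard: no coupon on an empty cart
    return Cart(cart.items, cart.discount + COUPON_VALUE)


ACTIONS = {"add_item": add_item, "remove_item": remove_item, "apply_coupon": apply_coupon}


def find_violation(max_steps: int = 4) -> Optional[list]:
    """Breadth-first search for the shortest action sequence with total < 0."""
    queue = deque([(Cart(), [])])
    seen = {Cart()}
    while queue:
        cart, path = queue.popleft()
        if cart.total < 0:
            return path
        if len(path) == max_steps:
            continue
        for name, action in ACTIONS.items():
            nxt = action(cart)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None


print(find_violation())
```

Every individual action passes its own validation; only the sequence is abusive. That is the class of flaw a pattern-based scanner has nothing to match against, and the kind an automated adversary can find by exploring.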

