tools · 27 February 2026 · 9 min read

The Correlation Problem: A JMeter Story

A real-world account of chasing dynamic values through JMeter test scripts, and why manual correlation still catches out even experienced performance engineers.

Mark

Performance Testing Expert

It’s 4pm on a Tuesday. The performance test is scheduled for 7am tomorrow. You’ve spent the afternoon building out the JMeter script from scratch — HTTP requests are all in, thread groups configured, think times look sensible. You fire off a quick smoke test to make sure the basics work before you call it a day.

Everything 401s.

You stare at the error log. Every request after the login step is failing. Not just failing — throwing 401 Unauthorized and 403 Forbidden across the board. The application is rejecting your test users as if they’ve never authenticated at all.

You already know what this means. Somewhere in that first response, there’s a token. A CSRF token, a session ID, a nonce — whatever the app is using to tie requests together. You’re not capturing it. You’re not sending it. And the server, doing exactly what it’s supposed to do, is telling you where to go.

Welcome to correlation.

Why Correlation Is Still Hard

Correlation is one of those concepts that’s simple to explain and genuinely painful to do at scale. The idea is straightforward: some values returned by the server need to be captured and replayed in subsequent requests. Session identifiers, anti-CSRF tokens, OAuth state parameters, order IDs — anything that’s generated dynamically server-side and expected back in the next call.

JMeter has always handled this with extractors. You add a Regular Expression Extractor or Boundary Extractor to a sampler, write a pattern to capture the value, store it in a variable, and reference ${myVariable} wherever you need it.
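For concreteness, here's roughly what that looks like in the raw JMX. This is a hand-written fragment with an illustrative variable name and pattern, not output from any tool; it sits as a child of the sampler whose response contains the token:

```xml
<!-- Captures the token into ${csrfToken} for downstream requests -->
<RegexExtractor guiclass="RegexExtractorGui" testclass="RegexExtractor"
                testname="Extract CSRF token" enabled="true">
  <stringProp name="RegexExtractor.useHeaders">false</stringProp>
  <stringProp name="RegexExtractor.refname">csrfToken</stringProp>
  <stringProp name="RegexExtractor.regex">name="_token" value="([^"]+)"</stringProp>
  <stringProp name="RegexExtractor.template">$1$</stringProp>
  <stringProp name="RegexExtractor.default">TOKEN_NOT_FOUND</stringProp>
  <stringProp name="RegexExtractor.match_number">1</stringProp>
</RegexExtractor>
```

Setting a loud default like TOKEN_NOT_FOUND is worth the habit: when the pattern stops matching, the failure shows up in the request body instead of silently sending an empty string.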

In theory: clean and easy.

In practice: you’re chasing tokens through 60 requests across three pages of a modern web application, each one potentially containing a different dynamic value that you need to find, identify, pattern-match, and wire up correctly. One missed token and the whole journey falls apart.

The Manual Approach

The traditional workflow goes something like this.

You record a session using the JMeter proxy recorder or export a HAR from Chrome DevTools. You replay it. It fails. You open the failing request, look at the body or headers being sent, and ask yourself: “Where does this value come from?”

Sometimes it’s obvious. A response body contains "csrfToken": "abc123def456" and you know exactly what you’re looking for. You add a JSON Path Extractor, grab the value, reference it downstream. Done.
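In JMX terms, that's a JSON Extractor. A minimal hand-written fragment, assuming the token sits at the top level of the response as in the example above:

```xml
<!-- Pulls $.csrfToken out of the JSON response body -->
<JSONPostProcessor guiclass="JSONPostProcessorGui" testclass="JSONPostProcessor"
                   testname="Extract csrfToken" enabled="true">
  <stringProp name="JSONPostProcessor.referenceNames">csrfToken</stringProp>
  <stringProp name="JSONPostProcessor.jsonPathExprs">$.csrfToken</stringProp>
  <stringProp name="JSONPostProcessor.match_numbers">1</stringProp>
  <stringProp name="JSONPostProcessor.defaultValues">TOKEN_NOT_FOUND</stringProp>
</JSONPostProcessor>
```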

More often it’s not obvious. The dynamic value is somewhere in a 40KB HTML page mixed with a dozen other candidate strings. Or it’s buried in a redirect chain. Or it appears as a hidden form field. Or it’s a JWT you need to partially decode to understand. Or it’s not in the response at all — it’s being computed client-side from values that are.

The Boundary Extractor is useful here. If you know the text that appears immediately before and after a value, you can capture it without writing regex:

Setting          Value
Left Boundary    "_token":"
Right Boundary   "
Variable Name    csrf_token

It works. But finding those boundaries still requires manual inspection, and modern applications don’t always make it obvious. A single-page application might be loading tokens from a JSON API response, setting them in JavaScript state, and injecting them into outgoing requests via an interceptor — none of which is visible in a standard proxy recording.

And then there’s the OAuth flow. Or PKCE. Or GraphQL where the mutation response includes a cursor ID you need to thread through the next three calls. These aren’t edge cases anymore — they’re the standard architecture of the applications we’re testing.

What Correlation Actually Costs

Let me put some numbers on it.

For a moderate-complexity web application — say, an e-commerce checkout with a login, product browse, basket, and order confirmation — you might be looking at 15 to 25 dynamic values to correlate. Session ID, CSRF token, basket token, order reference, maybe a few internal IDs, possibly an OAuth token depending on the auth setup.

Working through these manually: call it 15 to 20 minutes per correlation if you’re experienced. More if the value is obscure, if you’re working with an unfamiliar application, or if you’re having to decode tokens to understand what you’re looking at.

That's roughly four to eight hours of mechanical work before you've run a single meaningful test. On a complex enterprise application with a longer journey, it's easily a full day.

And that’s assuming you find everything the first time. Usually you don’t. You run the test, find a failure 10 steps in, trace back to the missing correlation, fix it, and run again. Repeat.

The Specific Things That Catch You Out

After years of doing this, a few scenarios come up repeatedly.

Multi-value captures. The application returns an array of IDs — product IDs, item references, whatever — and you need to iterate through them in subsequent requests. JMeter’s -1 match count on the extractor handles this, generating numbered variables (myVar_1, myVar_2, etc.) alongside a count variable. Getting the downstream parameterisation right is fiddly.
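JMeter's own answer to that downstream parameterisation is the ForEach Controller, which walks the numbered variables the extractor produced. A sketch of the fragment, with the variable names assumed for illustration:

```xml
<!-- Iterates myVar_1, myVar_2, ... exposing each value as ${productId} -->
<ForeachController guiclass="ForeachControlPanel" testclass="ForeachController"
                   testname="For each product ID" enabled="true">
  <stringProp name="ForeachController.inputVal">myVar</stringProp>
  <stringProp name="ForeachController.returnVal">productId</stringProp>
  <boolProp name="ForeachController.useSeparator">true</boolProp>
</ForeachController>
```

Samplers inside the controller then reference ${productId}, and the loop runs once per captured match.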

Tokens that change on every page. Anti-CSRF tokens in particular. They’re issued per-request, and they expire. If you’ve hardcoded an extractor to only capture the token from the login response, but the application rotates it on every POST, you’ll sail through the first few steps and then start failing silently on the ones you haven’t covered.

OAuth and PKCE flows. The OAuth 2.0 authorization code flow involves at minimum: the authorization code from the redirect, the state parameter, and then the access and refresh tokens from the token endpoint. PKCE adds a code verifier/challenge pair on top. Each of these needs to be captured, stored, and replayed correctly. One missing piece and the entire auth chain fails.
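The PKCE pair also shows why a naive replay of a recording can never work: the verifier is random per authorization attempt, so a captured value is useless and the script has to generate its own. A minimal Python sketch of the S256 method from RFC 7636 (illustrative, not part of any tool):

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a fresh code_verifier and its S256 code_challenge.

    The challenge goes on the authorization request; the verifier is
    held back and sent to the token endpoint. Both halves must come
    from the same generation, which is exactly why a recorded pair
    can't be replayed.
    """
    # 32 random bytes -> a 43-character base64url verifier (RFC 7636 allows 43-128)
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
print(len(verifier))  # 43
```

In JMeter this logic would typically live in a JSR223 PreProcessor; the point here is just that the values are computed client-side, not extracted from any response.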

Enterprise applications. If you’re testing Salesforce, SAP, Oracle, or ServiceNow, there are platform-specific tokens and identifiers with their own naming conventions and extraction patterns. These aren’t the tokens you’re used to seeing in general web applications, and recognising them for what they are requires knowing what to look for.

Automating the Tedious Part

This is the problem I built Perf AutoCorrelator to solve.

The idea is simple: take a HAR recording from Chrome DevTools (or an existing JMX, or a k6/Gatling script), run it through an automated correlation engine, and get back a set of extractors ready to merge into your test script. The mechanical work — finding the values, writing the patterns, wiring up the variables — happens in seconds rather than hours.

The engine scans recordings against 50+ detection patterns spanning several categories, including:

  • Standard patterns — sessionId, csrfToken, userId, orderId, and the usual suspects
  • OAuth 2.0/OIDC — authorization codes, access tokens, refresh tokens, PKCE verifiers and challenges, state parameters
  • GraphQL — mutation IDs, cursors, subscription IDs
  • Enterprise — Salesforce session tokens, SAP CSRF, Oracle ADF state, ServiceNow tokens
  • WebSocket — connection IDs, channel IDs, socket.io SIDs
  • AI heuristic — UUIDs, base64-encoded strings, JWTs, MongoDB ObjectIDs, and long random tokens that don’t match a named pattern

Each detected correlation gets a confidence score based on how likely it is to be a genuine dynamic value. The scoring also looks at downstream usage — if a value appears in a subsequent request, the confidence goes up. You can set a minimum threshold to filter out noise.
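The downstream-usage boost is the heuristic doing most of the work, and it's easy to illustrate. A toy Python sketch of the idea, with the pattern list and weights invented for illustration rather than taken from the tool:

```python
import re

# Toy named patterns; a real engine would carry many more
PATTERNS = {
    "uuid": re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\b"),
}

def score_candidates(responses: list[str], later_requests: list[str]) -> dict[str, float]:
    """Scan response bodies for dynamic-looking values and score them.

    Matching a named pattern gives a base confidence; seeing the same
    value reappear in a later request boosts it, since replay is strong
    evidence the value needs correlating.
    """
    scores: dict[str, float] = {}
    for body in responses:
        for pattern in PATTERNS.values():
            for value in pattern.findall(body):
                confidence = 0.5  # matched a named pattern
                if any(value in req for req in later_requests):
                    confidence += 0.4  # value is replayed downstream
                scores[value] = max(scores.get(value, 0.0), confidence)
    return scores

resp = ['{"sessionId": "123e4567-e89b-12d3-a456-426614174000"}']
reqs = ['GET /basket?sid=123e4567-e89b-12d3-a456-426614174000']
print(score_candidates(resp, reqs))
```

A value that looks random but never reappears is probably just noise, which is what a minimum threshold is there to filter out.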

From HAR to runnable script

The generate command takes a HAR file and produces a complete, immediately runnable test script — not just extractors to paste in, but a full test plan with think times, parallel request groups, cookie handling, and status assertions:

# Complete JMeter test plan, correlations included
perf-autocorrelator generate recording.har --tool jmeter -o test.jmx

# Complete k6 script
perf-autocorrelator generate recording.har --tool k6 -o test.js

# Complete Gatling simulation
perf-autocorrelator generate recording.har --tool gatling -o Simulation.scala

The filter pipeline runs before generation, stripping out static resources, OPTIONS preflight requests, WebSocket upgrade calls, and redirect chains that would otherwise clutter the output.
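That kind of filtering is simple to sketch. A toy Python version of the skip rules, paraphrased from the description above rather than taken from the tool's actual pipeline:

```python
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".woff2", ".ico")

def keep_entry(entry: dict) -> bool:
    """Decide whether a HAR entry belongs in a load test script.

    Drops static assets, CORS preflights, and intermediate redirect
    hops -- the noise that would otherwise clutter the generated plan.
    """
    req, resp = entry["request"], entry["response"]
    url = req["url"].split("?", 1)[0]
    if url.lower().endswith(STATIC_EXTENSIONS):
        return False  # static resource
    if req["method"] == "OPTIONS":
        return False  # CORS preflight
    if 300 <= resp["status"] < 400:
        return False  # redirect hop; the client follows it anyway
    return True

entries = [
    {"request": {"method": "GET", "url": "https://app.example/login"}, "response": {"status": 200}},
    {"request": {"method": "GET", "url": "https://app.example/app.css"}, "response": {"status": 200}},
    {"request": {"method": "OPTIONS", "url": "https://app.example/api"}, "response": {"status": 204}},
]
kept = [e for e in entries if keep_entry(e)]
print(len(kept))  # 1
```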

Merging into an existing script

If you’ve already built a script and just need to add the extractors, the correlate command can merge them directly into an existing JMX:

perf-autocorrelator correlate script.jmx -f jmx -m -o script_correlated.jmx

It detects the correlations, generates the appropriate JMeter extractors, and substitutes the hardcoded values with ${variable} references throughout the script.

The web UI (Pro)

The Pro tier includes an embedded web UI — run perf-autocorrelator-pro web and a full browser interface opens on localhost:7070. Upload a HAR, toggle the detection modules you want, and generate extractors or a complete script. Useful when you want a visual overview of what’s been detected before committing to an output file.

Getting Back to That Tuesday Afternoon

So back to that 7am test and the 4pm realisation.

With the old approach, you're working until 11pm. Methodically tracing each 401, identifying the missing token, adding the extractor, rerunning, finding the next failure. It's not difficult work, but it's slow, and the opportunities for error accumulate.

With Perf AutoCorrelator, you export a HAR from Chrome, run generate, and have a correlated test plan in under a minute. You still need to review the output — check that the detected patterns make sense for this application, verify the confidence scores, add any custom parameters the tool hasn’t seen before. But you’re reviewing and adjusting, not building from scratch.

The difference between calling it a day at 5pm versus midnight is the kind of thing that matters.


Perf AutoCorrelator is available now at martkos-it.co.uk/store/perf-autocorrelator/. Basic starts at £29, Pro at £59 — both with a perpetual licence and a launch discount for early buyers.


Questions about correlation or performance scripting? Get in touch — I’m always happy to talk through specific scenarios.

Tags:

#jmeter #correlation #performance-testing #automation #scripting
