Best Practices

Load testing is easy to run and hard to run well. The features in the previous sections give you a lot of knobs; this section is a condensed opinion on how to use them effectively — grounded in the module's actual behavior, not generic load-testing folklore.

Pick the Right Test Type

Each preset was tuned for one question. Match the question to the type instead of reaching for whichever one you used last time.

  • "Does the new build still meet our P95 SLO under normal traffic?" — Load
  • "How many VUs before error rate or latency breaks?" — Stress
  • "Does autoscaling keep up with a sudden traffic surge?" — Spike
  • "Does the service degrade after hours of steady traffic?" — Soak
  • "Can I reproduce the production traffic shape from last Friday?" — Custom

Design Thresholds Before You Design the Load

A run without thresholds is a run you have to eyeball every time. Add thresholds first, then tune the load profile until they start failing. A few patterns that work well:

  • Tie to real SLOs — if production runs on a 500 ms P95 SLO, your threshold should be the same number. The load test becomes a pre-deploy gate, not a random health check
  • Error rate first — the single most load-sensitive metric. error_rate < 1% catches broken builds before percentile math even gets interesting
  • P95 and P99 together — P95 catches mainline regressions, P99 catches tail issues the average hides. Use both when quality of the long tail matters
  • Throughput floors — for benchmarks, add throughput > X req/s to catch regressions where the service is technically passing latency thresholds but processing fewer requests
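The gating logic behind these patterns can be sketched in a few lines. This is an illustration, not the module's actual API: the metric names, the summary shape, and the threshold bounds are all assumptions chosen to match the examples above.

```typescript
// Sketch of threshold gating (illustrative names, not the module's API):
// each threshold compares one summary metric against a bound, and the
// run passes only if every threshold holds.

interface RunSummary {
  errorRate: number;      // fraction of failed requests, 0..1
  p95Ms: number;          // 95th-percentile latency, milliseconds
  p99Ms: number;          // 99th-percentile latency, milliseconds
  throughputRps: number;  // sustained requests per second
}

type Threshold = { name: string; holds: (s: RunSummary) => boolean };

const thresholds: Threshold[] = [
  { name: "error_rate < 1%",       holds: s => s.errorRate < 0.01 },
  { name: "p95 < 500 ms",          holds: s => s.p95Ms < 500 },
  { name: "p99 < 1200 ms",         holds: s => s.p99Ms < 1200 },
  { name: "throughput > 50 req/s", holds: s => s.throughputRps > 50 },
];

// Returns the names of failed thresholds; an empty list means the run passes.
function gate(summary: RunSummary): string[] {
  return thresholds.filter(t => !t.holds(summary)).map(t => t.name);
}

const failed = gate({ errorRate: 0.004, p95Ms: 620, p99Ms: 900, throughputRps: 80 });
console.log(failed); // only the P95 threshold fails for this summary
```

The point of the sketch is the pass/fail contract: a run is a binary gate, so CI can block a deploy on the returned list being non-empty instead of a human reading charts.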

Use Ramp-Up, Don't Cold-Blast

Starting 500 VUs instantly on an idle service is not a realistic test — it's a denial-of-service of your own backend. Every preset except Spike uses a proper ramp-up for a reason: the service gets to warm JIT caches, grow connection pools, and scale workers before the target load hits. Keep the default ramps, or make them longer, not shorter.

The one place to skip ramp-up is Spike tests — that's the whole point of the preset. Everywhere else, a short ramp-down helps avoid dangling connection errors at the tail of the run that aren't really the service's fault.

Watch the warning: the Duration field surfaces a red warning if rampUp + rampDown > duration. If you see it, either lengthen the duration or shorten the ramps — otherwise your "60-second load test" has zero seconds of sustained load to measure.
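The arithmetic behind that warning is worth spelling out, because it decides how much of the run is actually measurable. A minimal sketch (field names assumed, in seconds):

```typescript
// The sustained-load window is whatever remains of the run after the ramps.
// If it reaches zero, there is no steady-state interval left to measure.
function sustainedSeconds(durationS: number, rampUpS: number, rampDownS: number): number {
  return durationS - rampUpS - rampDownS;
}

// A "60-second load test" with 30 s up and 30 s down measures nothing:
console.log(sustainedSeconds(60, 30, 30));  // 0, the red-warning case
console.log(sustainedSeconds(120, 30, 15)); // 75 seconds of real sustained load
```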

Use Lifecycle Scripts for What They're Made For

  • Before All for setup — log in, mint a token, seed a test database row, warm a cache. Everything a VU needs before the real load starts
  • After All for reporting — Slack webhook, POST to an ingestion endpoint, upload the HTML/PDF report to S3, trigger a downstream Jenkins job
  • Step pre/post scripts for per-iteration logic only — fetch a token that must rotate each request, compute a signature, extract a dynamic ID. Everything else belongs in Before All

Hot-path cost: per-step scripts run on every iteration of every VU. At 500 VUs with a 1-second iteration time, a 5 ms script costs 2.5 seconds of CPU per second. Keep them tight.
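That 2.5-second figure comes from simple arithmetic, sketched here so you can plug in your own numbers (the model assumes each VU completes one iteration per iteration time):

```typescript
// Back-of-envelope for per-step script overhead: at a given iteration time,
// each VU runs its pre/post scripts once per iteration, so total script CPU
// per wall-clock second is (iterations/sec across all VUs) * script cost.
function scriptCpuSecondsPerSecond(vus: number, iterationS: number, scriptMs: number): number {
  const iterationsPerSecond = vus / iterationS;
  return (iterationsPerSecond * scriptMs) / 1000;
}

console.log(scriptCpuSecondsPerSecond(500, 1, 5)); // 2.5 s of CPU per second, the case above
```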

Compare, Don't Eyeball

A single run tells you whether the system passes right now. Two runs compared tell you whether a change helped, hurt, or did nothing. Make it a habit to do at least two runs for every meaningful change and feed both into the Compare view:

  • Before/after deploys — run the same spec on the old and new version, compare the delta table
  • Capacity sweeps — run the same spec with 100 / 200 / 400 VUs, add all three to a compare, and see where the per-step table starts glowing red
  • Config tuning — flip a backend feature flag, re-run, compare. The delta columns turn subjective optimization arguments into objective numbers
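The delta those comparisons produce is just a percent change per metric. A sketch of the calculation (this mirrors what a compare view shows, not the module's internal code):

```typescript
// Percent change of a metric between a baseline run and a candidate run.
// For latency metrics, negative means the change was an improvement.
function deltaPercent(baseline: number, candidate: number): number {
  return ((candidate - baseline) / baseline) * 100;
}

// Baseline P95 of 480 ms vs. candidate runs after two different changes:
console.log(deltaPercent(480, 432).toFixed(1)); // "-10.0": P95 dropped 10%
console.log(deltaPercent(480, 600).toFixed(1)); // "25.0": a regression worth blocking on
```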

When to Enable OTLP Streaming

Streaming is not always on because not every run needs it. A good default: enable it for long runs, team runs, and gated runs; leave it off for quick local sanity checks.

  • Long runs — Soak tests that last an hour. Stream to Grafana so you can look at the dashboard instead of babysitting the Results view
  • Team runs — anything stakeholders want to watch. A shared Grafana URL beats a screen-share every time
  • Gated runs — CI-initiated load tests where the results also need to feed the observability stack alongside whatever report is produced
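The receiving side varies by stack, but a common shape is an OpenTelemetry Collector in front of Grafana. A minimal collector pipeline for that shape might look like the fragment below; the ports are the OTLP gRPC and Prometheus-exporter defaults, and the exporter choice is an assumption about your observability stack, not a requirement of the module.

```yaml
# Minimal OpenTelemetry Collector config: accept the run's OTLP metric
# stream over gRPC and expose it for Prometheus/Grafana to scrape.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```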

Secrets Go Through Vault or OS Keychain

There are exactly two correct places for a real credential in a Load Test environment:

  • Vault for anything the team shares — production tokens, staging keys, short-lived secrets that rotate
  • OS Secret for anything only your laptop should see — personal API keys, local dev tokens, anything that should not sync to a teammate's machine

Manual variables are versionable and committable, which is exactly what makes them the wrong place for secrets. Base URLs, feature flags, non-sensitive tunables — Manual. Anything you wouldn't paste into a pull request — Vault or OS Secret.
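For the Vault case, what the module resolves at run time is an ordinary KV read over Vault's HTTP API. A sketch of the request shape, assuming the KV v2 engine (the host, mount name "secret", and path are illustrative; only the /v1/<mount>/data/<path> layout and the X-Vault-Token header are real Vault conventions):

```typescript
// Builds the request a KV v2 secret read resolves to. The token should come
// from the OS keychain or an env var at run time, never from a committed
// Manual variable.
function vaultKvV2Request(addr: string, mount: string, path: string, token: string) {
  return {
    url: `${addr}/v1/${mount}/data/${path}`,
    headers: { "X-Vault-Token": token },
  };
}

const req = vaultKvV2Request(
  "https://vault.internal:8200", // illustrative host
  "secret",
  "loadtest/api-key",
  "<token from OS keychain>",
);
console.log(req.url); // https://vault.internal:8200/v1/secret/data/loadtest/api-key
```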

Common Pitfalls

  • Testing against a cold service — the first 5–10 seconds of any run are warm-up, not real load. Either use ramp-up (which masks it automatically) or ignore the opening seconds when reading the chart
  • Ignoring think time — zero think time means every VU becomes a tight CPU-bound loop. Real users don't behave that way. Keep a realistic min/max unless you're deliberately building a throughput benchmark
  • Benchmarking against your laptop — if the target is localhost, you're measuring the laptop's own throughput, which is not what production looks like. Great for regression detection, misleading for capacity planning
  • Fail-on-assertion without reviewing assertions — if a step has a flaky assertion that hits once every 1000 iterations, turning the toggle on means your whole run fails for a non-load reason. Either fix the assertion or leave the toggle off
  • Sharing secrets across environments by accident — environments are per spec group. Move a spec to a new group and forget to re-add its environment, and the run will silently start with only Manual variables
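On the think-time pitfall: the fix is a small random pause between iterations rather than a zero-delay loop. A sketch of the usual uniform-random model (the 500-2000 ms range is an example, not a recommendation for every service):

```typescript
// Uniform random think time in [minMs, maxMs): the pause a real user takes
// between actions, instead of a tight CPU-bound request loop.
function thinkTimeMs(minMs: number, maxMs: number): number {
  return minMs + Math.random() * (maxMs - minMs);
}

// e.g. half a second to two seconds between requests
const pause = thinkTimeMs(500, 2000);
console.log(pause >= 500 && pause < 2000); // true
```

Dropping the pause entirely is only appropriate when you are deliberately building a throughput benchmark, as the bullet above notes.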