cpu - sweep saturation detection #552

maryamtahhan · 2026-01-23T09:57:02Z

Summary

This PR adds configurable saturation detection and optimisation parameters for testing CPU-based deployments/SUTs (vllm-cpu) with the sweep profile. CPU deployments saturate at much lower concurrency rates than GPU deployments (e.g., 16 concurrent requests vs 512), causing sweep tests to continue measuring beyond saturation and producing misleading "knee bend" artifacts in performance graphs. The new parameters allow users to detect saturation early, stop tests efficiently, and exclude anomalous throughput measurements from results.

Details

Added three new configurable parameters to SweepProfile:
- exclude_throughput_target (default: false) - Stops constant-rate tests before reaching throughput level
- exclude_throughput_result (default: false) - Excludes throughput benchmark from saved results
- saturation_threshold (default: 0.98) - Efficiency threshold for detecting saturation (achieved/target rate)
Implemented saturation detection logic in SweepProfile.next_strategy() that stops sweep when efficiency drops below threshold
Added parameter propagation through:
- Settings class for environment variable configuration (GUIDELLM__*)
- CLI parameters in GenerativeBenchmarkEntrypoint
- Profile resolution in SweepProfile.resolve_args()
Modified rate interpolation logic to support both GPU mode (include throughput target) and CPU mode (exclude throughput target)
Added comprehensive documentation in docs/getting-started/benchmark.md
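
To make the two interpolation modes concrete, here is a minimal sketch of how the constant-rate targets could be spaced between the synchronous rate and the measured throughput rate. It is not the code from this PR: the function name `sweep_rates`, its arguments, and the choice to space CPU-mode rates evenly while stopping one step short of the throughput rate are all assumptions made for illustration.

```python
def sweep_rates(
    sync_rate: float,
    throughput_rate: float,
    sweep_size: int,
    exclude_throughput_target: bool = False,
) -> list[float]:
    """Return constant-rate targets between the two measured bounds.

    Hypothetical helper: assumes the first two sweep slots are the
    synchronous and throughput benchmarks, so sweep_size - 2 constant-rate
    tests remain.
    """
    n_constant = sweep_size - 2
    if n_constant <= 0:
        return []

    span = throughput_rate - sync_rate
    if exclude_throughput_target:
        # CPU mode (assumed spacing): divide the span into one extra step
        # so the last target stays below the throughput rate.
        step = span / (n_constant + 1)
    else:
        # GPU mode: the last target lands exactly on the throughput rate.
        step = span / n_constant

    return [sync_rate + step * i for i in range(1, n_constant + 1)]


if __name__ == "__main__":
    print(sweep_rates(1.0, 5.0, 6))
    print(sweep_rates(1.0, 5.0, 6, exclude_throughput_target=True))
```

In GPU mode the final target equals the throughput rate; in CPU mode every target stays strictly below it, which is the behaviour the `exclude_throughput_target` description implies.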

Test Plan

Verified with vLLM-CPU deployment
Test saturation detection works correctly:

guidellm benchmark \
  --target "http://localhost:8000" \
  --profile sweep \
  --exclude-throughput-target true \
  --exclude-throughput-result true \
  --saturation-threshold 0.98 \
  --data "prompt_tokens=256,output_tokens=128" \
  --max-seconds 180

✅ Sweep stopped at test #5 when efficiency dropped to 97% (below 98% threshold)
✅ TTFT increased correctly: 67.65 → 69.34 → 70.82 → 73.49 → 80.53ms
✅ Throughput benchmark excluded from saved results
✅ No anomalous data points in graphs
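
The stopping rule exercised by this run can be sketched as follows. This is an illustration under stated assumptions, not the logic in `SweepProfile.next_strategy()`: the names `efficiency`, `is_saturated`, and `run_sweep` are hypothetical, as is the choice to keep the first saturated test in the results before stopping. The comparison follows the PR's description of the rule (achieved rate divided by target rate, compared against `saturation_threshold`).

```python
from typing import Callable


def efficiency(achieved_rate: float, target_rate: float) -> float:
    """Fraction of the requested rate that the system actually sustained."""
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    return achieved_rate / target_rate


def is_saturated(
    achieved_rate: float, target_rate: float, threshold: float = 0.98
) -> bool:
    """True when efficiency falls strictly below the threshold."""
    return efficiency(achieved_rate, target_rate) < threshold


def run_sweep(
    targets: list[float],
    measure: Callable[[float], float],
    threshold: float = 0.98,
) -> list[tuple[float, float]]:
    """Run constant-rate tests in order and stop after the first saturated one.

    `measure` is a hypothetical callback that runs one test at the given
    target rate and returns the achieved rate.
    """
    results = []
    for target in targets:
        achieved = measure(target)
        results.append((target, achieved))
        if is_saturated(achieved, target, threshold):
            break  # assumption: the saturated test is recorded, later ones are skipped
    return results


if __name__ == "__main__":
    # Hypothetical system that cannot exceed 2.6 req/s.
    capped = lambda rate: min(rate, 2.6)
    for target, achieved in run_sweep([1.0, 2.0, 2.68, 3.5], capped):
        print(f"target={target:.2f} achieved={achieved:.2f} "
              f"efficiency={efficiency(achieved, target):.3f}")
```

Note that with a strict comparison, 2.63 req/s achieved against a 2.68 req/s target (about 0.981) is still just above a 0.98 threshold, while 2.60 req/s (about 0.970) falls below it, matching the 97% figure reported above.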

**Test that defaults work for GPU deployments**

guidellm benchmark \
  --target "http://localhost:8000" \
  --profile sweep \
  --data "prompt_tokens=256,output_tokens=128"

✅ All parameters default to false/0.98 (GPU-friendly behavior)
✅ Full sweep completes with throughput benchmark included

**Verify environment variable configuration**

export GUIDELLM__EXCLUDE_THROUGHPUT_TARGET=true
export GUIDELLM__EXCLUDE_THROUGHPUT_RESULT=true
export GUIDELLM__SATURATION_THRESHOLD=0.98

guidellm benchmark --target "http://localhost:8000" --profile sweep

✅ Parameters correctly inherited from environment variables
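
For readers unfamiliar with the `GUIDELLM__*` convention, the sketch below shows one way the three variables above could be mapped onto typed options using only the standard library. It does not reproduce the project's Settings class; `SweepOptions`, `load_sweep_options`, and the accepted boolean spellings are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "GUIDELLM__"


@dataclass
class SweepOptions:
    """Hypothetical container mirroring the three new parameters and their defaults."""

    exclude_throughput_target: bool = False
    exclude_throughput_result: bool = False
    saturation_threshold: float = 0.98


def _as_bool(raw: str) -> bool:
    # Assumed set of truthy spellings; anything else is treated as false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_sweep_options(env: Optional[Mapping[str, str]] = None) -> SweepOptions:
    """Build options from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    opts = SweepOptions()

    raw = env.get(PREFIX + "EXCLUDE_THROUGHPUT_TARGET")
    if raw is not None:
        opts.exclude_throughput_target = _as_bool(raw)

    raw = env.get(PREFIX + "EXCLUDE_THROUGHPUT_RESULT")
    if raw is not None:
        opts.exclude_throughput_result = _as_bool(raw)

    raw = env.get(PREFIX + "SATURATION_THRESHOLD")
    if raw is not None:
        opts.saturation_threshold = float(raw)

    return opts


if __name__ == "__main__":
    print(load_sweep_options())
```

With none of the variables set, the defaults (false, false, 0.98) apply, which corresponds to the GPU-friendly behaviour described earlier.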

**Run unit tests**

pytest tests/unit/data/test_builders.py tests/unit/test_settings.py -v

✅ All 54 tests passed

Run pre-commit checks

pre-commit run --all-files

✅ All checks passed (linter, formatter, whitespace, EOF)

Related Issues

N/A

"I certify that all code in this PR is my own, except as noted below."

Use of AI

Includes AI-assisted code completion
Includes code generated by an AI application
Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

CPU deployments saturate at much lower concurrency rates than GPU deployments (e.g., 8 vs 512), causing sweep tests to continue measuring beyond saturation. This creates misleading performance graphs with "knee bend" artifacts and wasted benchmark time.

This commit adds three configurable parameters to handle saturation:

1. exclude_throughput_target (default: false)
   - Stops constant-rate tests before reaching throughput level
   - Prevents generating tests at rates the system cannot sustain
   - Eliminates "elbow" artifacts in graphs
2. exclude_throughput_result (default: false)
   - Excludes throughput benchmark from saved results
   - Removes anomalous burst-capacity data points that create visual artifacts (TTFT spikes from ~70ms to 244ms, inter-token latency anomalies) in performance graphs
3. saturation_threshold (default: 0.98)
   - Automatically stops sweep when achieved rate < target × threshold
   - Detects saturation (e.g., system achieves 2.63 req/s when targeting 2.68 req/s = 98% efficiency)
   - Saves time by skipping unnecessary over-saturated tests

Parameters are configurable via CLI flags (--exclude-throughput-target) or environment variables (GUIDELLM__EXCLUDE_THROUGHPUT_TARGET). Defaults remain GPU-friendly (all disabled). CPU deployments should enable both exclusion flags and tune saturation threshold as needed.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add important note explaining why max_concurrency should not be set when running sweep profile tests. When max_concurrency is artificially limited, the throughput test underestimates server capacity, causing constant-rate tests to run at rates far below actual capacity. This prevents proper saturation detection and can produce misleading results where TTFT decreases instead of increases.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

Expand the "How It Works" section to provide:

- Clear explanation of test execution order
- Detailed rationale for each parameter (why + effect)
- Explanation of how all three parameters work together
- Real-world example of throughput test outliers (23+ sec TTFT)

This helps users understand why all three parameters are recommended for CPU deployments and how they complement each other to produce clean, efficient benchmarks.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

maryamtahhan · 2026-01-23T10:14:47Z

changes graphs from:

maryamtahhan · 2026-01-23T10:16:11Z

To:

maryamtahhan · 2026-01-23T10:17:31Z

and sweep stops gracefully when saturation is detected:

maryamtahhan and others added 3 commits January 22, 2026 11:30

maryamtahhan changed the title ~~Cpu sweep saturation~~ cpu - sweep saturation detection Jan 23, 2026

maryamtahhan added 2 commits January 23, 2026 10:01

docs: Apply mdformat formatting to benchmark.md

7d80df5

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

fix: Add type guards for mypy in saturation detection

e8ccf77

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>

maryamtahhan force-pushed the cpu-sweep-saturation branch from 9457782 to e8ccf77 Compare January 23, 2026 10:08

maryamtahhan added 2 commits January 23, 2026 10:08

Merge branch 'main' into cpu-sweep-saturation

a8ee779

docs: Update terminology to 'CPU based system under test'

4effdb1

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
