Benchmark results for LLMs running on NVIDIA DGX Spark GB10.
| Date | Model | Engine | Config | Gen t/s @ 16K | Notes |
|---|---|---|---|---|---|
| 2026-01-24 | GLM-4.7-Flash AWQ | vLLM | TRITON_MLA | 26.0 | Baseline |
| 2026-01-24 | GLM-4.7-Flash AWQ | vLLM | FLASHINFER+FP8 | 18.8 | 128K ctx |
- Engine: vLLM (scitrera/dgx-spark-vllm:0.14.0-t5)
- Quantization: AWQ 4-bit
- Backends Tested: TRITON_MLA, FLASHINFER+FP8
- Results: full per-run summaries live in results/ (see the repository layout below)
- Learnings: FP8 vs MLA trade-offs, summarized in the comparison below and in docs/learnings/
| Backend | Gen t/s (baseline) | Gen t/s @ 8K | Gen t/s @ 16K | Max Context |
|---|---|---|---|---|
| TRITON_MLA | 41.6 | 33.1 | 26.0 | 32K |
| FLASHINFER+FP8 | 40.2 | 27.3 | 18.8 | 128K |
Verdict: TRITON_MLA is faster at every tested depth but is limited to 32K context in these runs; use FLASHINFER+FP8 only when more than 32K of context is required (a launch sketch for both configurations follows).
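For context, the two configurations map onto vLLM's attention-backend selection and KV-cache dtype (FP8 is read here as the FP8 KV cache, which is an assumption). The sketch below is illustrative only: it assumes stock vLLM conventions (`vllm serve`, `VLLM_ATTENTION_BACKEND`, `--kv-cache-dtype fp8`) apply inside the scitrera/dgx-spark-vllm image, and the model path is a placeholder.

```bash
# Illustrative launch sketch, not the exact commands used for these runs.
# Assumes stock vLLM flags/env vars work inside the scitrera/dgx-spark-vllm image.

# TRITON_MLA backend (faster generation; 32K context ceiling in these runs)
VLLM_ATTENTION_BACKEND=TRITON_MLA \
  vllm serve <path-to-glm-4.7-flash-awq> \
    --quantization awq \
    --max-model-len 32768

# FLASHINFER backend + FP8 KV cache (slower, but reaches 128K context)
VLLM_ATTENTION_BACKEND=FLASHINFER \
  vllm serve <path-to-glm-4.7-flash-awq> \
    --quantization awq \
    --kv-cache-dtype fp8 \
    --max-model-len 131072
```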
Benchmarks are collected with llama-benchy against the server's OpenAI-compatible endpoint:

```bash
uvx llama-benchy \
--base-url http://localhost:8000/v1 \
--model <model-name> \
--tokenizer <hf-tokenizer> \
--pp 2048 --tg 32 \
--depth 0 4096 8192 16384 24576 \
--runs 3 \
--enable-prefix-caching \
--latency-mode generation
```
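To keep raw output organized the way the repository layout below expects (logs/<model>/), a thin wrapper along these lines can help. The script, its arguments, and the log-naming scheme are illustrative assumptions, not tooling shipped with this repo.

```bash
#!/usr/bin/env bash
# Illustrative wrapper (not part of this repo): run the llama-benchy sweep
# above and capture the raw output under logs/<model>/.
set -euo pipefail

MODEL="$1"        # served model name, i.e. the value passed to --model
TOKENIZER="$2"    # HF tokenizer used for token accounting
LOG_DIR="logs/${MODEL}"
STAMP="$(date +%Y-%m-%d_%H%M)"
mkdir -p "${LOG_DIR}"

uvx llama-benchy \
  --base-url http://localhost:8000/v1 \
  --model "${MODEL}" \
  --tokenizer "${TOKENIZER}" \
  --pp 2048 --tg 32 \
  --depth 0 4096 8192 16384 24576 \
  --runs 3 \
  --enable-prefix-caching \
  --latency-mode generation \
  | tee "${LOG_DIR}/${STAMP}.log"
```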
Repository layout:

```
llm-benchmarks/
├── README.md        # This file - index + history
├── CLAUDE.md        # AI agent instructions
├── docs/
│   ├── guides/      # Usage documentation
│   └── learnings/   # Benchmark insights and comparisons
├── templates/
│   └── result.md    # Template for new benchmarks
├── logs/
│   └── <model>/     # Raw benchmark output
└── results/
    └── <model>/     # Human-readable summaries
```
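Each write-up is meant to start from templates/result.md and end up under results/<model>/, alongside the matching raw log in logs/<model>/. A hypothetical example of that step (directory and file names are placeholders):

```bash
# Hypothetical workflow: start a new human-readable summary from the template.
MODEL="glm-4.7-flash-awq"    # placeholder directory name
mkdir -p "results/${MODEL}"
cp templates/result.md "results/${MODEL}/2026-01-24-triton-mla-vs-fp8.md"
# Fill in the summary using the matching raw output in logs/${MODEL}/.
```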
- System: NVIDIA DGX Spark
- GPU: GB10 (Blackwell SM120, 128GB unified memory)
- Inference Engines: vLLM, llama.cpp
License: MIT