@zhandaz commented Feb 8, 2026

Summary

  • Add a fused Triton kernel for bilinear position-embedding interpolation that replaces ~25 small eager-mode CUDA kernel launches with a single launch (a sketch of the idea follows this list).
  • Add a position-embedding cache with pre-warming for the top 100 most common grid configurations (~71% MLPerf VLM dataset coverage, ~0.9 GB at BF16).
  • The cache is bounded to the pre-defined warmup set to prevent unbounded memory growth; cache misses are computed on the fly and are not inserted into the cache.
  • Pre-warming runs automatically after model weight loading.
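
The sketch below illustrates the fused-kernel idea only: one Triton program per destination grid cell gathers the four neighboring source embeddings and blends them, so the whole interpolation is a single launch. This is not the PR's actual kernel; all names (`_bilinear_pos_embed_kernel`, `interpolate_pos_embed`, `BLOCK_D`) and the align-corners mapping are assumptions for illustration.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _bilinear_pos_embed_kernel(
    src_ptr, dst_ptr,            # contiguous (H_src, W_src, D) and (H_dst, W_dst, D)
    H_SRC, W_SRC, W_DST, DIM,
    scale_y, scale_x,            # precomputed (H_src-1)/(H_dst-1), (W_src-1)/(W_dst-1)
    BLOCK_D: tl.constexpr,
):
    # One program per destination grid cell; loop over the embedding dim in blocks.
    pid = tl.program_id(0)
    y = pid // W_DST
    x = pid % W_DST
    # Map the destination cell into source coordinates (align-corners style).
    sy = y.to(tl.float32) * scale_y
    sx = x.to(tl.float32) * scale_x
    y0 = sy.to(tl.int32)                      # truncation == floor for sy >= 0
    x0 = sx.to(tl.int32)
    y1 = tl.minimum(y0 + 1, H_SRC - 1)
    x1 = tl.minimum(x0 + 1, W_SRC - 1)
    wy = sy - y0.to(tl.float32)
    wx = sx - x0.to(tl.float32)
    offs = tl.arange(0, BLOCK_D)
    for d in range(0, DIM, BLOCK_D):
        mask = d + offs < DIM
        # Gather the four neighboring source embeddings and blend in fp32.
        p00 = tl.load(src_ptr + (y0 * W_SRC + x0) * DIM + d + offs, mask=mask, other=0.0).to(tl.float32)
        p01 = tl.load(src_ptr + (y0 * W_SRC + x1) * DIM + d + offs, mask=mask, other=0.0).to(tl.float32)
        p10 = tl.load(src_ptr + (y1 * W_SRC + x0) * DIM + d + offs, mask=mask, other=0.0).to(tl.float32)
        p11 = tl.load(src_ptr + (y1 * W_SRC + x1) * DIM + d + offs, mask=mask, other=0.0).to(tl.float32)
        top = p00 + (p01 - p00) * wx
        bot = p10 + (p11 - p10) * wx
        out = top + (bot - top) * wy
        tl.store(dst_ptr + (y * W_DST + x) * DIM + d + offs,
                 out.to(dst_ptr.dtype.element_ty), mask=mask)


def interpolate_pos_embed(src: torch.Tensor, h_dst: int, w_dst: int) -> torch.Tensor:
    """Resize a (H_src, W_src, D) position-embedding grid to (h_dst, w_dst, D)."""
    h_src, w_src, dim = src.shape
    dst = torch.empty((h_dst, w_dst, dim), device=src.device, dtype=src.dtype)
    scale_y = (h_src - 1) / max(h_dst - 1, 1)
    scale_x = (w_src - 1) / max(w_dst - 1, 1)
    _bilinear_pos_embed_kernel[(h_dst * w_dst,)](
        src, dst, h_src, w_src, w_dst, dim, scale_y, scale_x, BLOCK_D=128,
    )
    return dst
```

Fusing the gather and blend into one launch is what removes the ~25 tiny eager-mode kernels mentioned in the first bullet, since eager bilinear interpolation otherwise decomposes into many indexing, multiply, and add ops.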

The caching mechanism is borrowed from @b-mu's PR #33. A minimal sketch of the bounded-cache behavior follows.
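
As a rough illustration of that mechanism, here is a sketch of a bounded, pre-warmed cache whose misses are served without insertion. The class and method names are hypothetical; only the `VLLM_POS_EMBED_CACHE_SIZE` variable name comes from this PR.

```python
import os
import torch


class PosEmbedCache:
    """Bounded cache keyed by (grid_h, grid_w); misses are never inserted."""

    def __init__(self, max_entries: int | None = None):
        if max_entries is None:
            max_entries = int(os.getenv("VLLM_POS_EMBED_CACHE_SIZE", "100"))
        self._max_entries = max_entries
        self._cache: dict[tuple[int, int], torch.Tensor] = {}

    def prewarm(self, grids, compute):
        # Runs once after model weight loading; bounded by max_entries.
        for h, w in list(grids)[: self._max_entries]:
            self._cache[(h, w)] = compute(h, w)

    def get(self, h: int, w: int, compute) -> torch.Tensor:
        # Warmup-set hits are returned from memory; any other grid is computed
        # on the fly and deliberately not cached, keeping memory bounded.
        hit = self._cache.get((h, w))
        return hit if hit is not None else compute(h, w)
```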

How to enable

The Triton kernel is off by default. To enable:

```bash
export VLLM_USE_TRITON_POS_EMBED=1
```

The position-embedding cache size can be controlled with:

```bash
# Defaults to 100
export VLLM_POS_EMBED_CACHE_SIZE=0
```

Files changed

  • vllm/envs.py -- register VLLM_USE_TRITON_POS_EMBED (a sketch of the registration pattern follows this list)
  • vllm/model_executor/models/qwen3_vl.py -- Triton kernel, cache infrastructure, and the warmup grid list
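
For context, vLLM's envs.py registers environment variables as lazy getters in a module-level dict. The new entries would look roughly like the sketch below; the exact defaults and parsing in the PR may differ.

```python
import os

# Hypothetical sketch of the new vllm/envs.py entries, following vLLM's
# pattern of mapping variable names to lazy getter lambdas.
environment_variables = {
    "VLLM_USE_TRITON_POS_EMBED":
        lambda: bool(int(os.getenv("VLLM_USE_TRITON_POS_EMBED", "0"))),
    "VLLM_POS_EMBED_CACHE_SIZE":
        lambda: int(os.getenv("VLLM_POS_EMBED_CACHE_SIZE", "100")),
}
```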

Please review after the upcoming FP8 attention PR is posted; validation will be run for the two PRs together.

Signed-off-by: Zhanda <zhandazhu@gmail.com>
@wangshangsam wangshangsam merged commit 0c41c65 into mlperf-inf-mm-q3vl-v6.0 Feb 8, 2026
3 checks passed