@b-mu b-mu commented Jan 30, 2026

Purpose

After integrating high-performance kernels for ViT attention, we saw kernel launch overhead, visible as gaps between kernels. To reduce it, this PR adds two features:

  • torch.compile(): fuses native kernels, e.g. layernorm and elementwise ops.
  • CUDA graph for the ViT: we sort images by size in ascending order and greedily pack as many as possible within the largest token budget, then check whether a smaller budget would also fit, to avoid wasted padding. We pad cu_seqlens so that a varying number of images can be packed.
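
The packing step described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: pack_images and pad_cu_seqlens are hypothetical names, the eager fallback for oversized images is an assumption, and padding cu_seqlens by repeating the last offset (so extra slots look like zero-length sequences) is one plausible scheme rather than the confirmed one.

```python
from typing import List, Optional, Tuple


def pack_images(
    token_counts: List[int], budgets: List[int], max_images: int
) -> List[Tuple[List[int], Optional[int]]]:
    """Greedy packing sketch. Returns (image_indices, chosen_budget) pairs."""
    budgets = sorted(budgets)
    largest = budgets[-1]
    # Visit images from smallest to largest token count.
    order = sorted(range(len(token_counts)), key=lambda i: token_counts[i])
    batches: List[Tuple[List[int], Optional[int]]] = []
    pos = 0
    while pos < len(order):
        batch: List[int] = []
        used = 0
        # Fill the batch against the largest budget first.
        while (
            pos < len(order)
            and len(batch) < max_images
            and used + token_counts[order[pos]] <= largest
        ):
            batch.append(order[pos])
            used += token_counts[order[pos]]
            pos += 1
        if not batch:
            # Assumption: an image larger than every budget runs without a graph.
            batches.append(([order[pos]], None))
            pos += 1
            continue
        # Then pick the smallest budget that still fits, to limit wasted tokens.
        chosen = next(b for b in budgets if b >= used)
        batches.append((batch, chosen))
    return batches


def pad_cu_seqlens(lengths: List[int], max_images: int) -> List[int]:
    """Cumulative sequence offsets padded to a fixed length of max_images + 1."""
    cu = [0]
    for n in lengths:
        cu.append(cu[-1] + n)
    # Assumption: repeated final offsets stand for empty (zero-length) images.
    cu += [cu[-1]] * (max_images + 1 - len(cu))
    return cu


batches = pack_images(
    [3000, 500, 1200, 9000, 20000], [2048, 4096, 8192, 13824], max_images=16
)
padded = pad_cu_seqlens([500, 1200], max_images=4)
```

Keeping cu_seqlens at a fixed length is what would let one captured graph per budget serve batches with different image counts, since the tensor shape no longer depends on how many images were packed.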

We also warm up the embedding cache for the ViT so that embeddings for commonly seen grid sizes are reused, which reduces on-the-fly embedding computation.
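
As a rough illustration of the warm-up idea, the sketch below precomputes entries for a list of grid sizes and serves later lookups from a dictionary. EmbeddingCache, toy_pos_ids, and the (height, width) key are all hypothetical stand-ins; the real cache in this PR may key on different fields and store tensors rather than Python lists.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Grid = Tuple[int, int]  # assumed key: (height, width) in patches


class EmbeddingCache:
    """Caches one computed embedding per grid size."""

    def __init__(self, compute: Callable[[Grid], List[Tuple[int, int]]]) -> None:
        self._compute = compute
        self._cache: Dict[Grid, List[Tuple[int, int]]] = {}
        self.misses = 0  # counts how often the embedding had to be computed

    def warmup(self, grids: Iterable[Grid]) -> None:
        # Precompute entries for grid sizes expected to be common.
        for grid in grids:
            self.get(grid)

    def get(self, grid: Grid) -> List[Tuple[int, int]]:
        if grid not in self._cache:
            self.misses += 1
            self._cache[grid] = self._compute(grid)
        return self._cache[grid]


def toy_pos_ids(grid: Grid) -> List[Tuple[int, int]]:
    # Stand-in for the real embedding computation: row/column ids per patch.
    h, w = grid
    return [(r, c) for r in range(h) for c in range(w)]


cache = EmbeddingCache(toy_pos_ids)
cache.warmup([(2, 2), (4, 4)])
hit = cache.get((2, 2))  # served from the cache, no recomputation
```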

Test Plan

  • Compilation Configs:
    --vllm.cli=--compilation-config='{
      "compile_mm_encoder": true,
      "cudagraph_mm_encoder": true,
      "encoder_cudagraph_verbose": true,
      "encoder_cudagraph_token_budgets": [2048, 4096, 8192, 13824],
      "encoder_cudagraph_max_images_per_batch": 16,
      ...
    }' 

Test Result

  • Tested end-to-end accuracy with the above configuration, using the FA4 encoder backend.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@b-mu b-mu self-assigned this Jan 30, 2026
@b-mu b-mu changed the title WIP: Reduce Gaps between Kernels in ViT WIP: Enable ViT torch.compile + CUDA Graph Jan 30, 2026
@b-mu b-mu changed the title WIP: Enable ViT torch.compile + CUDA Graph Enable ViT torch.compile + CUDA Graph Feb 1, 2026