Changes from all commits
639 commits
beac202
Add lora_path argument to bench_multiturn.py (#10092)
Fridge003 Sep 6, 2025
0b8c572
[HiStorage] Remove delete and clear as necessary methods (#10039)
xiezhq-hermann Sep 6, 2025
1a3d6f3
Modify ci workflow for auto-partitioning in 2-GPU backend tests (#10029)
hzh0425 Sep 6, 2025
0e78c63
Revert "[1/N][Bug] Fix w4afp8 MoE NaN issue (sgl-kernel) (#9953)" (#1…
zhyncs Sep 6, 2025
8d114f2
Fix RMSNorm API CALL mismatch issue. (#10032)
sogalin Sep 6, 2025
ad26f29
fix double sparsity initialization (#6905)
shadowpa0327 Sep 6, 2025
dbb1235
[Fix] illegal sync based on undefined behaviour (#9620)
DevashishLal-CB Sep 6, 2025
3fa62da
[7/N] MoE Refactor: the implementation of new framework (#9269)
ch-wan Sep 6, 2025
9a719b7
[NVIDIA] Remove unused `get_fused_moe_impl_class` function (#9764)
kaixih Sep 6, 2025
90dfe3d
[NVIDIA] disable chunked prefix cache when dp and blackwell is used (…
kaixih Sep 6, 2025
012584e
perf: Avoid unnecessary data type conversions for DeepSeek-V3 on Blac…
jinyangyuan-nvidia Sep 6, 2025
21af5c0
[Fix] Compatibility between DP attention and pipeline parallelism (#1…
ch-wan Sep 6, 2025
a5a0320
Fix circular import (#10107)
ch-wan Sep 6, 2025
4c22ebe
Disable kernel cutlass_mla_decode on SM103 (#10058)
hlu1 Sep 6, 2025
039cef7
Remove non-accelerated targets(100 and up) from cmake (#10041)
hlu1 Sep 6, 2025
5f1eb20
[chore] Remove unused ep_moe cuda kernels (#9956)
hlu1 Sep 6, 2025
00974e4
[CI] Refactor disaggregation tests (#10068)
ShangmingCai Sep 6, 2025
b3e7a2c
increase the rust e2e timeout (#10116)
key4ng Sep 6, 2025
9eb50ec
[router] Improve the router e2e tests (#10102)
key4ng Sep 6, 2025
f3b6760
[Auto Sync] Update server_args.py (20250906) (#10117)
merrymercy Sep 6, 2025
cb3918a
Optimize moe_sum_reduce_kernel (#9477)
yuan-luo Sep 7, 2025
9a7ced4
[Feature] LMCache Connector Integration (#9741)
Oasis-Git Sep 7, 2025
dd1e268
CUTLASS fp8 blockwise gemm support of sm120 (#9969)
jianyingzhu Sep 7, 2025
85ed8e0
Optimize nvfp4 block scaled gemm kernel when M is small. (#10101)
HydraQYH Sep 7, 2025
a12061d
Fix cuda graph mode in flashinfer attn backend (#10056)
benbarsdell Sep 7, 2025
41628dc
[HiCache] fix: check clear() method for storage backend (#10096)
stmatengss Sep 7, 2025
111b137
add dataset_path for bench_one_batch_server.py (#10113)
miter6 Sep 7, 2025
617aa2b
[Auto Sync] Update parallel_state.py (20250907) (#10126)
merrymercy Sep 7, 2025
0672468
[Minor] fix lint in main (#10128)
DarkSharpness Sep 7, 2025
e719bb0
[1/2] Refactor multi-tokenizer manager (#10074)
hnyls2002 Sep 7, 2025
76a2c86
Fix flashinfer version in sgl-kernel (#10135)
merrymercy Sep 7, 2025
b0fcbb7
[DOC]: some minor updates (#10134)
yyihuang Sep 7, 2025
33467c0
[BUG FIX] add fail check when get fail in case wait complete block (#…
mss1213 Sep 8, 2025
5a7e10f
[MoE] fix: incorrect weight initialization for cutlass_fused_experts…
ch-wan Sep 8, 2025
f3440ad
vlm: enable GLM4.1V server testing & fix video processing (#10095)
JustinTong0323 Sep 8, 2025
bc5fc33
Fix slow fused add RMSNorm (#10141)
fzyzcjy Sep 8, 2025
7802586
fix the fp8 topk_config.correction_bias is none bug (#10040)
rainj-me Sep 8, 2025
37d83c6
Qwen2.5-VL eagle3 infer (#8801)
Lzhang-hub Sep 8, 2025
400d3b9
Fix run time error in dsv3-fp8 model on mi35x (#10104)
kkHuang-amd Sep 8, 2025
8cda5a6
Standalone speculative decoding (#10090)
Qiaolin-Yu Sep 8, 2025
7577f0e
Add graph runner support with torch compile on CPU (#7843)
CaoE Sep 8, 2025
6049ca2
move compile threads to an option to avoid OOM on low memory host (#1…
rainj-me Sep 8, 2025
ee0b3c5
[1/N][Bug] Fix w4afp8 MoE NaN issue (sgl-kernel, fixed) (#10108)
yuhyao Sep 8, 2025
3b99f23
[Bugfix] Retract not releasing enough memory when page size > 1 (#9989)
xiezhq-hermann Sep 8, 2025
8c5930f
Add speculator attention backend switch (#9981)
cicirori Sep 8, 2025
8116804
Fix: (glm4v) Add missing field (#10147)
JustinTong0323 Sep 8, 2025
b67c277
[Bugfix] Qwen3MoE aclrtMemcpy failed with NPUGraph (#10013)
iforgetmyname Sep 8, 2025
c8295d2
enable auto-round quantization model (#6226)
WeiweiZhang1 Sep 8, 2025
b7d1f17
Revert "enable auto-round quantization model (#6226)" (#10148)
zhyncs Sep 8, 2025
ee21817
enable llama3.1-8B on xpu (#9434)
huaiyuzh Sep 8, 2025
5dd8c64
[Bug fix] Fix Gemma 2 and fix Gemma 3 multimodal with bs > 1 on NPU (…
ssshinigami Sep 8, 2025
bfd7a18
update xgrammar 0.1.24 and transformers 4.56.1 (#10155)
Swipe4057 Sep 8, 2025
78f1398
[1/N] DP-Refactor: move communicators into `tokenizer_communicator_mi…
hnyls2002 Sep 8, 2025
ec99668
[Hicache]: Add E2E CI For 3FS-KVStore (#10131)
hzh0425 Sep 8, 2025
72f9fc5
Monkey patch uvicorn multi worker `is_alive` timeout (#10159)
hnyls2002 Sep 8, 2025
2c2b19b
[CI] fix ambiguous argument in testing hybrid attentions. (#10161)
hnyls2002 Sep 8, 2025
0096798
[1/2] Speed up prefill mla attention (#10156)
fzyzcjy Sep 8, 2025
8085aca
[Bug fix] Fix ascend mla in aclgraph (#9925)
alanhe151220037 Sep 8, 2025
91f0fd9
perf: Add H20 fp8 fused MoE kernel configs for Qwen3 (#10166)
Zhiy-Zhang Sep 8, 2025
9a18aa5
[fix] Relax white space rules in EBNFComposer (#9595)
LukasBluebaum Sep 8, 2025
45b3a6a
Revert "[ModelOpt] Fix Weight Loading for DSR1-FP4 Quantization (#971…
zhyncs Sep 8, 2025
a02071a
[Bench] feat: mooncake trace integration (#9839)
stmatengss Sep 8, 2025
19d64f2
fix: resolve lint issue (#10181)
zhyncs Sep 8, 2025
7a40e4f
fix the cutlass moe tests (#10182)
rainj-me Sep 8, 2025
148022f
gb200: update dockerfile to latest kernel (#9522)
ishandhanani Sep 9, 2025
8ad700f
Cleaning codes for speculative attention mode (#10149)
Fridge003 Sep 9, 2025
df5407f
Revert "feat: add fused moe config for Qwen3-30B-A3B on B200" (#10185)
rainj-me Sep 9, 2025
96784a6
[Fix] Orphan process in data parallel (#7995)
Capronir Sep 9, 2025
ba066ca
Update link for EAGLE speculative decoding (#10191)
gerayking Sep 9, 2025
97fff98
[CPU] Fix phi4-mm prompt issue in bench_serving (#9900)
blzheng Sep 9, 2025
2fe1773
Updated Nvidia Jetson docs (#4422)
shahizat Sep 9, 2025
83d55ac
[1/N]DP refactor: Improve dp rank scheduling in PD disaggregation mod…
hnyls2002 Sep 9, 2025
16ff3d4
Support opt model (#10165)
wenhuipeng Sep 9, 2025
cdc56ef
feat: use sgl-kernel cu129 as default (#10188)
zhyncs Sep 9, 2025
948b01a
[Refactor] Remove Hicache Load & Write threads (#10127)
DarkSharpness Sep 9, 2025
718f25a
Explicitly export CMAKE_BUILD_PARALLEL_LEVEL (#10193)
key4ng Sep 9, 2025
d1d4074
[CPU] Add gelu_and_mul kernel in sgl-kernel and add ut (#9300)
blzheng Sep 9, 2025
94fb4e9
feat: support fa cute in sgl-kernel (#10205)
zhyncs Sep 9, 2025
f5f6b3b
Refactor fused_add_rmsnorm import logic (#10207)
ShangmingCai Sep 9, 2025
2cd94dd
tool-call(dsv3): Fixed a parse problem when there are multiple functi…
Missmiaom Sep 9, 2025
71133a0
[Auto Sync] Update sampling_batch_info.py (20250909) (#10212)
merrymercy Sep 9, 2025
f3817cb
chore: bump v0.3.9 sgl-kernel (#10208)
zhyncs Sep 9, 2025
9ab72f9
add variable TP Decode > Prefill size support (#9960)
shaharmor98 Sep 9, 2025
71fc7b7
[Fix] KV-cache eviction mismatch across PP ranks in DeepSeek V3/R1 (#…
qhsc Sep 9, 2025
d3ee709
chore: upgrade v0.3.9 sgl-kernel (#10220)
zhyncs Sep 9, 2025
d352c29
Revert the changes on NCCL symmetric memory (#10210)
merrymercy Sep 9, 2025
4582931
Revert "Revert the changes on NCCL symmetric memory" (#10238)
merrymercy Sep 9, 2025
8471e5e
[HiCache] feat: add mooncake backend extra config (#10213)
stmatengss Sep 9, 2025
8cbe153
Add mamba kernel (#10234)
yizhang2077 Sep 9, 2025
bf72b80
[Auto Sync] Update io_struct.py (20250909) (#10236)
merrymercy Sep 9, 2025
a06bf66
[Auto Sync] Update collector.py, startup_func_log_and_timer... (20250…
merrymercy Sep 10, 2025
bcf1955
Revert "chore: upgrade v0.3.9 sgl-kernel" (#10245)
merrymercy Sep 10, 2025
15f9934
refactor(InternVL): Use gpu to preprocess the input image (#9795)
KEVINTUAN12 Sep 10, 2025
676a7b5
make --speculative-draft-model an alias of --speculative-draft-model-…
merrymercy Sep 10, 2025
dccf52f
[UT for RL] Add UT to cover release/resume memory case for moe model …
ryang-max Sep 10, 2025
a1d0389
[Benchmark] Prefill-only benchmark scripts (#10240)
sundar24295s Sep 10, 2025
ebd0e1c
[doc] add walkthrough for implementing and hosting a simple llama wra…
glenliu21 Sep 10, 2025
737d73e
Fix: the default choice is wrong for flashinfer mxfp4 moe precision (…
LauYeeYu Sep 10, 2025
5be8c2f
Page first direct IO kernel (#10060)
huangtingwei9988 Sep 10, 2025
4efe2c5
support vlm model spec bench (#10173)
Lzhang-hub Sep 10, 2025
0ac809d
Fix assertion typo in tp_worker.py (#9954)
sgncho Sep 10, 2025
27760fc
[Auto Sync] Update io_struct.py (20250910) (#10262)
merrymercy Sep 10, 2025
e903f69
Fix potential flakiness in test_lora_qwen3 (#10250)
lifuhuang Sep 10, 2025
cda7e47
[router] Add PD router mmlu test (#10256)
key4ng Sep 10, 2025
9410029
[1/2] Refactor LoRA to support backend-specific batch preprocessing. …
lifuhuang Sep 10, 2025
21176b0
[Bugfix] Fix Weightloading for the original nvidia/Deepseek-R1-FP4 ch…
pavanimajety Sep 10, 2025
9e2f725
add dual stream for qwen2_moe (#10252)
yizhang2077 Sep 10, 2025
91b3555
Add tests to AMD CI for MI35x (#9662)
hubertlu-tw Sep 10, 2025
2286e85
pass a_scale from fp8 quant result instead of hard code to 1.0f (#10…
rainj-me Sep 10, 2025
f3b5db6
Feat: support disable tool parser (#10184)
JustinTong0323 Sep 10, 2025
033b75f
[Auto Sync] Update serving_base.py, serving_chat.py, servin... (20250…
merrymercy Sep 10, 2025
6d55f60
Revert "[1/2] Optimizations and refactors about quant kernel (#9534)"…
zhyncs Sep 11, 2025
5b7448d
chore: bump sgl-kernel 0.3.9.post1 (#10294)
zhyncs Sep 11, 2025
5b64f00
[Feature] Support DeepEP normal & Redundant Experts on NPU (#9881)
iforgetmyname Sep 11, 2025
dc491b3
add flash linear attention triton kernel (#10239)
yizhang2077 Sep 11, 2025
4aa1e69
[chore]Add sgl-router to npu images (#10229)
BourneSun0527 Sep 11, 2025
ef959d7
[CPU] fix OOM when mem-fraction is not set (#9090)
ZailiWang Sep 11, 2025
37367da
[fix CI] Fix logical condition in fused MoE layer for compressed tens…
BBuf Sep 11, 2025
de15d14
Revert "Fix flashinfer version in sgl-kernel (#10135)" (#10310)
zhyncs Sep 11, 2025
532f998
chore: bump sgl-kernel 0.3.9.post2 (#10311)
zhyncs Sep 11, 2025
3dd6420
[CI] add pyproject.toml to deepseek w4a8 ci (#10314)
HanHan009527 Sep 11, 2025
bfe01a5
chore: upgrade v0.3.9.post2 sgl-kernel (#10297)
zhyncs Sep 11, 2025
30c6e1f
Qwen3-Next support (#10233)
yizhang2077 Sep 11, 2025
956d805
[Auto Sync] Update parallel_state.py (20250911) (#10326)
merrymercy Sep 11, 2025
64f296f
[Minor] Improve the style of server args (#10328)
merrymercy Sep 11, 2025
4a0e0be
[bugfix] fix norm type error in qwen3_next model (#10322)
cao1zhg Sep 11, 2025
6c18ab4
[Qwen3-Next] switch to triton and cache conv states to accelerate MTP…
hebiao064 Sep 11, 2025
480d1b8
[router] add benchmark for regular router and pd router (#10280)
key4ng Sep 11, 2025
ab795ae
add h20 qwen3 next config (#10264)
yizhang2077 Sep 11, 2025
dee197e
[router] Add OpenAI backend support - core function (#10254)
key4ng Sep 11, 2025
1ee11df
[router][ci] add gpu process check and free port before start server …
key4ng Sep 11, 2025
760b788
add qwen3-next doc (#10327)
yizhang2077 Sep 11, 2025
70c0c1f
fix: trtllm-gen attention take zero-init workspace (#10330)
yyihuang Sep 11, 2025
fe68c14
Fix errors of hicache kernels in sgl-kernel for ROCm (#10339)
hubertlu-tw Sep 11, 2025
46ccbed
update GLM nightly test threshold (#10331)
zminglei Sep 11, 2025
c5d2b01
[LongCat] Optimize zero_experts_compute_triton by changing mask (#10303)
zk-lover Sep 11, 2025
a242406
add try catch for quant config hf download (#10340)
gongwei-130 Sep 11, 2025
b0d25e7
chore: bump v0.5.2 (#10221)
zhyncs Sep 11, 2025
b44c565
Support HiP Attention
daniel-geon-park Mar 4, 2025
7c6f805
Improve for review comments
daniel-geon-park Mar 18, 2025
9953c31
remove rope reuse code
daniel-geon-park Mar 18, 2025
f1e50ab
update exaone
daniel-geon-park Mar 18, 2025
0de8355
add error messages for unsupported models for hip attention
daniel-geon-park Mar 24, 2025
bf47a4f
Implement abstract methods
daniel-geon-park Mar 24, 2025
51e1379
fix eagle draft cuda graph runner
daniel-geon-park Mar 24, 2025
03a2e21
support deepseek-v2
daniel-geon-park Mar 26, 2025
69de45f
update pyproject.toml to include hip-attn
daniel-geon-park Mar 28, 2025
5bdce3a
fix test flag
daniel-geon-park Mar 28, 2025
4123abf
move for loop over batch dim into hip-attention library
daniel-geon-park Mar 28, 2025
1494948
feat: bump hip-attn
kbumsik Mar 30, 2025
aa867f1
bump hip-attn version
daniel-geon-park Apr 1, 2025
0a71bf5
cleanup rebase
daniel-geon-park Apr 8, 2025
768a1b0
fix for MLA
daniel-geon-park Apr 10, 2025
9fd6dd6
fix args
daniel-geon-park Apr 10, 2025
848edf7
update tests
daniel-geon-park Apr 11, 2025
51a00a3
bump hip-attn version
daniel-geon-park Apr 14, 2025
3809fc0
cleanup after rebase
daniel-geon-park Apr 14, 2025
83853fa
remove redundant code
daniel-geon-park Apr 15, 2025
4522988
fix
gmlwns2000 Apr 9, 2025
28d8db3
fix bug
gmlwns2000 Apr 12, 2025
efa2ede
handling chunk attention in hip
gmlwns2000 Apr 14, 2025
6f9406a
PASSKEY, you should remove before PR
gmlwns2000 Apr 15, 2025
8fd77ff
fix
gmlwns2000 Apr 16, 2025
9997ae5
fix
gmlwns2000 Apr 16, 2025
fd898d4
fix
gmlwns2000 Apr 20, 2025
b320ef3
fix
gmlwns2000 Apr 20, 2025
1d50cca
cleaning
gmlwns2000 Apr 29, 2025
457219b
fmt
gmlwns2000 Apr 29, 2025
bfb06b7
support qwen2.5 VL
gmlwns2000 May 2, 2025
2c8362b
cleanup
daniel-geon-park May 2, 2025
ec58af8
run auto format
daniel-geon-park May 2, 2025
17dd1c3
fix
gmlwns2000 May 5, 2025
312a761
bug fix
gmlwns2000 May 9, 2025
26599bc
support video input for qwen-vl
mickqian Apr 11, 2025
640d32d
rebase
mickqian Apr 29, 2025
ff7a50c
simplify processor
mickqian Apr 29, 2025
3d381a2
fix bug
gmlwns2000 May 11, 2025
5e8dfbb
fix
gmlwns2000 May 13, 2025
40a50b6
fix bug
gmlwns2000 May 13, 2025
edb066a
fix
gmlwns2000 May 13, 2025
8ec3b73
fix for qwen2 3 moe
gmlwns2000 May 13, 2025
e081af7
fix
gmlwns2000 May 15, 2025
6ea2d82
fix
gmlwns2000 May 20, 2025
1ae4b81
support chunked
gmlwns2000 May 21, 2025
6a282a5
fix
gmlwns2000 May 21, 2025
a400a48
fix
gmlwns2000 May 22, 2025
b0c7ae2
fix
gmlwns2000 Jun 2, 2025
95d9d17
fix for qwen3
gmlwns2000 Jun 3, 2025
2ed92cb
fix
gmlwns2000 Jun 5, 2025
66d4f58
fix
gmlwns2000 Jun 6, 2025
b192c87
fix
gmlwns2000 Jun 7, 2025
819ddde
fix
gmlwns2000 Jun 9, 2025
9ebf011
fix
kbumsik Jun 12, 2025
d47a72c
fix
kbumsik Jun 12, 2025
4737bcd
fix
kbumsik Jun 18, 2025
1f8f40d
fix
kbumsik Jun 18, 2025
0e43744
fix
kbumsik Jun 18, 2025
9d99433
fix
kbumsik Jun 18, 2025
a741fdc
fix
kbumsik Jun 18, 2025
96e6ec0
dense decode ...?
kbumsik Jun 18, 2025
5821d0a
fix
gmlwns2000 Jun 19, 2025
2462fda
fix deepseekv2
gmlwns2000 Jun 21, 2025
4a94ea3
fix
gmlwns2000 Jun 22, 2025
1335895
temporary fix
gmlwns2000 Jun 23, 2025
aff3b33
fix bug
gmlwns2000 Jun 24, 2025
ac70f0c
fix
gmlwns2000 Jun 24, 2025
1da303e
fmt
gmlwns2000 Jun 25, 2025
674adfe
handle mla spec decode
gmlwns2000 Jun 26, 2025
6aeb1de
fmt
gmlwns2000 Jun 26, 2025
00a14fd
fix bug
gmlwns2000 Jun 26, 2025
c8008a4
fix
gmlwns2000 Jun 26, 2025
d179563
fix radix
gmlwns2000 Jun 30, 2025
a035721
fmt
gmlwns2000 Jul 1, 2025
a3b2662
fix deepseek bug
gmlwns2000 Jul 4, 2025
590fd97
fmt
gmlwns2000 Jul 8, 2025
d67aa00
support fp8
gmlwns2000 Jul 8, 2025
01a27da
fix
gmlwns2000 Jul 10, 2025
247e1a3
fix
gmlwns2000 Jul 11, 2025
8e70c9a
improve warmup
gmlwns2000 Jul 14, 2025
640c582
fix for fp8
gmlwns2000 Jul 15, 2025
e2ca229
run
gmlwns2000 Jul 17, 2025
5384def
add delta
gmlwns2000 Jul 20, 2025
fbf53f0
support fp8e4m3
gmlwns2000 Jul 21, 2025
eb24443
fa3 decode is handled in hip-attention lib
gmlwns2000 Jul 22, 2025
c384d50
update
gmlwns2000 Jul 23, 2025
63557ef
add self extend scale
gmlwns2000 Jul 25, 2025
94ce5f1
fmt
gmlwns2000 Jul 25, 2025
52c9387
this is a bad idea...
gmlwns2000 Aug 1, 2025
035082b
fix
gmlwns2000 Aug 1, 2025
e7b1140
fix
gmlwns2000 Aug 1, 2025
0473f32
support GLM45
gmlwns2000 Aug 4, 2025
497c9f3
fix warmup
gmlwns2000 Aug 6, 2025
e5cc3bb
handling gpt-oss sk
gmlwns2000 Aug 6, 2025
1fca935
fmt
gmlwns2000 Aug 6, 2025
7c7499f
support gpt oss
gmlwns2000 Aug 6, 2025
edb5a66
fix gpt-oss
gmlwns2000 Aug 6, 2025
916614c
fix
gmlwns2000 Aug 8, 2025
00a2a9f
add bug message
gmlwns2000 Aug 10, 2025
2e023a5
fmt
gmlwns2000 Aug 10, 2025
02767c6
fmt
gmlwns2000 Aug 10, 2025
deaa3cf
fix
gmlwns2000 Aug 12, 2025
1ce26a6
fmt
gmlwns2000 Aug 14, 2025
5cb816d
fix
gmlwns2000 Aug 15, 2025
83d8acb
fmt
gmlwns2000 Aug 15, 2025
402ea03
fix
gmlwns2000 Aug 17, 2025
0fa639c
add global var
gmlwns2000 Aug 17, 2025
2841661
fix
gmlwns2000 Aug 17, 2025
57aa4ac
support dual chunk attention
gmlwns2000 Sep 11, 2025
3695b2c
fix
gmlwns2000 Sep 12, 2025
593e888
try to support qwen3 next
gmlwns2000 Sep 25, 2025
28 changes: 28 additions & 0 deletions .github/workflows/open-pr-copy-from-oss.yml
@@ -0,0 +1,28 @@
name: Open A PR to Copy Code From OSS

on:
  workflow_dispatch:
  # schedule:
  #   - cron: '0 10 * * *'

permissions:
  contents: write

jobs:
  copy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          ref: 'main'

      - name: Install GitHub CLI (if not present)
        run: |
          bash scripts/code_sync/install_github_cli.sh

      - name: Copy from OSS code
        env:
          GH_TOKEN: ${{ secrets.PAT_FOR_CODE_SYNC_FROM_LIANMIN }}
        run: |
          python3 scripts/code_sync/copy_from_oss.py
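The workflow above only invokes scripts/code_sync/copy_from_oss.py; the script itself is not part of this diff. Below is a minimal Python sketch of what such a script could look like, assuming it syncs the OSS tree into the current checkout and opens a PR with the gh CLI. The repository URL, branch naming, and rsync-based copy are all hypothetical.

"""Hypothetical sketch of scripts/code_sync/copy_from_oss.py (not in this diff)."""
import datetime
import subprocess

OSS_REPO = "https://github.com/sgl-project/sglang.git"  # assumed source repo


def run(cmd: list[str]) -> None:
    # Fail loudly so the Actions job surfaces any error.
    subprocess.run(cmd, check=True)


def main() -> None:
    branch = f"sync-from-oss-{datetime.date.today():%Y%m%d}"  # hypothetical naming
    run(["git", "checkout", "-b", branch])
    # Pull the OSS tree and copy it over the current checkout (paths assumed).
    run(["git", "clone", "--depth", "1", OSS_REPO, "/tmp/oss"])
    run(["rsync", "-a", "--delete", "--exclude", ".git", "/tmp/oss/", "."])
    run(["git", "add", "-A"])
    run(["git", "commit", "-m", "Copy code from OSS"])
    run(["git", "push", "origin", branch])
    # gh reads GH_TOKEN from the env set in the workflow step.
    run(["gh", "pr", "create", "--base", "main", "--head", branch,
         "--title", "Copy code from OSS", "--body", "Automated sync from OSS."])


if __name__ == "__main__":
    main()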
31 changes: 31 additions & 0 deletions .github/workflows/open-pr-copy-to-oss.yml
@@ -0,0 +1,31 @@
name: Open A PR to Copy Diff To OSS

on:
  workflow_dispatch:
    inputs:
      commit_sha:
        description: 'The commit SHA to copy. Defaults to LAST to copy the latest commit.'
        required: false
        default: 'LAST'

permissions:
  contents: write

jobs:
  copy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install GitHub CLI (if not present)
        run: |
          bash scripts/code_sync/install_github_cli.sh

      - name: Copy to OSS code
        env:
          GH_TOKEN: ${{ secrets.PAT_FOR_CODE_SYNC_FROM_LIANMIN }}
        run: |
          python3 scripts/code_sync/copy_to_oss.py --commit ${{ github.event.inputs.commit_sha }}
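Likewise, copy_to_oss.py is only invoked here, never shown. The one contract visible in the workflow is the --commit argument, whose default 'LAST' means the latest commit. A hedged sketch of how the script might resolve that argument follows; everything beyond the flag name is an assumption.

"""Hypothetical sketch of scripts/code_sync/copy_to_oss.py (not in this diff)."""
import argparse
import subprocess


def resolve_commit(commit: str) -> str:
    # 'LAST' is the workflow_dispatch default; map it to HEAD.
    if commit == "LAST":
        commit = "HEAD"
    return subprocess.run(
        ["git", "rev-parse", commit],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--commit", default="LAST",
                        help="Commit SHA to copy; LAST means the latest commit.")
    args = parser.parse_args()
    sha = resolve_commit(args.commit)
    # Export the single commit as a patch; the actual sync and PR steps
    # are not visible in this diff, so they are omitted here.
    patch = subprocess.run(
        ["git", "format-patch", "-1", sha, "--stdout"],
        check=True, capture_output=True, text=True,
    ).stdout
    print(f"Prepared patch for {sha} ({len(patch)} bytes)")


if __name__ == "__main__":
    main()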
285 changes: 241 additions & 44 deletions .github/workflows/pr-benchmark-rust.yml
@@ -9,6 +9,7 @@ on:
    branches: [ main ]
    paths:
      - "sgl-router/**"
    types: [opened, synchronize, reopened, labeled]
  workflow_dispatch:

concurrency:
@@ -19,9 +20,63 @@ permissions:
  pull-requests: write
  issues: write
jobs:
  benchmark-router:
  # Quick check job that always runs on PRs
  benchmark-compile-check:
    name: Benchmark Compilation Check
    if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install dependencies
        run: |
          bash scripts/ci/ci_install_rust.sh

      - name: Setup sccache
        uses: mozilla-actions/sccache-action@v0.0.3
        continue-on-error: true

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: sgl-router
          # Share cache across all benchmark jobs
          shared-key: "rust-cache"
          # Save cache even on failure
          save-if: true

      - name: Check benchmarks compile
        run: |
          source "$HOME/.cargo/env"
          cd sgl-router/
          # Try to use sccache, but disable if it fails
          if command -v sccache &> /dev/null; then
            echo "Testing sccache availability..."
            # Try to start sccache and check if it works
            export RUSTC_WRAPPER=sccache
            export SCCACHE_GHA_ENABLED="true"
            if sccache --start-server 2>/dev/null && sccache --show-stats 2>/dev/null; then
              echo "sccache is working, using it for compilation"
            else
              echo "sccache failed to start, falling back to regular cargo"
              unset RUSTC_WRAPPER
              unset SCCACHE_GHA_ENABLED
            fi
          else
            echo "sccache not available, using regular cargo"
          fi
          cargo check --benches

  # Full benchmark jobs that only run with label or on main branch
  benchmark-request-processing:
    name: Request Processing Benchmark
    if: |
      github.repository == 'sgl-project/sglang' &&
      (github.event_name == 'push' ||
      github.event_name == 'workflow_dispatch' ||
      contains(github.event.pull_request.labels.*.name, 'benchmark'))
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
@@ -33,77 +88,219 @@ jobs:
        run: |
          bash scripts/ci/ci_install_rust.sh

      - name: Cache Rust dependencies
        uses: actions/cache@v4
      - name: Setup sccache
        uses: mozilla-actions/sccache-action@v0.0.3
        continue-on-error: true

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          path: |
            ~/.cargo/bin/
            ~/.cargo/registry/index/
            ~/.cargo/registry/cache/
            ~/.cargo/git/db/
            sgl-router/target/
          key: ${{ runner.os }}-cargo-${{ hashFiles('sgl-router/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-

      - name: Build router in release mode
          workspaces: sgl-router
          # Share cache across all benchmark jobs
          shared-key: "rust-cache"
          # Save cache even on failure
          save-if: true

      - name: Run request processing benchmark
        timeout-minutes: 30
        run: |
          source "$HOME/.cargo/env"
          cd sgl-router/
          cargo build --release
          # Try to use sccache, but disable if it fails
          if command -v sccache &> /dev/null; then
            echo "Testing sccache availability..."
            # Try to start sccache and check if it works
            export RUSTC_WRAPPER=sccache
            export SCCACHE_GHA_ENABLED="true"
            if sccache --start-server 2>/dev/null && sccache --show-stats 2>/dev/null; then
              echo "sccache is working, using it for compilation"
            else
              echo "sccache failed to start, falling back to regular cargo"
              unset RUSTC_WRAPPER
              unset SCCACHE_GHA_ENABLED
            fi
          else
            echo "sccache not available, using regular cargo"
          fi
          # Run only the summary benchmark for quick validation in PRs
          cargo bench --bench request_processing -- benchmark_summary --exact

      - name: Upload benchmark results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: request-processing-results-${{ github.sha }}
          path: |
            sgl-router/target/criterion/benchmark_summary/
          retention-days: 30

      - name: Run quick benchmarks
        timeout-minutes: 15
  benchmark-tokenizer:
    name: Tokenizer Benchmark
    if: |
      github.repository == 'sgl-project/sglang' &&
      (github.event_name == 'push' ||
      github.event_name == 'workflow_dispatch' ||
      contains(github.event.pull_request.labels.*.name, 'benchmark'))
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 100

      - name: Install dependencies
        run: |
          bash scripts/ci/ci_install_rust.sh

      - name: Setup sccache
        uses: mozilla-actions/sccache-action@v0.0.3
        continue-on-error: true

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: sgl-router
          # Share cache across all benchmark jobs
          shared-key: "rust-cache"
          # Save cache even on failure
          save-if: true

      - name: Run tokenizer benchmark
        timeout-minutes: 30
        run: |
          source "$HOME/.cargo/env"
          cd sgl-router/
          # Run quick benchmarks for PR validation using Python script
          python3 scripts/run_benchmarks.py --quick --validate-thresholds --save-results
          # Try to use sccache, but disable if it fails
          if command -v sccache &> /dev/null; then
            echo "Testing sccache availability..."
            # Try to start sccache and check if it works
            export RUSTC_WRAPPER=sccache
            export SCCACHE_GHA_ENABLED="true"
            if sccache --start-server 2>/dev/null && sccache --show-stats 2>/dev/null; then
              echo "sccache is working, using it for compilation"
            else
              echo "sccache failed to start, falling back to regular cargo"
              unset RUSTC_WRAPPER
              unset SCCACHE_GHA_ENABLED
            fi
          else
            echo "sccache not available, using regular cargo"
          fi
          cargo bench --bench tokenizer_benchmark

      - name: Upload benchmark results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results-${{ github.sha }}
          name: tokenizer-results-${{ github.sha }}
          path: |
            sgl-router/target/criterion/
            sgl-router/target/criterion/tokenizer*/
          retention-days: 30

  benchmark-integration-test:
    if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
  benchmark-tool-parser:
    name: Tool Parser Benchmark
    if: |
      github.repository == 'sgl-project/sglang' &&
      (github.event_name == 'push' ||
      github.event_name == 'workflow_dispatch' ||
      contains(github.event.pull_request.labels.*.name, 'benchmark'))
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 100

      - name: Install dependencies
        run: |
          bash scripts/ci/ci_install_rust.sh

      - name: Cache Rust dependencies
        uses: actions/cache@v4
      - name: Setup sccache
        uses: mozilla-actions/sccache-action@v0.0.3
        continue-on-error: true

      - name: Rust cache
        uses: Swatinem/rust-cache@v2
        with:
          path: |
            ~/.cargo/bin/
            ~/.cargo/registry/index/
            ~/.cargo/registry/cache/
            ~/.cargo/git/db/
            sgl-router/target/
          key: ${{ runner.os }}-cargo-${{ hashFiles('sgl-router/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-

      - name: Run benchmark integration tests
        timeout-minutes: 10
          workspaces: sgl-router
          # Share cache across all benchmark jobs
          shared-key: "rust-cache"
          # Save cache even on failure
          save-if: true

      - name: Run tool parser benchmark
        timeout-minutes: 30
        run: |
          source "$HOME/.cargo/env"
          cd sgl-router/
          # Run integration tests to ensure benchmark code compiles and works
          cargo test --test benchmark_integration
          # Try to use sccache, but disable if it fails
          if command -v sccache &> /dev/null; then
            echo "Testing sccache availability..."
            # Try to start sccache and check if it works
            export RUSTC_WRAPPER=sccache
            export SCCACHE_GHA_ENABLED="true"
            if sccache --start-server 2>/dev/null && sccache --show-stats 2>/dev/null; then
              echo "sccache is working, using it for compilation"
            else
              echo "sccache failed to start, falling back to regular cargo"
              unset RUSTC_WRAPPER
              unset SCCACHE_GHA_ENABLED
            fi
          else
            echo "sccache not available, using regular cargo"
          fi
          cargo bench --bench tool_parser_benchmark

      - name: Upload benchmark results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: tool-parser-results-${{ github.sha }}
          path: |
            sgl-router/target/criterion/tool_parser*/
          retention-days: 30

  benchmark-summary:
    name: Benchmark Summary
    needs: [benchmark-request-processing, benchmark-tokenizer, benchmark-tool-parser]
    if: always() && (github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request')
    runs-on: ubuntu-latest
    steps:
      - name: Download all benchmark results
        uses: actions/download-artifact@v4
        with:
          pattern: '*-results-${{ github.sha }}'
          path: benchmark-results

      - name: Verify benchmark compilation
      - name: Generate summary
        run: |
          source "$HOME/.cargo/env"
          cd sgl-router/
          # Ensure all benchmarks compile without running them
          cargo check --benches
          echo "## Benchmark Results Summary" > summary.md
          echo "" >> summary.md
          echo "### Request Processing" >> summary.md
          if [ -d "benchmark-results/request-processing-results-${{ github.sha }}" ]; then
            echo "✅ Completed" >> summary.md
          else
            echo "❌ Failed or skipped" >> summary.md
          fi
          echo "" >> summary.md
          echo "### Tokenizer" >> summary.md
          if [ -d "benchmark-results/tokenizer-results-${{ github.sha }}" ]; then
            echo "✅ Completed" >> summary.md
          else
            echo "❌ Failed or skipped" >> summary.md
          fi
          echo "" >> summary.md
          echo "### Tool Parser" >> summary.md
          if [ -d "benchmark-results/tool-parser-results-${{ github.sha }}" ]; then
            echo "✅ Completed" >> summary.md
          else
            echo "❌ Failed or skipped" >> summary.md
          fi
          cat summary.md

      - name: Upload summary
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-summary-${{ github.sha }}
          path: summary.md
          retention-days: 30
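The summary step above only checks whether each artifact directory exists. If one wanted actual numbers, Criterion benchmarks typically write an estimates.json (with a mean.point_estimate in nanoseconds) under each benchmark's new/ directory inside target/criterion/. Here is a hedged Python sketch that mines the downloaded artifacts, treating that layout as an assumption rather than a guarantee.

"""Hypothetical post-processing for the downloaded Criterion artifacts."""
import json
from pathlib import Path


def collect_means(results_root: str) -> dict[str, float]:
    """Map each benchmark directory to its mean time in milliseconds."""
    means = {}
    for estimates in Path(results_root).rglob("new/estimates.json"):
        data = json.loads(estimates.read_text())
        # Criterion stores point estimates in nanoseconds (assumed layout).
        means[str(estimates.parent.parent)] = data["mean"]["point_estimate"] / 1e6
    return means


if __name__ == "__main__":
    for bench, ms in sorted(collect_means("benchmark-results").items()):
        print(f"{bench}: {ms:.3f} ms")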