Fast-dLLM v2 speeds up generation with the following process:
- Block-level Generation: Autoregressive at the block level
- Sub-block Parallelization: Parallel decoding within blocks for efficiency
- Hierarchical Caching: Block and sub-block level caching for speed optimization
Is this already supported? Thanks!
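
For reference, here is a rough toy sketch of how I understand the three pieces fitting together. It is not Fast-dLLM v2's actual code: `toy_denoise`, the confidence-threshold unmasking rule, and the block and sub-block sizes are all assumptions made up for illustration, and the "cache" is just a list of finished blocks standing in for a real KV cache.

```python
import random

MASK = None  # placeholder for a position that has not been decoded yet


def toy_denoise(prefix, block, rng):
    """Hypothetical stand-in for one forward pass of a diffusion LM.

    Returns {position: (token, confidence)} for every masked position.
    The "token" is simply the absolute index, so the output is easy to check.
    """
    return {
        i: (len(prefix) + i, rng.random())
        for i, tok in enumerate(block)
        if tok is MASK
    }


def generate(num_blocks=3, block_size=8, sub_block_size=4, threshold=0.7, seed=0):
    rng = random.Random(seed)
    block_cache = []  # finished blocks (stand-in for a block-level cache)
    steps = 0

    # Block-level generation: blocks are produced strictly left to right.
    for _ in range(num_blocks):
        prefix = [t for b in block_cache for t in b]
        block = [MASK] * block_size

        # Sub-blocks are visited in order; already-decoded sub-blocks are
        # left untouched (stand-in for a sub-block-level cache).
        for start in range(0, block_size, sub_block_size):
            end = min(start + sub_block_size, block_size)

            # Parallel decoding inside the sub-block: each step may unmask
            # several positions at once.
            while any(block[i] is MASK for i in range(start, end)):
                preds = toy_denoise(prefix, block, rng)
                cand = {i: p for i, p in preds.items() if start <= i < end}
                best = max(cand, key=lambda i: cand[i][1])
                for i, (tok, conf) in cand.items():
                    # Always accept the most confident position so that
                    # every step makes progress.
                    if conf >= threshold or i == best:
                        block[i] = tok
                steps += 1

        block_cache.append(block)

    tokens = [t for b in block_cache for t in b]
    return tokens, steps


if __name__ == "__main__":
    tokens, steps = generate()
    print(len(tokens), "tokens decoded in", steps, "denoising steps")
```

The point of the sketch is only that the number of denoising steps can be lower than the number of tokens, because several positions in a sub-block may be accepted in one step, while earlier blocks are never recomputed.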