
Is Fast-dLLM v2 supported? #5

@FWXT

Description


Fast-dLLM v2 speeds up generation with the following process:

  1. Block-level generation: autoregressive at the block level
  2. Sub-block parallelization: parallel decoding within each block for efficiency
  3. Hierarchical caching: block-level and sub-block-level caching for speed

Is this already supported? Thanks!
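For reference, here is a minimal toy sketch of how the three steps above fit together. It assumes a hypothetical denoiser `toy_predict` that already knows the target tokens and reports a confidence that decays with distance from the first masked slot. Blocks are decoded left to right (block-level autoregression). Each block is split into sub-blocks whose masked slots are filled in parallel whenever confidence clears a threshold. Finished blocks go into a toy `BlockCache` that is "encoded" once and reused as the prefix for later blocks. All names (`toy_predict`, `BlockCache`, `decode_block`, `generate`, `threshold`) are illustrative assumptions, not Fast-dLLM v2's actual code or API. The sub-block-level cache appears only as a comment.

```python
from dataclasses import dataclass, field

MASK = -1  # placeholder id for a position that has not been decoded yet


@dataclass
class BlockCache:
    """Toy stand-in for a block-level KV cache (hypothetical).

    Each finished block is 'encoded' exactly once and then reused as part
    of the prefix for every later block.
    """
    blocks: list = field(default_factory=list)
    encode_calls: int = 0

    def append(self, block):
        self.encode_calls += 1  # a real model would compute K/V for the block here
        self.blocks.append(tuple(block))

    def prefix(self):
        return [tok for blk in self.blocks for tok in blk]


def toy_predict(prefix, block, positions, target):
    """Hypothetical denoiser: proposes the correct token for each masked slot
    in `positions`, with confidence 1/(1+k) for the k-th masked slot."""
    masked = [i for i in positions if block[i] == MASK]
    start = len(prefix)
    return {i: (target[start + i], 1.0 / (1 + k)) for k, i in enumerate(masked)}


def decode_block(prefix, block_size, sub_block_size, target, threshold):
    """Decode one block: sub-blocks left to right, masked slots in parallel."""
    block = [MASK] * block_size
    steps = 0
    for s in range(0, block_size, sub_block_size):
        sub = range(s, min(s + sub_block_size, block_size))
        # A real implementation would also reuse K/V across these steps
        # (sub-block cache); this toy recomputes proposals from scratch.
        while any(block[i] == MASK for i in sub):
            proposals = toy_predict(prefix, block, sub, target)
            accepted = {i: tok for i, (tok, conf) in proposals.items()
                        if conf >= threshold}
            if not accepted:  # always commit at least the most confident slot
                best = max(proposals, key=lambda i: proposals[i][1])
                accepted = {best: proposals[best][0]}
            for i, tok in accepted.items():
                block[i] = tok
            steps += 1
    return block, steps


def generate(target, block_size=8, sub_block_size=4, threshold=0.5):
    """Block-level autoregression: each block conditions on all cached blocks."""
    assert len(target) % block_size == 0, "toy assumes whole blocks"
    cache = BlockCache()
    total_steps = 0
    while len(cache.prefix()) < len(target):
        block, steps = decode_block(cache.prefix(), block_size,
                                    sub_block_size, target, threshold)
        cache.append(block)
        total_steps += steps
    return cache.prefix(), total_steps, cache.encode_calls


tokens, steps, encodes = generate(list(range(16)))
print(tokens, steps, encodes)  # steps == 8, encodes == 2
```

With 16 tokens, `block_size=8`, `sub_block_size=4` and `threshold=0.5`, the toy commits two tokens per denoising step and finishes in 8 steps instead of the 16 a one-token-per-step decoder would need. Each finished block is encoded only once. Raising `threshold` to `1.0` falls back to one token per step.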
