# Llama.cpp Diffusion Language Model Support

This project adds support for running diffusion language models in llama.cpp.

## Build Steps

### 1. Install dependencies

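```bash
# Install llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
# Or pick another backend. Recent llama.cpp releases name this option -DGGML_CUDA=ON.
cmake .. -DLLAMA_CUBLAS=ON
make -j
sudo make install

# Install pybind11
pip install pybind11
```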

### 2. Build the Diffusion extension

```bash
# Run from the root of this repository (where CMakeLists.txt lives)
mkdir build && cd build
cmake .. -DLLAMA_CUBLAS=ON
make -j
```

### 3. Install the Python package

```bash
pip install -e .
```
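
A quick way to confirm the editable install worked is to ask Python whether the module can be found, without actually loading the native extension. The sketch below assumes the package is importable as `llama_diffusion`, which is the name used in the inference example; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on sys.path."""
    return importlib.util.find_spec(module_name) is not None


# Module name taken from the README's `import llama_diffusion` line.
print("llama_diffusion importable:", is_installed("llama_diffusion"))
```

If this prints `False`, the usual suspects are running `pip` from a different virtual environment than the interpreter, or a failed extension build in the previous step.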

## Usage

### 1. Convert the model

```bash
python convert_to_gguf.py \
    --model_path /path/to/hf/model \
    --output_path ./models/diffusion-model.gguf
```
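
After conversion it can be useful to sanity-check the output before spending time on quantization. The sketch below reads only the fixed GGUF header (the `GGUF` magic, a version number, and the tensor and metadata counts). It assumes a little-endian file in GGUF version 2 or later, where both counts are 64-bit; the function name `read_gguf_header` is illustrative and not part of this project.

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes little-endian GGUF v2+; raises ValueError if the magic is wrong.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + uint32 + uint64 + uint64
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

For example, `read_gguf_header("./models/diffusion-model.gguf")` should return a small version number and a non-zero tensor count if the conversion produced a well-formed file.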

### 2. Quantize the model (optional)

```bash
# Use llama.cpp's quantization tool
./llama-quantize ./models/diffusion-model.gguf \
                 ./models/diffusion-model-q4_0.gguf q4_0
```
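
To get a feel for what `q4_0` buys, here is a back-of-the-envelope estimate of weight storage. Q4_0 packs 32 weights into an 18-byte block (one 2-byte fp16 scale plus 16 bytes of 4-bit values), so about 4.5 bits per weight versus 16 for F16. The parameter count below is a hypothetical example, and real files differ somewhat because some tensors are kept at higher precision and metadata adds overhead.

```python
# Bytes per weight: F16 stores 2 bytes; Q4_0 stores 18 bytes per 32 weights.
BYTES_PER_WEIGHT = {"f16": 2.0, "q4_0": 18 / 32}


def estimate_weight_bytes(n_params, fmt):
    """Approximate storage for n_params weights in the given format."""
    return n_params * BYTES_PER_WEIGHT[fmt]


n_params = 7_000_000_000  # hypothetical model size, not taken from this project
for fmt in ("f16", "q4_0"):
    gigabytes = estimate_weight_bytes(n_params, fmt) / 1e9
    print(f"{fmt}: {gigabytes:.2f} GB")
```

For a model of that size the estimate comes out at roughly 14 GB in F16 and a little under 4 GB in Q4_0.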

### 3. Run inference

```python
import llama_diffusion
from transformers import AutoTokenizer

# Load the model
model = llama_diffusion.LlamaDiffusion(
    model_path="./models/diffusion-model-q4_0.gguf",
    n_ctx=8192,
    n_gpu_layers=35
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("path/to/tokenizer")

# Generate
prompt_tokens = tokenizer.encode("Your prompt here")
output_tokens = model.generate(
    prompt=prompt_tokens,
    mask_token_id=tokenizer.mask_token_id,
    gen_length=1024,
    block_length=4,
    denoising_steps=4
)

# Decode
output_text = tokenizer.decode(output_tokens)
print(output_text)
```
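
The README does not spell out what `gen_length`, `block_length`, and `denoising_steps` mean internally. The toy loop below is a sketch under the assumption that they follow the common block-wise masked-diffusion scheme: the output starts fully masked, blocks are filled left to right, and each block is unmasked over several steps by keeping the most confident proposals. It is not this library's implementation; `toy_block_diffusion` and `dummy_predict` are hypothetical names, and the "model" is a random stand-in.

```python
import random


def toy_block_diffusion(prompt, mask_id, gen_length, block_length,
                        denoising_steps, predict):
    """Toy block-wise masked denoising; returns (tokens, forward_passes)."""
    if gen_length % block_length != 0:
        raise ValueError("gen_length must be a multiple of block_length")
    seq = list(prompt) + [mask_id] * gen_length
    forward_passes = 0
    for start in range(len(prompt), len(seq), block_length):
        block = range(start, start + block_length)
        for step in range(denoising_steps):
            masked = [i for i in block if seq[i] == mask_id]
            if not masked:
                break
            proposals = predict(seq, masked)  # {position: (token, confidence)}
            forward_passes += 1
            # Reveal an even share of what is left (ceiling division), so the
            # final step always clears every remaining mask in the block.
            n_reveal = -(-len(masked) // (denoising_steps - step))
            ranked = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
            for i in ranked[:n_reveal]:
                seq[i] = proposals[i][0]
    return seq[len(prompt):], forward_passes


def dummy_predict(seq, masked):
    # Stand-in for a model forward pass: token 1000 + position, random confidence.
    return {i: (1000 + i, random.random()) for i in masked}


tokens, passes = toy_block_diffusion(
    prompt=[1, 2, 3], mask_id=0, gen_length=8, block_length=4,
    denoising_steps=4, predict=dummy_predict,
)
print(tokens, passes)
```

Under this assumed scheme, the settings in the example above (`gen_length=1024`, `block_length=4`, `denoising_steps=4`) would correspond to 256 blocks and at most 1024 forward passes, which is why these three parameters tend to dominate generation latency.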

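## Workflow Summary

1. **Build llama.cpp and the extension**
2. **Convert the model**: Hugging Face format → GGUF format
3. **Quantize the model** (optional): reduces memory usage
4. **Run inference**: through the Python interface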
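
Steps 2 and 3 of the summary can be scripted. The sketch below only assembles the command lines already shown in this README, so they can be reviewed or passed to `subprocess.run`; the function name `pipeline_commands` is illustrative, and the paths are the README's placeholders.

```python
import shlex


def pipeline_commands(hf_model_dir, gguf_path, quantized_path=None,
                      quant_type="q4_0"):
    """Return argv lists for the convert step and the optional quantize step."""
    commands = [[
        "python", "convert_to_gguf.py",
        "--model_path", hf_model_dir,
        "--output_path", gguf_path,
    ]]
    if quantized_path is not None:
        commands.append(
            ["./llama-quantize", gguf_path, quantized_path, quant_type]
        )
    return commands


for cmd in pipeline_commands(
    "/path/to/hf/model",
    "./models/diffusion-model.gguf",
    "./models/diffusion-model-q4_0.gguf",
):
    print(shlex.join(cmd))
```

Keeping the commands as argv lists rather than one shell string avoids quoting problems when model paths contain spaces.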

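## GPU Sampling Acceleration

When `GGML_CUDA=ON` is enabled in the `cmake` configuration, the sampling step can be offloaded to CUDA during inference:

- The C++ `DiffusionConfig` gains an `enable_gpu_sampler` field.
- The Python bindings `generate` / `generate_stream` / `generate_with_profiling` gain a `use_gpu_sampler` parameter.
- Example: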

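```python
output_tokens = model.generate(
    prompt=prompt_tokens,
    mask_token_id=tokenizer.mask_token_id,
    use_gpu_sampler=True,
)
```

The GPU sampler configuration in `test_profiling.py` can be used directly to compare the time spent in the CPU and GPU sampling stages. The results are also recorded in `profile_results.json`.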
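
For a quick comparison outside `test_profiling.py`, a small timing helper is often enough. The sketch below relies only on the `use_gpu_sampler` parameter described above and measures the whole `generate` call, not the sampling stage in isolation; `compare_samplers` is a hypothetical helper, not part of the package.

```python
import time


def compare_samplers(generate, repeats=3, **kwargs):
    """Return the best wall-clock seconds per sampler: {"cpu": ..., "gpu": ...}."""
    timings = {}
    for label, use_gpu in (("cpu", False), ("gpu", True)):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            generate(use_gpu_sampler=use_gpu, **kwargs)
            best = min(best, time.perf_counter() - start)
        timings[label] = best
    return timings
```

With a loaded model this could be called as `compare_samplers(model.generate, prompt=prompt_tokens, mask_token_id=tokenizer.mask_token_id)`. Taking the minimum over a few repeats reduces noise from warm-up and other processes.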
