2025.9.22: GSM8K 93.5%, miniF2F 4.5%. Qwen2.5-Instruct-7B converges after one epoch on GSM8K. The data (8k GSM8K problems, 100 miniF2F problems) is not enough, and high-quality data is even scarcer.
TODO:
- Add more miniF2F data.
- 93% on GSM8K is not good enough; temperature, top-k, prompt, and model hyperparameters all need tuning.
- Use synthetic theorem-proving data instead of GSM8K.
- Reproduce on DeepSeek-Math-7B for comparison with DeepSeek-Prover-1.5-Base.
- Add more medium-difficulty theorem-proving tasks for RL.
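The decoding-parameter tuning mentioned above is naturally organized as a grid sweep. A minimal sketch, assuming the sweep is scored offline on a held-out GSM8K split; the grid values and function name are illustrative, not tuned results:

```python
from itertools import product

# Hypothetical sweep grid; values are placeholders, not tuned results.
TEMPERATURES = [0.2, 0.6, 1.0]
TOP_KS = [20, 50]

def decoding_configs(temps, ks):
    """Enumerate (temperature, top_k) pairs to score on a held-out split."""
    return [{"temperature": t, "top_k": k} for t, k in product(temps, ks)]

configs = decoding_configs(TEMPERATURES, TOP_KS)
print(len(configs))  # 6 combinations to evaluate
```

Each config would then be passed to the generation call, and the best pair kept by validation accuracy.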
Setup and evaluation:
- Set up the environment: bash setup_env.sh
- Download the LoRA weights from Google Drive.
- Run: bash run.sh
- Evaluate on miniF2F: python eval.py
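For the miniF2F evaluation, the headline number is presumably a pass rate over sampled proof attempts. A minimal sketch of that aggregation, assuming per-problem boolean results; the function name and data shape are assumptions, not the actual eval.py interface:

```python
def pass_rate(results):
    """Fraction of problems with at least one verified proof attempt.

    `results` maps problem name -> list of bools (one per sampled attempt).
    Assumed shape, not necessarily what eval.py produces.
    """
    if not results:
        return 0.0
    solved = sum(1 for attempts in results.values() if any(attempts))
    return solved / len(results)

# Toy example: 1 of 2 problems solved -> 0.5
demo = {"mathd_algebra_1": [False, True], "imo_1964_p2": [False, False]}
print(pass_rate(demo))  # 0.5
```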