A comprehensive pipeline for generating Qiskit datasets and training language models using ORPO (Odds Ratio Preference Optimization).
This repository accompanies the paper "QSpark: Towards Reliable Qiskit Code Generation via ORPO and GRPO Optimization." The study introduces a unified framework that scrapes Qiskit-related repositories from GitHub, constructs HumanEval-style and preference datasets, and fine-tunes large language models using both ORPO and GRPO optimization techniques. The work demonstrates significant improvements in quantum code reliability, simulation success rates, and state fidelity across benchmark tasks, highlighting the importance of quantum-aware reward mechanisms for generative model training.
This repository contains five main components:
- qiskit_to_orpo_pipeline.py - End-to-end script that scrapes Qiskit-related Python files from GitHub, extracts functions, and creates both HumanEval-format JSON datasets and ORPO training CSV files.
- orpo_training.py - ORPO fine-tuning script using Unsloth for efficient training of language models on Qiskit code.
- grpo_qiskit_training.py - Advanced GRPO (Group Relative Policy Optimization) training with quantum-specific reward functions for superior quantum code generation.
- qiskit_benchmark_evaluation.py - Comprehensive evaluation script that compares GRPO vs ORPO model performance on quantum code generation tasks.
- qiskit_benchmark_plots.py - Visualization script that generates publication-ready plots from benchmark results.
- Search GitHub for Qiskit-related Python files
- Concurrent downloading with configurable thread count
- Rate limiting to respect GitHub API limits
- Automatic deduplication and difficulty analysis
- Dual output: HumanEval JSON + ORPO CSV formats
- Difficulty categorization (basic/intermediate/advanced)
- Efficient training with Unsloth framework
- LoRA fine-tuning for memory efficiency
- Wandb integration for experiment tracking
- Optimized for Qiskit code generation tasks
- Advanced Group Relative Policy Optimization
- Multiple quantum-aware reward functions
- AST + style conformance checking
- Qiskit import/compile success validation
- Simulation fidelity vs. reference comparison
- Resource efficiency optimization
- XML-structured output formatting
- Comprehensive model comparison (GRPO vs ORPO)
- Quantum circuit compilation success rates
- Simulation success and fidelity metrics
- Circuit depth and resource efficiency analysis
- Automated CSV report generation
- Side-by-side performance comparison
- Publication-ready benchmark plots
- Success rate comparisons (compile/simulation)
- Fidelity distribution histograms
- Circuit depth box plots
- Professional styling with seaborn/matplotlib
- High-resolution PNG outputs
- Python 3.8+
- GitHub Personal Access Token
- Internet connection
- CUDA-compatible GPU (for training)
Dataset generation:
- requests>=2.25.1
- tqdm>=4.60.0
Training, evaluation, and visualization:
- torch>=2.0.0
- unsloth>=2024.1
- transformers>=4.35.0
- datasets>=2.14.0
- trl>=0.7.0
- accelerate>=0.24.0
- qiskit>=0.45.0 (for GRPO quantum simulation)
- qiskit-aer>=0.13.0 (for GRPO quantum simulation)
- pandas>=1.5.0 (for evaluation metrics)
- numpy>=1.21.0 (for numerical analysis)
- matplotlib>=3.5.0 (for plotting)
- seaborn>=0.11.0 (for statistical visualizations)
- Clone this repository
- Install basic dependencies:
pip install -r requirements.txt
- For ORPO/GRPO training, install additional dependencies:
pip install unsloth trl transformers peft bitsandbytes accelerate datasets
pip install qiskit qiskit-aer pandas numpy matplotlib seaborn  # for GRPO quantum simulation, evaluation, and visualization
Generate Qiskit datasets from GitHub:
python qiskit_to_orpo_pipeline.py --token YOUR_GITHUB_TOKEN --max-files 100
Arguments:
- --token (required): GitHub personal access token
- --max-files (optional): Maximum number of files to download (default: 100)
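The two documented arguments could be declared with `argparse` roughly as follows. This is a minimal sketch based only on the argument list above; the `build_parser` name, the description, and the help strings are illustrative and not taken from the script itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the two documented arguments."""
    parser = argparse.ArgumentParser(
        description="Scrape Qiskit code from GitHub and build datasets."
    )
    # --token is documented as required.
    parser.add_argument("--token", required=True,
                        help="GitHub personal access token")
    # --max-files is documented as optional with a default of 100.
    parser.add_argument("--max-files", type=int, default=100,
                        help="Maximum number of files to download")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--token", "YOUR_GITHUB_TOKEN"])
print(f"Would download up to {args.max_files} files")
```

Omitting `--token` makes `argparse` exit with a usage error, which matches the "required" wording above.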
Output Files:
- qiskit_humaneval_dataset.json - HumanEval format for evaluation
- Formatted_ORPO_Dataset.csv - ORPO training data
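To illustrate how scraped source files might become HumanEval-style records, here is a minimal sketch using the standard `ast` module. The field names (`task_id`, `prompt`, `canonical_solution`, `entry_point`) follow the public HumanEval layout; whether the pipeline emits exactly these fields is an assumption, and `extract_records` is a hypothetical helper, not a function from this repository. Deduplication here hashes the function's AST, which is one possible approach rather than the pipeline's confirmed method.

```python
import ast
import hashlib
from typing import Dict, List


def extract_records(source: str, prefix: str = "qiskit") -> List[Dict[str, str]]:
    """Turn each documented top-level function into a HumanEval-style record."""
    tree = ast.parse(source)
    lines = source.splitlines()
    records: List[Dict[str, str]] = []
    seen = set()
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        # Require a docstring plus at least one statement of real body.
        if ast.get_docstring(node) is None or len(node.body) < 2:
            continue
        # Deduplicate on the AST dump, which ignores formatting and position.
        digest = hashlib.sha256(ast.dump(node).encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        # Prompt = signature + docstring; solution = the remaining body lines.
        split = node.body[0].end_lineno  # last docstring line (1-indexed)
        prompt = "\n".join(lines[node.lineno - 1:split]) + "\n"
        solution = "\n".join(lines[split:node.end_lineno]) + "\n"
        records.append({
            "task_id": f"{prefix}/{len(records)}",
            "prompt": prompt,
            "canonical_solution": solution,
            "entry_point": node.name,
        })
    return records


SAMPLE = '''
from qiskit import QuantumCircuit

def bell_pair():
    """Return a two-qubit Bell-state circuit."""
    qc = QuantumCircuit(2)
    qc.h(0)
    qc.cx(0, 1)
    return qc

def bell_pair():
    """Return a two-qubit Bell-state circuit."""
    qc = QuantumCircuit(2)
    qc.h(0)
    qc.cx(0, 1)
    return qc
'''

records = extract_records(SAMPLE)
print(len(records), records[0]["entry_point"])
```

The sample is only parsed, never executed, so Qiskit does not need to be installed to run the sketch.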
Fine-tune a language model using the generated dataset:
python orpo_training.py
Configuration (in script):
- MODEL_NAME: Base model (default: "Qwen/Qwen2.5-Coder-32B-Instruct")
- DATA_FILE: Path to CSV dataset (default: "./Formatted_ORPO_Dataset.csv")
- OUTPUT_DIR: Output directory (default: "./orpo_outputs")
- BATCH_SIZE: Training batch size (default: 2)
- EPOCHS: Number of training epochs (default: 2)
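For readers unfamiliar with ORPO, the objective combines the usual supervised loss on the preferred answer with a penalty based on the odds ratio between the preferred and rejected answers. The sketch below works on a single example in plain Python, following the formulation in the ORPO paper. The weight `lam=0.1` and the function names are illustrative assumptions; an actual training run would compute this on token log-probabilities inside a trainer library.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp), the length-normalised likelihood."""
    # log1p(-exp(x)) equals log(1 - p); needs avg_logp < 0 so that p < 1.
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float,
              lam: float = 0.1) -> float:
    """Supervised loss on the chosen answer plus a weighted odds-ratio penalty."""
    sft = -chosen_avg_logp  # negative log-likelihood of the chosen completion
    ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(ratio)) written as log(1 + exp(-ratio))
    odds_ratio_term = math.log1p(math.exp(-ratio))
    return sft + lam * odds_ratio_term


# The loss is lower when the model already prefers the chosen answer.
preferred = orpo_loss(chosen_avg_logp=-0.2, rejected_avg_logp=-1.5)
inverted = orpo_loss(chosen_avg_logp=-1.5, rejected_avg_logp=-0.2)
print(preferred < inverted)
```

When both answers are equally likely the odds-ratio term reduces to log 2, so the penalty only shrinks once the model separates the two.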
Advanced training with quantum-specific reward functions:
python grpo_qiskit_training.py
Configuration (in script):
- MODEL_NAME: Base model (set this before running)
- DATA_PATH: Path to JSON dataset (default: "./qiskit_humaneval_dataset.json")
- OUTPUT_DIR: Output directory (default: "./grpo_qiskit_outputs")
- LORA_RANK: LoRA rank (default: 16)
- BATCH_SIZE: Training batch size (default: 1)
- EPOCHS: Number of training epochs (default: 1)
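To give a feel for what `LORA_RANK` controls, the sketch below counts the trainable parameters LoRA adds to one weight matrix: two low-rank factors instead of a full update. The 4096 x 4096 layer size is a hypothetical example, not a value taken from any model used in this repository.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


# Hypothetical 4096 x 4096 projection layer with the documented default rank of 16.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, rank=16)
fraction = lora / full
print(f"{lora} trainable vs {full} frozen ({fraction:.2%})")
```

Doubling the rank doubles the adapter size, so the rank trades memory for adapter capacity.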
GRPO Reward Functions:
- Format: XML structure + AST validation
- Qiskit Import: Valid import patterns (no deprecated APIs)
- Compile & Simulate: Circuit compilation and simulation success
- Fidelity: State fidelity comparison with reference
- Resource Efficiency: Circuit depth and gate count optimization
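As a rough illustration of two ideas behind the list above, the sketch below implements a format reward (XML structure plus AST validation) and the group-relative normalisation that gives GRPO its name. The `<answer>` tag is a hypothetical choice, since this README does not specify the tag names, and both function names are illustrative. The population standard deviation is used here for simplicity; a training library may normalise differently.

```python
import ast
import re
import statistics
from typing import List

# Hypothetical tag name; the README only says outputs are XML-structured.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the completion has exactly one <answer> block of valid Python."""
    matches = ANSWER_RE.findall(completion)
    if len(matches) != 1:
        return 0.0
    try:
        ast.parse(matches[0].strip())
    except (SyntaxError, ValueError):
        return 0.0
    return 1.0


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Normalise rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]  # no learning signal when all samples tie
    return [(r - mean) / std for r in rewards]


group = [
    "<answer>\nfrom qiskit import QuantumCircuit\nqc = QuantumCircuit(1)\nqc.h(0)\n</answer>",
    "<answer>qc = QuantumCircuit(1</answer>",  # unbalanced parenthesis
    "qc.h(0)",                                 # missing tags
    "<answer>x = 1</answer>",
]
rewards = [format_reward(c) for c in group]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Because advantages are computed relative to the other samples for the same prompt, a completion is rewarded for being better than its siblings rather than for reaching an absolute score.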
Compare GRPO vs ORPO model performance:
python qiskit_benchmark_evaluation.py
Input Files:
- grpo_qiskit_generated_completions.json - GRPO model outputs
- orpo_qiskit_generated_completions.json - ORPO model outputs
Output:
- qiskit_benchmark_results.csv - Detailed performance metrics
- Console summary with key statistics
Evaluation Metrics:
- Compile Success Rate: Percentage of syntactically valid code
- Simulation Success Rate: Percentage that successfully simulate
- Fidelity: Quantum state fidelity vs. reference solutions
- Circuit Depth: Resource efficiency comparison
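The fidelity metric can be made concrete for pure states, where it is the squared magnitude of the overlap between the generated and reference statevectors. The sketch below uses plain Python lists of amplitudes; it assumes both states are normalised and pure, and `state_fidelity_pure` and `success_rate` are hypothetical helpers rather than functions from the evaluation script. Qiskit's `qiskit.quantum_info.state_fidelity` covers the general case, including mixed states.

```python
import math
from typing import Sequence


def state_fidelity_pure(psi: Sequence[complex], phi: Sequence[complex]) -> float:
    """|<psi|phi>|^2 for two normalised pure states of equal dimension."""
    if len(psi) != len(phi):
        raise ValueError("statevectors must have the same dimension")
    overlap = sum(complex(a).conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2


def success_rate(flags: Sequence[bool]) -> float:
    """Percentage of tasks for which a check (compile or simulate) passed."""
    return 100.0 * sum(flags) / len(flags) if flags else 0.0


s = 1 / math.sqrt(2)
bell = [s, 0, 0, s]      # (|00> + |11>) / sqrt(2)
zero = [1, 0, 0, 0]      # |00>
phased = [1j * a for a in bell]  # same state up to a global phase

print(state_fidelity_pure(bell, bell))
print(state_fidelity_pure(bell, zero))
print(success_rate([True, True, False, True]))
```

Fidelity ignores global phase, so two circuits that differ only by an overall phase factor still score as a perfect match.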
Generate publication-ready plots from benchmark results:
python qiskit_benchmark_plots.py
Input Files:
- qiskit_benchmark_results.csv - Summary statistics from evaluation
- qiskit_benchmark_full_results.csv - Detailed per-task results (optional)
Output:
- benchmark_plots/success_rates.png - Compile/simulation success rates
- benchmark_plots/fidelity_distribution.png - Fidelity distribution histogram
- benchmark_plots/depth_boxplot.png - Circuit depth comparison
Plot Types:
- Success Rate Bar Charts: Compile and simulation success rates
- Fidelity Histograms: Distribution of quantum state fidelity
- Depth Box Plots: Circuit depth resource efficiency comparison
- Go to GitHub Settings > Developer settings > Personal access tokens
- Generate a new token with appropriate permissions
- Use the token with the --token argument
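For context on why the token is needed, GitHub's code search endpoint only accepts authenticated requests, and responses carry rate-limit headers that a scraper should respect. The sketch below prepares such a request without sending it and computes how long to pause once the budget is used up. It uses the standard library instead of `requests` to stay self-contained; the query string and both function names are assumptions, not taken from the pipeline.

```python
import urllib.parse
import urllib.request
from typing import Mapping


def build_search_request(token: str, query: str = "qiskit language:python",
                         per_page: int = 30) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GitHub code-search request."""
    params = urllib.parse.urlencode({"q": query, "per_page": per_page})
    return urllib.request.Request(
        f"https://api.github.com/search/code?{params}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def seconds_until_reset(headers: Mapping[str, str], now: float) -> float:
    """How long to pause once the rate-limit budget is exhausted."""
    # Header names are matched exactly here; real HTTP headers are case-insensitive.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))  # epoch seconds
    return max(0.0, reset_at - now) if remaining == 0 else 0.0


request = build_search_request("YOUR_GITHUB_TOKEN")
wait = seconds_until_reset(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1030"}, now=1000.0
)
print(request.full_url, wait)
```

Passing `now` explicitly keeps the helper easy to test; a real caller would pass `time.time()` and sleep for the returned duration.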
- Generate Dataset: Run qiskit_to_orpo_pipeline.py to scrape GitHub and create training data
- Train Model: Choose between:
  - ORPO: Run orpo_training.py for standard preference optimization
  - GRPO: Run grpo_qiskit_training.py for advanced quantum-specific training
- Generate Completions: Use trained models to generate quantum code completions
- Evaluate: Run qiskit_benchmark_evaluation.py to compare model performance
- Visualize: Run qiskit_benchmark_plots.py to generate publication-ready plots
Dataset Generation:
- qiskit_repos/ - Directory containing downloaded Python files
- qiskit_humaneval_dataset.json - HumanEval format for evaluation
- Formatted_ORPO_Dataset.csv - ORPO training data
Training:
- orpo_outputs/lora_model/ - ORPO fine-tuned LoRA model
- grpo_qiskit_outputs/lora_model/ - GRPO fine-tuned LoRA model
- grpo_qiskit_merged/ - Optional merged GRPO model (if MERGE=True)
- Training logs and metrics (if using Wandb)
Evaluation:
- qiskit_benchmark_results.csv - Performance comparison metrics
- Console output with detailed statistics
Visualization:
- benchmark_plots/success_rates.png - Success rate bar charts
- benchmark_plots/fidelity_distribution.png - Fidelity distribution plots
- benchmark_plots/depth_boxplot.png - Circuit depth comparisons
Feel free to contribute to this project by submitting issues or pull requests.
This project is open source and available under the MIT License.
If you use this repository or build upon this work, please cite:
@article{kheiri2025qspark,
title={QSpark: Towards Reliable Qiskit Code Generation},
author={Kheiri, Kiana and Aamir, Aamna and Miranskyy, Andriy and Ding, Chen},
journal={arXiv preprint arXiv:2507.12642},
year={2025}
}