ChanceSiyuan (Collaborator) commented Jan 19, 2026

Summary

This PR implements comprehensive noisy circuit dataset generation with DEM extraction, UAI format conversion, and syndrome sampling capabilities. All documentation has been consolidated into a unified getting started guide, and the dataset is organized into well-structured subdirectories.

Issues Addressed

Issue #12: Noisy Circuit Dataset ✅

  • Created datasets/circuits/ with three rotated surface code memory circuits (3, 5, and 7 rounds)
  • All circuits use a d=3 surface code with circuit-level depolarizing noise at p=0.01
  • Circuits include detectors and logical observables for BP decoding

Issue #4: Detector Error Model Generation ✅

  • Implemented extract_dem() to generate DEMs from circuits
  • Added save_dem() to write .dem files in stim format
  • Created generate_dem_from_circuit() for CLI integration
  • Added --generate-dem flag to CLI
  • DEMs saved in datasets/dems/ directory
  • UAI Format Support:
    • Implemented dem_to_uai() to convert DEM to UAI format
    • Added save_uai() and generate_uai_from_circuit()
    • Added --generate-uai flag to CLI
    • UAI files saved in datasets/uais/ directory
    • UAI files enable integration with TensorInference.jl
  • Comprehensive test coverage in tests/test_dem.py (19 tests)
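The .dem files written by save_dem() are plain stim text, so their contents can be inspected without stim itself. The sketch below assumes a flattened DEM (no `repeat` blocks, no `shift_detectors`, no `^` decomposition separators; stim's `DetectorErrorModel.flattened()` produces the first two). It parses `error(p) D<k> ... L<k>` lines into records and dumps them as JSON. `ErrorMechanism`, `parse_flat_dem()` and `dem_text_to_json()` are hypothetical names for illustration, not the functions in dem.py.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ErrorMechanism:
    """One `error(p) ...` instruction from a flattened stim DEM (hypothetical type)."""
    probability: float
    detectors: list[int] = field(default_factory=list)
    observables: list[int] = field(default_factory=list)


def parse_flat_dem(text: str) -> list[ErrorMechanism]:
    """Collect `error(p) D<k> ... L<k> ...` lines, skipping other instructions.

    Assumes a flattened DEM; a `^` decomposition separator raises ValueError.
    """
    errors = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line.startswith("error("):
            continue  # e.g. detector(...) coordinate annotations
        head, _, targets = line.partition(")")
        mech = ErrorMechanism(probability=float(head[len("error("):]))
        for tok in targets.split():
            if tok.startswith("D"):
                mech.detectors.append(int(tok[1:]))
            elif tok.startswith("L"):
                mech.observables.append(int(tok[1:]))
            else:
                raise ValueError(f"unsupported DEM target {tok!r}")
        errors.append(mech)
    return errors


def dem_text_to_json(text: str) -> str:
    """Export one JSON object per error mechanism."""
    return json.dumps([asdict(e) for e in parse_flat_dem(text)], indent=2)
```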

Issue #5: Syndrome Database Generation ✅

  • Implemented sample_syndromes() to generate detection events
  • Added save_syndrome_database() to write .npz files
  • Created generate_syndrome_database_from_circuit() for CLI integration
  • Added --generate-syndromes flag to CLI
  • Syndromes saved in datasets/syndromes/ directory
  • Comprehensive test coverage in tests/test_syndrome.py
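A DEM describes independent error mechanisms, each firing with its own probability and flipping a fixed set of detectors; the detection events of one shot are the XOR of those flips. The toy sampler below illustrates that model using only the standard library. `sample_from_dem()` is a hypothetical helper, not sample_syndromes(), which samples detection events from the circuit.

```python
import random


def sample_from_dem(errors, num_detectors, shots, seed=None):
    """Toy DEM sampler (hypothetical helper, not sample_syndromes()).

    errors: list of (probability, detector_indices) pairs. Each mechanism fires
    independently; a detector's event is the parity (XOR) of the mechanisms
    that fired and flip it. Returns one list of 0/1 events per shot.
    """
    rng = random.Random(seed)
    shots_out = []
    for _ in range(shots):
        events = [0] * num_detectors
        for prob, detectors in errors:
            if rng.random() < prob:  # mechanism fires with its own probability
                for d in detectors:
                    events[d] ^= 1  # flips accumulate mod 2
        shots_out.append(events)
    return shots_out
```

When two fired mechanisms share a detector, their flips cancel there, which is why a syndrome alone does not pin down the error and a decoder such as BP is needed.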

Dataset Organization

The dataset is organized into one subdirectory per file type:

datasets/
├── circuits/     # Circuit files (.stim)
├── dems/         # Detector error models (.dem)
├── uais/         # UAI format files (.uai)
└── syndromes/    # Syndrome databases (.npz)
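With one subdirectory per output type, the paths of a circuit's derived files can be computed from the circuit path alone. A minimal sketch, assuming each derived file reuses the circuit's file stem (the stem `sc_d3_r3` in the test is a made-up example; the real file names may differ). `dataset_path()` and `derived_paths()` are hypothetical helpers, not part of the package.

```python
from pathlib import Path

# Subdirectory for each file type, mirroring the datasets/ tree above.
SUBDIRS = {".stim": "circuits", ".dem": "dems", ".uai": "uais", ".npz": "syndromes"}


def dataset_path(root, stem, suffix):
    """Return root/<subdir>/<stem><suffix> for a known dataset file type."""
    try:
        subdir = SUBDIRS[suffix]
    except KeyError:
        raise ValueError(f"unknown dataset file type: {suffix}") from None
    return Path(root) / subdir / f"{stem}{suffix}"


def derived_paths(circuit_path):
    """Map datasets/circuits/<stem>.stim to its .dem, .uai and .npz siblings.

    Assumption: derived files reuse the circuit's stem.
    """
    circuit_path = Path(circuit_path)
    root = circuit_path.parent.parent  # the datasets/ directory
    return {s: dataset_path(root, circuit_path.stem, s) for s in (".dem", ".uai", ".npz")}
```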

Documentation

New unified guide:

  • examples/GETTING_STARTED.md - Comprehensive guide covering:
    • Quick start instructions
    • Step-by-step pipeline explanation
    • UAI format introduction for beginners
    • Detailed format documentation (.stim, .dem, .uai, .npz)
    • Code examples for all use cases
    • Troubleshooting and best practices

Other documentation:

  • datasets/README.md - Dataset overview with organization structure
  • examples/minimal_example.py - Full pipeline demonstration

Removed redundant files:

  • Merged datasets/SYNDROME_DATASET.md into unified guide
  • Merged examples/PIPELINE_ILLUSTRATION.md into unified guide

Testing

✅ All 62 tests passing
✅ Circuit generation validated
✅ DEM extraction tested (.dem and .uai formats)
✅ Syndrome sampling tested
✅ CLI integration verified
✅ CI passing on Python 3.10, 3.11, 3.12

Usage Examples

# Generate circuits with DEMs and syndromes
python -m bpdecoderplus.cli -d 3 -r 3 5 7 --generate-dem --generate-syndromes 1000

# Generate circuits with UAI format for probabilistic inference
python -m bpdecoderplus.cli -d 3 -r 3 5 7 --generate-uai

# Generate all formats
python -m bpdecoderplus.cli -d 3 -r 3 5 7 --generate-dem --generate-uai --generate-syndromes 1000
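The commands above combine `-d` and `-r` (judging by the dataset, the code distance and one or more round counts) with three optional generation flags. The argparse sketch below shows one way such an interface can be parsed. It is a hypothetical reconstruction for illustration only; the real bpdecoderplus.cli may use different destination names, defaults, or long option names.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the commands above."""
    p = argparse.ArgumentParser(prog="bpdecoderplus.cli")
    p.add_argument("-d", dest="distance", type=int, required=True,
                   help="code distance (assumed meaning)")
    p.add_argument("-r", dest="rounds", type=int, nargs="+", required=True,
                   help="round counts; one circuit per value (assumed meaning)")
    p.add_argument("--generate-dem", action="store_true",
                   help="also write .dem files")
    p.add_argument("--generate-uai", action="store_true",
                   help="also write .uai files")
    # Assumption: omitting the flag (None) means no syndromes are sampled.
    p.add_argument("--generate-syndromes", type=int, metavar="SHOTS", default=None,
                   help="number of syndrome shots to sample per circuit")
    return p
```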

UAI Format

The UAI format represents the DEM as a Markov network:

  • Each detector is a binary variable (0 or 1)
  • Each error mechanism is a factor/clique
  • Factor tables encode error probabilities
  • Compatible with TensorInference.jl and other probabilistic inference tools
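To make this structure concrete, the sketch below writes a UAI `MARKOV` file: the preamble lists the binary variables (detectors) and one clique scope per error mechanism, followed by one function table per clique, with the last variable in each scope changing fastest. The table values are an assumed encoding chosen for illustration (1 - p when none of the mechanism's detectors fire, p when all of them fire, 0 otherwise) and may not match what dem_to_uai() writes; `to_uai()` and `factor_table()` are hypothetical names.

```python
import itertools


def factor_table(prob, k):
    """Illustrative table over k binary detector variables (assumed encoding).

    itertools.product yields assignments with the last variable changing
    fastest, which matches the UAI table ordering.
    """
    table = []
    for assignment in itertools.product((0, 1), repeat=k):
        if not any(assignment):
            table.append(1.0 - prob)  # mechanism did not fire
        elif all(assignment):
            table.append(prob)        # mechanism fired, flipping every detector
        else:
            table.append(0.0)         # partial flips excluded in this encoding
    return table


def to_uai(num_detectors, errors):
    """Serialize (probability, detector_indices) pairs as a UAI MARKOV network."""
    lines = ["MARKOV", str(num_detectors), " ".join(["2"] * num_detectors), str(len(errors))]
    for _, dets in errors:
        lines.append(" ".join(map(str, [len(dets), *dets])))  # clique scope
    for prob, dets in errors:
        table = factor_table(prob, len(dets))
        lines.append("")                # blank line before each function table
        lines.append(str(len(table)))   # number of table entries
        lines.append(" ".join(f"{v:g}" for v in table))
    return "\n".join(lines) + "\n"
```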

This enables:

  • Exact inference (partition function calculation)
  • Marginal probability computation
  • MAP (maximum a posteriori) inference
  • Integration with tensor network methods
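For a network with only a handful of variables, all three quantities can be computed by brute-force enumeration, which makes a useful sanity check against tensor-network results. A minimal sketch; `brute_force_inference()` is a hypothetical helper, not part of the package, and it reads factor tables in UAI order (first scope variable most significant).

```python
import itertools


def brute_force_inference(num_vars, factors):
    """Exact inference on a tiny binary Markov network by enumeration.

    factors: list of (scope, table) pairs, tables in UAI order.
    Returns (Z, marginals P(x_i = 1), MAP assignment). Cost grows as
    2**num_vars, so this only suits small sanity checks.
    """
    z = 0.0
    ones = [0.0] * num_vars
    best, best_weight = None, -1.0
    for x in itertools.product((0, 1), repeat=num_vars):
        weight = 1.0
        for scope, table in factors:
            idx = 0
            for v in scope:
                idx = 2 * idx + x[v]  # first scope variable is the most significant bit
            weight *= table[idx]
        z += weight  # partition function accumulates every configuration
        for i, xi in enumerate(x):
            if xi:
                ones[i] += weight
        if weight > best_weight:  # track the most probable configuration
            best, best_weight = x, weight
    return z, [o / z for o in ones], best
```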

Closes

Closes #12
Closes #4
Closes #5

🤖 Generated with Claude Code

ChanceSiyuan and others added 18 commits January 17, 2026 13:51
- Add Stim circuits for rotated surface code (d=3) memory experiments
  with circuit-level depolarizing noise (p=0.01) at 3, 5, 7 rounds
- Add generation script (scripts/generate_noisy_circuits.py)
- Add comprehensive README with BP decoding tutorial and examples
- Add visualization images (qubit layout, parity check matrix, syndrome stats)
- Update .gitignore to exclude .venv/
- Convert scripts/ to src/bpdecoderplus/ package following Python best practices
- Add pyproject.toml with uv/hatchling build system and dependencies
- Add comprehensive test suite (32 tests) for circuit.py and cli.py
- Update .gitignore with Python-specific patterns
- Update README to use new CLI entry point via uv

Addresses PR feedback from @GiggleLiu.
- Add Makefile with targets for install, setup, generate-dataset, test, and clean
- Update pyproject.toml with uv dev-dependencies configuration
- Addresses issue #12 requirements for automation and uv package management

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add test.yml workflow to run tests on push and PR
- Test on Python 3.10, 3.11, and 3.12
- Use uv for dependency management in CI
- Addresses PR #14 review comment for CI/CD setup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update CI workflow to generate coverage reports with pytest-cov
- Upload coverage to Codecov for tracking
- Add test status and coverage badges to README
- Add `make test-cov` target for local coverage reports
- Update .gitignore to exclude coverage files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Set ignore-nothing-to-cache to true to allow CI to proceed
when uv.lock is not present in the repository.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove enable-cache to avoid lock file requirement.
Caching can be re-enabled later with a proper lock file.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Delete all PNG files (layout, parity check matrix, syndrome stats)
- Update README to remove image references
- Keep focus on circuit files and code examples

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add syndrome.py module for sampling and saving syndromes
- Integrate syndrome generation into CLI with --generate-syndromes flag
- Add comprehensive test suite for syndrome operations
- Add make generate-syndromes target for easy database creation
- Support npz format with metadata for efficient storage

Features:
- Sample detection events from circuits
- Save/load syndrome databases with metadata
- Generate databases directly from circuit files
- CLI integration for automated workflow
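A sketch of the kind of .npz layout this commit describes, with detection events and metadata stored in one archive. The key names (`syndromes`, `metadata`), the JSON-encoded metadata, and the cast from stim's boolean output to uint8 are assumptions for illustration; `save_syndromes_npz()` and `load_syndromes_npz()` are hypothetical and are not the save_syndrome_database() API.

```python
import io
import json

import numpy as np


def save_syndromes_npz(fileobj, syndromes, metadata):
    """Store detection events (shots x detectors) plus JSON metadata in one .npz."""
    np.savez_compressed(
        fileobj,
        syndromes=np.asarray(syndromes, dtype=np.uint8),  # bool -> uint8 (assumed choice)
        metadata=np.array(json.dumps(metadata)),  # 0-d unicode array, no pickling needed
    )


def load_syndromes_npz(fileobj):
    """Inverse of save_syndromes_npz(); returns (syndromes, metadata dict)."""
    with np.load(fileobj) as data:
        syndromes = data["syndromes"]
        metadata = json.loads(data["metadata"].item())
    return syndromes, metadata


# Round trip through an in-memory buffer; a file path works the same way.
buf = io.BytesIO()
save_syndromes_npz(buf, [[True, False, True], [False, False, False]],
                   {"distance": 3, "rounds": 3, "p": 0.01})
buf.seek(0)
syndromes, metadata = load_syndromes_npz(buf)
```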

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add dem.py module for DEM extraction and manipulation
- Extract DEM from circuits with decomposition support
- Save/load DEMs in stim native format
- Convert DEM to JSON for analysis
- Build parity check matrix H for BP decoding
- Integrate DEM generation into CLI with --generate-dem flag
- Add comprehensive test suite for DEM operations
- Add make generate-dem target

Features:
- Extract detector error models from circuits
- Save in .dem format (stim native)
- Export to JSON with structured error information
- Build parity check matrix for BP decoder
- CLI integration for automated workflow
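A parity check matrix for BP has one row per detector and one column per error mechanism, with a 1 wherever that mechanism flips that detector; an analogous matrix records observable flips, and the mechanism probabilities serve as BP priors. A minimal numpy sketch, assuming the DEM has already been reduced to `(probability, detectors, observables)` tuples; `check_matrices()` is a hypothetical name and need not match dem.py.

```python
import numpy as np


def check_matrices(errors, num_detectors, num_observables):
    """Build H (detectors x errors), L (observables x errors) and prior vector.

    errors: list of (probability, detector_indices, observable_indices).
    """
    n = len(errors)
    H = np.zeros((num_detectors, n), dtype=np.uint8)
    L = np.zeros((num_observables, n), dtype=np.uint8)
    priors = np.zeros(n)
    for j, (prob, dets, obs) in enumerate(errors):
        for d in dets:
            H[d, j] = 1  # mechanism j flips detector d
        for o in obs:
            L[o, j] = 1  # mechanism j flips logical observable o
        priors[j] = prob
    return H, L, priors
```

A candidate error vector e explains a syndrome s exactly when H e = s (mod 2), which is the constraint BP works against; L e (mod 2) then gives the predicted logical flip.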

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Stim returns boolean arrays by default, not uint8.
Update test to accept both bool and uint8 dtypes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add SYNDROME_DATASET.md with complete API documentation
- Add validate_dataset.py for dataset generation and validation
- Document data format, API interface, and validation checks
- Include usage examples and statistics
- Provide evidence of dataset validity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add minimal_example.py with complete end-to-end demonstration
- Add PIPELINE_ILLUSTRATION.md with visual pipeline diagrams
- Include detailed explanations of each step
- Show data flow and file formats
- Provide conceptual understanding of the pipeline

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rewrote PIPELINE_ILLUSTRATION.md as a practical getting-started guide focused on data generation workflow. Added generate_demo_dataset.py to provide a working example that generates, validates, and saves a small syndrome dataset. These changes make it easier for new users to understand and use the package.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit reorganizes the dataset structure and ensures proper file
placement for circuits, DEMs, and syndromes.

Changes:
- Reorganize datasets/ into circuits/, dems/, and syndromes/ subdirectories
- Update CLI default output to datasets/circuits/
- Update DEM generation to save files in datasets/dems/
- Update syndrome generation to save files in datasets/syndromes/
- Fix test to reflect new default output path
- Add demo DEM and syndrome files for all three circuit variants

Resolves #4: Detector error model generation now saves .dem files
Resolves #5: Syndrome database generation now saves .npz files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit adds support for generating UAI (Uncertainty in Artificial
Intelligence) format files from detector error models, enabling
probabilistic inference with tools like TensorInference.jl.

Changes:
- Add dem_to_uai() to convert DEM to UAI format
- Add save_uai() to save UAI files
- Add generate_uai_from_circuit() for CLI integration
- Add --generate-uai flag to CLI
- Generate UAI files for all demo circuits
- Add comprehensive test coverage for UAI functionality

The UAI format represents the DEM as a Markov network where:
- Each detector is a binary variable
- Each error mechanism is a factor/clique
- Factor tables encode error probabilities

Addresses #4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit merges SYNDROME_DATASET.md and PIPELINE_ILLUSTRATION.md into
a single comprehensive GETTING_STARTED.md guide in the examples folder.

Changes:
- Create examples/GETTING_STARTED.md with unified content
- Add UAI format introduction for beginners
- Update all file paths to reflect new dataset organization
- Remove redundant datasets/SYNDROME_DATASET.md
- Remove redundant examples/PIPELINE_ILLUSTRATION.md

The new guide provides:
- Quick start instructions
- Step-by-step pipeline explanation
- Detailed format documentation (.stim, .dem, .uai, .npz)
- Code examples for all use cases
- Troubleshooting and best practices

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit reorganizes the dataset structure to keep UAI files
separate from DEM files for better organization.

Changes:
- Move .uai files from datasets/dems/ to datasets/uais/
- Update generate_uai_from_circuit() to save in datasets/uais/
- Update documentation to reflect new folder structure
- Update datasets/README.md with dataset organization section
- Update examples/GETTING_STARTED.md with correct paths

Dataset structure:
- datasets/circuits/ - Circuit files (.stim)
- datasets/dems/ - Detector error models (.dem)
- datasets/uais/ - UAI format files (.uai)
- datasets/syndromes/ - Syndrome databases (.npz)

All tests passing (62/62)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
GiggleLiu (Member) commented Jan 19, 2026

Looks great! Please resolve the conflicts in this PR.

GiggleLiu (Member) left a comment

Good try, impressive speed.

Review comment (Member):

Instead of using Markdown files, please set up proper documentation and deploy it to GitHub Pages.

ChanceSiyuan and others added 6 commits January 20, 2026 13:57
- Move generate_demo_dataset.py to examples/
- Move validate_dataset.py to examples/
- Update GETTING_STARTED.md with clarifications

This keeps the root directory clean and groups all example/demo
code in a dedicated folder for better project organization.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Resolved conflicts and integrated changes from main:
- Updated codecov configuration with token and files parameter
- Added torch dependency to pyproject.toml
- Updated default output path to datasets/noisy_circuits
- Preserved DEM and syndrome generation features from this branch
- Removed .claude/settings.local.json from version control
- Added .claude/settings.local.json to .gitignore
- Added WIP note to README

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Move script files from examples/ to scripts/:
- generate_demo_dataset.py
- validate_dataset.py

This separates demonstration scripts from API usage examples.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add mkdocs.yml configuration with Material theme
- Create docs/index.md as main documentation page
- Move GETTING_STARTED.md to docs/getting_started.md
- Add GitHub Actions workflow for automatic deployment
- Add docs dependencies to pyproject.toml
- Add Makefile targets for building and serving docs

Documentation will be available at GitHub Pages after merge to main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
GiggleLiu (Member):

All review comments have been addressed:

  1. Claude files removed - Removed .claude/settings.local.json from version control and added to .gitignore (commit 1458f69)

  2. Scripts organized - Moved generate_demo_dataset.py and validate_dataset.py to dedicated scripts/ directory (commit d1dc69f)

  3. Documentation with GitHub Pages - Set up MkDocs with Material theme and automated deployment workflow (commit 8aa716d)

    • Documentation will be deployed to GitHub Pages automatically when merged to main
    • Local preview available with make docs-serve
    • Proper navigation structure with getting started, usage guide, API reference, and mathematical description
  4. Conflicts resolved - Successfully merged with main branch, all conflicts resolved (commit 1458f69)

Ready for final review!

GiggleLiu merged commit ad2d771 into main on Jan 20, 2026
6 checks passed