Problem Summary
The current BP decoder implementation shows only a ~2% relative improvement in logical error rate over a simple syndrome-parity baseline decoder, far below expected performance based on published research. After analyzing the code and comparing against recent literature, I've identified fundamental architectural issues that prevent the decoder from working properly.
Expected vs. Actual Performance
What We Should See (from Literature)
Based on recent 2024 research on BP decoders for surface codes:
- Error thresholds: 14-17% for modern BP variants (Improved BP Decoding, arXiv:2407.11523)
- Planar surface code threshold: ~10% for standard BP (Surface Code Error Correction)
- Circuit-level noise threshold: ~1% for MWPM, higher for BP (PyMatching decoder)
- At p=0.03 (3% physical error rate): Should achieve logical error rate well below 3% for d=3+ codes
What We Actually See
At p=0.03 (3% physical error):
| Decoder | LER | Precision | Recall | F1 |
|---|---|---|---|---|
| Syndrome-parity baseline | 50.6% | 35.6% | 50.8% | 0.418 |
| BP Decoder | 49.6% | 36.1% | 50.3% | 0.421 |
| Relative improvement | 2.0% | - | - | - |
This is broken! At 3% physical error, we're getting ~50% logical error rate, which means the decoder is barely better than random guessing.
Root Cause Analysis
I've identified three fundamental architectural issues:
Issue 1: Wrong Factor Graph Structure
Current UAI Model (from dem_to_uai()):
# Lines 174-182 in dem.py
n_detectors = dem.num_detectors   # 24 for d=3
lines.append(str(n_detectors))    # Variables = DETECTORS
lines.append(str(len(errors)))    # Factors = ERROR MECHANISMS

This creates:
- Variables = Detectors (binary: fired=1, not fired=0)
- Factors = Error mechanisms (one factor per DEM error)
The Problem:
- This models P(detectors | errors), the forward model
- For decoding, we need P(errors | syndrome), the inverse model
- Setting detectors as "evidence" makes no sense—we already KNOW the detector states from the syndrome!
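To make the direction of inference concrete, here is a toy numpy sketch (the 2×3 matrix is made up for illustration; it is not the d=3 DEM):

```python
import numpy as np

# Toy check matrix: rows = detectors, columns = error mechanisms.
H = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=np.uint8)

e = np.array([0, 1, 0], dtype=np.uint8)  # unknown error configuration (what we must infer)
s = H @ e % 2                            # observed syndrome (what we already know)
print(s)                                 # [1 1]

# The forward model P(detectors | errors) is deterministic once e is fixed.
# Decoding runs the other way: given s, which e most plausibly produced it, i.e. P(e | s)?
```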
Issue 2: Inference on the Wrong Quantity
What the code does:
# examples/tanner_graph_walkthrough.py, line 438
evidence = {det_idx + 1: int(syndrome[det_idx]) for det_idx in range(len(syndrome))}
bp_ev = apply_evidence(bp, evidence)
state, info = belief_propagate(bp_ev, ...)
marginals = compute_marginals(state, bp_ev)  # Marginals over DETECTORS!

The Problem:
- After setting evidence, BP computes marginals over detector variables
- But we already know the detector values (from the syndrome)!
- What we don't know is which errors occurred
- The marginals over detectors are useless for decoding
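Put concretely: after clamping, each detector variable's marginal is a point mass at its observed value, so it carries no information beyond the syndrome we started with:

P(d_i = s_i | s) = 1   for every detector i
P(e_j = 1 | s)   = ?   ← the quantity decoding actually needs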
Issue 3: Decoding Logic Outside BP
Current approach:
# Lines 457-499 in examples/tanner_graph_walkthrough.py
# Build error likelihood scores based on syndrome
for det_idx in syndrome_active:
    errors_for_detector = np.where(H[det_idx, :] == 1)[0]
    error_likelihoods[errors_for_detector] *= 2.0  # Ad-hoc boost

# Apply parity heuristic
if syndrome_weight % 2 == 1:
    flip_likelihood_avg *= 3.0  # Ad-hoc multiplier

The Problem:
- The "decoding" happens outside BP using ad-hoc heuristics
- This completely bypasses the power of belief propagation!
- The 2% improvement comes from these heuristics, not from BP
- It's essentially a fancy wrapper around syndrome-parity decoding
What a Proper BP Decoder Should Do
A correct BP decoder for surface codes should:
1. Correct Factor Graph
- Variables = Error mechanisms (binary: occurred=1, didn't occur=0)
- Factors = Parity check constraints (one per detector)
- Evidence = Observed syndrome s
2. Proper Inference
- Enforce the syndrome (parity-check) constraints: s_i = ⊕_{j: H[i,j]=1} e_j
- Run BP to compute marginal probabilities over errors
- Decode by finding most likely error configuration
3. Maximum Likelihood Decoding
error_config = argmax_e P(e | s)
             = argmax_e P(s | e) · P(e)                                  [Bayes' rule]
             = argmax_e [∏_i δ(s_i = ⊕_j H[i,j] e_j)] · [∏_j p_j^{e_j} (1-p_j)^{1-e_j}]
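Exact maximization over all error configurations is intractable in general; the standard BP approach approximates it with bitwise decisions on the per-error marginals, optionally followed by OSD-style post-processing when the result is inconsistent. A minimal sketch of that final step (names such as error_marginals are placeholders, not existing APIs in this repo):

```python
import numpy as np

def hard_decision_decode(error_marginals, H, syndrome):
    """Threshold the BP posteriors P(e_j = 1 | s) at 0.5 and check the result
    against the observed syndrome; an inconsistent result would normally be
    handed to a fallback such as ordered statistics decoding (OSD)."""
    e_hat = (np.asarray(error_marginals) > 0.5).astype(np.uint8)
    residual = (H @ e_hat + syndrome) % 2
    return e_hat, not residual.any()
```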
Proposed Solutions
Option 1: Build Correct Factor Graph
Rewrite dem_to_uai() to create:
- Variables: Error mechanisms (286 binary variables for d=3, r=3)
- Factors: Parity constraints (24 factors, one per detector)
- Each factor enforces:
detector_i = XOR(errors affecting detector i)
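A minimal sketch of that factor construction, assuming we keep one dense table per factor (the variable/factor layout and function name are illustrative, not the current dem_to_uai() interface):

```python
import itertools
import numpy as np

def build_factor_graph(H, priors, syndrome):
    """Illustrative sketch: variables are the error mechanisms e_j. Each error
    gets a prior factor p_j^{e_j} (1-p_j)^{1-e_j}; each detector i gets a hard
    0/1 factor over the errors in row i of H that is nonzero exactly when their
    XOR equals the observed syndrome bit s_i."""
    factors = []
    n_det, n_err = H.shape

    # Per-error prior factors over the single variable j.
    for j in range(n_err):
        factors.append(((j,), np.array([1.0 - priors[j], priors[j]])))

    # Parity-check factors, one per detector.
    for i in range(n_det):
        scope = tuple(np.flatnonzero(H[i]))
        table = np.zeros((2,) * len(scope))
        for assignment in itertools.product((0, 1), repeat=len(scope)):
            if sum(assignment) % 2 == syndrome[i]:
                table[assignment] = 1.0
        factors.append((scope, table))

    return factors  # hand these to the BP engine; marginals come out over errors
```

In a real implementation the XOR factor's BP update would be computed in closed form (the tanh/min-sum rule familiar from classical LDPC decoding) rather than by enumerating dense tables, since circuit-level detectors can touch many error mechanisms; the dense version just makes the structure explicit.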
Option 2: Use Existing Tools
Consider using proven implementations:
- PyMatching (GitHub) - MWPM decoder, ~1% threshold
- LDPC (bp_osd) - BP+OSD implementation
- Stim built-in decoders for benchmarking
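For the benchmarking baseline, something along these lines should work with stim + PyMatching (parameters mirror the p=0.03, d=3, r=3 setting above; check the installed versions' docs for exact signatures):

```python
import stim
import pymatching

# d=3, 3 rounds, p=0.03 circuit-level depolarizing noise.
circuit = stim.Circuit.generated(
    "surface_code:rotated_memory_z",
    distance=3,
    rounds=3,
    after_clifford_depolarization=0.03,
)
dem = circuit.detector_error_model(decompose_errors=True)
matching = pymatching.Matching.from_detector_error_model(dem)

sampler = circuit.compile_detector_sampler()
dets, obs = sampler.sample(shots=100_000, separate_observables=True)
preds = matching.decode_batch(dets)
ler = (preds != obs).any(axis=1).mean()
print(f"MWPM logical error rate: {ler:.4f}")
```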
Option 3: Implement Modern BP Variants
From the 2024 literature:
- EWAInit-BP: Adaptive prior probability updates
- MBP: Momentum-based belief propagation
- BlockBP: Outperforms MWPM by >10x in some regimes
Impact
This affects:
- ❌ Tanner Graph Walkthrough tutorial (PR #37: Add Tanner Graph Walkthrough Tutorial with Logical Error Rate Analysis) - shows a misleading "2% improvement"
- ❌ Any users trying to use the pytorch_bp module for actual decoding
- ❌ Credibility of the package - it claims "BP decoding" but doesn't actually work
References
- Improved Belief Propagation Decoding on Surface Codes (2024)
- Quantum Error Correction Below the Surface Code Threshold
- Error Correction Zoo: Surface Code
- PyMatching: MWPM Decoder
- BP-OSD Decoder
Next Steps
I propose:
- Document the problem (this issue)
- Add warning to current tutorial about performance limitations
- Design correct factor graph structure
- Implement proper BP decoder
- Benchmark against PyMatching to validate
Should I proceed with designing the corrected architecture?
Labels: bug, enhancement, research, decoder
Priority: High - affects core functionality