diff --git a/note/bp.typ b/note/bp.typ
new file mode 100644
index 0000000..7ee19b1
--- /dev/null
+++ b/note/bp.typ
@@ -0,0 +1,695 @@
+#set page(
+  paper: "a4",
+  margin: (x: 2cm, y: 2cm),
+)
+
+#set text(
+  font: ("Source Han Serif SC", "Linux Libertine"),
+  size: 11pt,
+  lang: "en"
+)
+
+#set par(
+  justify: true,
+  leading: 0.8em,
+  first-line-indent: 2em
+)
+
+#show heading: it => [
+  #v(0.5em)
+  #text(weight: "bold", size: 1.15em, it)
+  #v(0.5em)
+]
+
+// Helper macros for common quantum mechanics notation
+#let ket(x) = $| #x >$
+#let bra(x) = $< #x |$
+#let braket(x, y) = $< #x | #y >$
+
+#import "@preview/qec-thrust:0.1.1": *
+
+#let note(title, body) = {
+  block(
+    fill: luma(90%),
+    stroke: (left: 4pt + orange),
+    inset: 12pt,
+    radius: (right: 4pt),
+    width: 100%,
+    [
+      #text(weight: "bold", fill: orange.darken(20%), size: 1.1em)[#title]
+      #v(0.5em)
+      #body
+    ]
+  )
+}
+
+#align(center)[
+  #text(size: 2em, weight: "bold")[
+    Quantum Error Correction and BP Decoding Notes
+  ]
+
+  #v(0.8em)
+
+  #text(size: 1.1em)[
+    Shen Yang
+  ]
+
+  #text(size: 1em, fill: gray.darken(30%))[
+    School of Electronic Science and Engineering, Southeast University
+  ]
+]
+
+#v(2em)
+#outline(
+  title: [Contents],  // ToC title
+  depth: 3,           // heading depth (1-6)
+  indent: 1.5em       // child indent
+)
+
+#pagebreak()
+
+
+= Part I: Fundamentals of Quantum Coding
+
+== 1. Why do we need quantum coding?
+
+The power of quantum computers comes from superposition and entanglement, but those same features make them extremely fragile. Unlike #text(fill: red)[classical computers], quantum systems face two core challenges:
+
++ *Decoherence and environmental noise*: quantum states interact uncontrollably with the environment, causing phase flips (Z errors) or bit flips (X errors).
++ *No-cloning theorem*: we cannot copy an unknown quantum state, so the classical repetition-code trick ($0 arrow 000$ plus majority voting) is unavailable.
+
+Therefore, the core idea of quantum error correction (QEC) is: *hide quantum information inside an entangled subspace of a many-body system*. We do not copy states; instead, we spread the information of a single logical qubit across many physical qubits via *nonlocal correlations*.
+
+== 2. Encoding principle: stabilizer formalism
+
+Before introducing specific codes (e.g., LDPC and surface codes), we must understand *stabilizers*, which are the basis for belief-propagation (BP) decoding.
+
+For $n$ physical qubits, the state space is a $2^n$-dimensional Hilbert space. We define a logical subspace (code space) $cal(C)$ such that all logical states $| psi_L > in cal(C)$ satisfy a set of eigenvalue equations:
+
+$
+  S_i | psi_L > = +1 | psi_L >, quad forall i
+$
+
+These operators $S_i$ are called *stabilizers*.
+- *No error*: measuring $S_i$ always yields $+1$ (or binary 0).
+- *With error*: if an error $E$ occurs and $E$ anticommutes with $S_i$ ($E S_i = - S_i E$), measuring $S_i$ yields $-1$ (binary 1).
+
+By detecting which stabilizers "fire" (value $-1$), we can infer the error without directly measuring the encoded state, and thus without collapsing its superposition.
+
+== 3. Quantum low-density parity-check codes (QLDPC)
+
+Quantum LDPC codes generalize classical LDPC codes to the quantum domain. Their sparsity makes graph-based decoding (like BP) possible.
+
+=== 3.1 CSS construction (Calderbank-Shor-Steane)
+To handle both quantum error types (X and Z), the most common construction is the CSS code. It separates correction into two independent parts.
+We need two classical parity-check matrices $H_X$ and $H_Z$.
+
+- $H_X$: detects Z errors (rows represent X-type stabilizers).
+- $H_Z$: detects X errors (rows represent Z-type stabilizers).
+
+*Constraint*: for all stabilizers to #text(fill: red)[commute with one another], these matrices must satisfy the orthogonality condition:
+$
+  H_X dot H_Z^T = 0 mod 2
+$
+
+=== 3.2 Factor-graph representation (Tanner graph)
+LDPC codes can be represented by factor graphs. For BP, there are two types of nodes:
+- *Variable nodes*: physical data qubits.
+- *Check nodes*: stabilizers (X checks or Z checks).
+
+If $H_X$ and $H_Z$ are sparse (each row and column has $O(1)$ ones) and satisfy
+$
+  H_X dot H_Z^T = 0 mod 2
+$
+then the stabilizer code is called a quantum LDPC (QLDPC) code.
+
+Below is a typical Tanner graph example. You can see the "locality": each check node connects to only a few variable nodes.
+
+#figure(
+  caption: [Tanner graph (factor graph) representation of a quantum error-correcting code],
+  kind: "image",
+  supplement: "Figure",
+  block(
+    height: 180pt,
+    width: 100%,
+    stroke: 0.5pt + gray.lighten(60%),
+    radius: 5pt,
+    inset: 10pt,
+    align(center + horizon)[
+      #box(width: 300pt, height: 140pt)[
+        // Coordinate parameters
+        #let vy = 20pt   // variable node y
+        #let cy = 120pt  // check node y
+
+        // Variable node x
+        #let v1x = 50pt
+        #let v2x = 110pt
+        #let v3x = 170pt
+        #let v4x = 230pt
+        #let v5x = 290pt
+
+        // Check node x
+        #let c1x = 80pt
+        #let c2x = 170pt
+        #let c3x = 260pt
+
+        // 1. Edges (the ones of the H matrix)
+        // Drawn first so they sit under the nodes
+
+        // Check 1 connects v1, v2, v3
+        #place(line(start: (c1x, cy), end: (v1x, vy), stroke: 1.5pt + gray))
+        #place(line(start: (c1x, cy), end: (v2x, vy), stroke: 1.5pt + gray))
+        #place(line(start: (c1x, cy), end: (v3x, vy), stroke: 1.5pt + gray))
+
+        // Check 2 connects v2, v4
+        #place(line(start: (c2x, cy), end: (v2x, vy), stroke: 1.5pt + gray))
+        #place(line(start: (c2x, cy), end: (v4x, vy), stroke: 1.5pt + gray))
+
+        // Check 3 connects v3, v4, v5
+        #place(line(start: (c3x, cy), end: (v3x, vy), stroke: 1.5pt + gray))
+        #place(line(start: (c3x, cy), end: (v4x, vy), stroke: 1.5pt + gray))
+        #place(line(start: (c3x, cy), end: (v5x, vy), stroke: 1.5pt + gray))
+
+        // 2. Variable nodes (physical qubits)
+        #let draw_vnode(x, label) = {
+          place(dx: x - 10pt, dy: vy - 10pt, circle(radius: 10pt, fill: white, stroke: 1.5pt + black))
+          place(dx: x - 5pt, dy: vy - 25pt, text(size: 10pt)[#label])
+        }
+
+        #draw_vnode(v1x, $d_1$)
+        #draw_vnode(v2x, $d_2$)
+        #draw_vnode(v3x, $d_3$)
+        #draw_vnode(v4x, $d_4$)
+        #draw_vnode(v5x, $d_5$)
+
+        // 3. Check nodes (stabilizers)
+        #let draw_cnode(x, label) = {
+          place(dx: x - 10pt, dy: cy - 10pt, rect(width: 20pt, height: 20pt, fill: luma(230), stroke: 1.5pt + black))
+          place(dx: x - 5pt, dy: cy + 15pt, text(size: 10pt, weight: "bold")[#label])
+        }
+
+        #draw_cnode(c1x, $S_1$)
+        #draw_cnode(c2x, $S_2$)
+        #draw_cnode(c3x, $S_3$)
+
+        // 4. Labels
+        #place(dx: 0pt, dy: vy - 5pt, text(size: 8pt, fill: gray, style: "italic")[Data qubits])
+        #place(dx: 0pt, dy: cy + 5pt, text(size: 8pt, fill: gray, style: "italic")[Check operators])
+      ]
+    ]
+  )
+)
+
+=== Diagram notes
+- *Circle nodes ($d_i$)*: physical data qubits.
+- *Square nodes ($S_j$)*: stabilizer check operators.
+- *Edges*: if the $j$-th row, $i$-th column entry of $H$ is 1 (stabilizer $S_j$ involves qubit $d_i$), an edge is drawn.
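+
+To make the picture concrete, here is a minimal Python sketch (plain NumPy; the single-qubit error below is a hypothetical choice for illustration) that writes the pictured Tanner graph as a parity-check matrix and computes the syndrome that error would produce:
+
+```python
+import numpy as np
+
+# Parity-check matrix of the Tanner graph in the figure:
+# row j lists the data qubits touched by check S_j.
+H = np.array([
+    [1, 1, 1, 0, 0],  # S1: d1, d2, d3
+    [0, 1, 0, 1, 0],  # S2: d2, d4
+    [0, 0, 1, 1, 1],  # S3: d3, d4, d5
+])
+
+e = np.array([0, 0, 0, 1, 0])  # hypothetical error: d4 is flipped
+
+s = H @ e % 2                  # syndrome: parity of the errors seen by each check
+print(s)                       # [0 1 1] -> S2 and S3 fire; d4 is their shared qubit
+```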
+
+In the upcoming BP algorithm, information (probabilities/confidence) is passed along these *black edges* between variable and check nodes until convergence or the maximum iteration count.
+
+== 4. Surface code
+
+The surface code is a special QLDPC code defined on a 2D lattice. It is one of the most promising candidates for fault-tolerant quantum computing because its check operators are *local* (they involve only nearby qubits).
+
+=== 4.1 Geometry
+We define physical qubits and stabilizers on a 2D grid.
+- *Data qubits*: on edges (or vertices, depending on convention; here we adopt a Kitaev toric-code style for intuition).
+- *Z stabilizers (plaquettes)*: on faces; measure products of Z operators.
+- *X stabilizers (vertices/stars)*: on vertices; measure products of X operators.
+
+=== 4.2 Diagram explanation
+
+Below is a surface-code diagram with code distance $d = 3$.
+
+- *Black dots*: data qubits. There are 9 black dots, corresponding to 9 physical qubits. They store the quantum state and can suffer physical errors. They are not independent; together they encode one logical qubit.
+
+- *Colored squares*: stabilizer operator regions. Each square corresponds to a stabilizer whose support is the four corner data qubits. Different colors distinguish $X$ and $Z$ stabilizers, interleaved to ensure mutual commutation and simultaneous measurability.
+
+- *Half-circles on boundaries*: boundary stabilizers. With open boundaries, boundary stabilizers act on fewer than four data qubits. The half-circle indicates this truncated structure.
+
+In this structure, logical operators correspond to nontrivial operator chains across the lattice. For $d = 3$, any shortest chain from one boundary to the opposite boundary contains at least 3 data qubits, which is exactly what distance $d = 3$ means.
+
+#figure(
+  caption: [Surface code diagram with code distance $d = 3$],
+  kind: "image",
+  supplement: "Figure",
+)[
+  #canvas({
+    import draw: *
+
+    let d = 3
+
+    surface-code((0, 0), size: 1.5, d, d, name: "sc-d3")
+
+    for i in range(d) {
+      for j in range(d) {
+        content(
+          (rel: (0.3, 0.3), to: "sc-d3" + "-" + str(i) + "-" + str(j)),
+          [#(i * d + j + 1)],
+        )
+      }
+    }
+  })
+]
+
+=== 4.3 Detailed construction and stabilizer definitions
+
+Based on the diagram, we can now write the surface code down precisely. Its power comes from *topology*: the encoded information depends on the global topology of the lattice, not on any individual qubit.
+
+==== 4.3.1 Physical layer: parity checks
+In the diagram, we saw two types of stabilizers responsible for different error types.
+
+- *Z stabilizer (plaquette operator)*
+  Corresponds to the red square $Z_p$. It acts on the four data qubits around the face ($d_1, d_2, d_3, d_4$):
+  $
+    S_p^Z = Z_(d_1) Z_(d_2) Z_(d_3) Z_(d_4)
+  $
+  *Function*: detects *bit-flip errors (X errors)*.
+  If $d_1$ suffers an $X$ error, then since $X Z = - Z X$ (they anticommute), measuring $S_p^Z$ yields eigenvalue $-1$. We say the plaquette is "excited" (a defect is detected).
+
+- *X stabilizer (vertex operator)*
+  Corresponds to the blue dot $X_v$. It acts on the four data qubits touching the vertex.
+  For example, the top-right vertex $X_(v_2)$ connects $d_1, d_4, d_5$ and an unshown qubit above. In general:
+  $
+    S_v^X = product_(i in "star"(v)) X_i
+  $
+  *Function*: detects *phase-flip errors (Z errors)*.
+  If a neighboring qubit suffers a $Z$ error, the vertex measurement becomes $-1$.
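+
+To make the anticommutation mechanism tangible, here is a small NumPy check (independent of the figure code above; it builds the operators as explicit $16 times 16$ matrices, which is only feasible at toy sizes):
+
+```python
+import numpy as np
+from functools import reduce
+
+I = np.eye(2)
+X = np.array([[0.0, 1.0], [1.0, 0.0]])
+Z = np.array([[1.0, 0.0], [0.0, -1.0]])
+
+def kron(*ops):
+    """Tensor product of single-qubit operators."""
+    return reduce(np.kron, ops)
+
+S_pZ = kron(Z, Z, Z, Z)  # plaquette stabilizer on d1..d4
+E = kron(X, I, I, I)     # X error on d1 only
+
+# They anticommute, so a +1 eigenstate of S_pZ is mapped to a -1 eigenstate.
+print(np.allclose(S_pZ @ E, -(E @ S_pZ)))  # True
+```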
+
+==== 4.3.2 Logical layer: logical qubits
+
+If the physical qubits are all busy serving checks, where is the "real" information stored?
+Answer: in *operator chains that span the entire lattice*.
+
+For a $d times d$ lattice (code distance $d$):
+
+- *Logical $overline(Z)_L$ operator*:
+  a chain of $Z$ operators from the *top boundary* to the *bottom boundary*:
+  $
+    overline(Z)_L = product_(i in "vertical path") Z_i
+  $
+  This chain crosses each X stabilizer either 0 or 2 times, so it commutes with all stabilizers.
+
+- *Logical $overline(X)_L$ operator*:
+  a chain of $X$ operators from the *left boundary* to the *right boundary*:
+  $
+    overline(X)_L = product_(i in "horizontal path") X_i
+  $
+  This chain commutes with all Z stabilizers (plaquettes).
+
+#note("Key point")[
+$overline(X)_L$ and $overline(Z)_L$ must
+*intersect at an odd number (typically 1) of physical qubits*.
+Because $X$ and $Z$ anticommute at that location, the logical operators satisfy:
+
+$
+  overline(X)_L dot overline(Z)_L
+  = - overline(Z)_L dot overline(X)_L
+$
+
+This is exactly the algebra required of a logical qubit.]
+
+==== 4.3.3 Code distance $d$
+*Definition*: the minimum number of physical-qubit operations required to transform one logical state (e.g. $|overline(0) >$) into an orthogonal logical state (e.g. $|overline(1) >$).
+Equivalently, it is the minimum weight of a nontrivial logical operator.
+
+#note("Why does d determine error-correction ability?")[
+We have a golden formula:
+$ d = 2t + 1 $
+where $t$ is the number of errors we can guarantee to correct.
+
+- *Intuition*:
+  Imagine $|overline(0) >$ and $|overline(1) >$ as two points separated by distance $d$.
+  - If $t$ errors occur, the state is pushed $t$ steps away from $|overline(0) >$.
+  - As long as $t < d/2$, we are still closer to $|overline(0) >$ than to $|overline(1) >$, and a nearest-neighbor decoder pulls us back to $|overline(0) >$.
+  - Once the number of errors reaches $d/2$ or more (e.g. $(d+1)/2$), we cross the midpoint and are closer to $|overline(1) >$. The decoder then "corrects" us toward $|overline(1) >$, causing a logical error.]
+
+#note("Geometric meaning in surface codes")[
+On the surface-code lattice, $d$ is the *linear size* of the grid.
+- To cause a logical flip, an error chain must *span* the entire lattice (left-right or top-bottom).
+- Therefore $d$ is the shortest path length from one boundary to the opposite one.
+- Increasing the grid size increases $d$, suppressing the logical error rate exponentially: $P_L tilde (p_"phys" / p_"th")^(d/2)$.]
+
+=== 4.4 Error detection
+
+Assume an *$X$ error* occurs at *$d_1$*:
+
+1. *Physical layer*: the state of $d_1$ flips.
+2. *Check layer*:
+   - The *red plaquette ($Z_p$)* containing $d_1$ anticommutes with the error; its measurement flips to $-1$.
+   - The *adjacent plaquette above* (not shown) also becomes $-1$.
+   - All other plaquettes and vertices are unaffected (X errors commute with X stabilizers).
+3. *Syndrome*: we see two spatially adjacent "excited plaquettes".
+4. *Decoder task*: the decoder (e.g., BP) infers that the most likely explanation is an error on the shared qubit ($d_1$) connecting the two excited plaquettes.
+
+This is exactly how BP works: infer errors (causes) from syndromes (results).
+
+= Part II: Belief Propagation (BP) decoding
+
+Belief propagation (BP), also called the sum-product algorithm, is an *iterative message-passing* algorithm on factor graphs.
+
+In QEC, our goal is: given the observed syndrome vector $s$ (which stabilizers fired), compute the marginal probability $P(e_i | s)$ that each physical qubit $i$ carries an error $e_i$. Direct marginalization over all error patterns is exponentially expensive (the associated decoding problem is NP-hard).
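+
+For intuition, here is a deliberately naive Python sketch (the 3-qubit matrix, syndrome, and error rate are toy values of my choosing) that computes the exact marginals by enumerating every error pattern; the $2^n$ loop is precisely what does not scale:
+
+```python
+import itertools
+import numpy as np
+
+H = np.array([[1, 1, 0], [0, 1, 1]])  # toy code: 2 checks, 3 qubits
+s = np.array([1, 0])                  # observed syndrome
+p = 0.1                               # physical error rate (prior)
+
+num = np.zeros(3)  # unnormalized P(e_i = 1, s)
+den = 0.0          # unnormalized P(s)
+for bits in itertools.product([0, 1], repeat=3):  # all 2^n error patterns!
+    e = np.array(bits)
+    if not np.array_equal(H @ e % 2, s):
+        continue                                   # inconsistent with s
+    w = p ** e.sum() * (1 - p) ** (3 - e.sum())    # i.i.d. prior probability
+    num += w * e
+    den += w
+
+print(num / den)  # exact P(e_i = 1 | s) = [0.9, 0.1, 0.1]; BP approximates these
+```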
+BP approximates these marginals via local message passing. For numerical stability, and to turn products into sums, we work with the *log-likelihood ratio (LLR)*.
+
+== 1. Core definition: LLR
+
+For each variable (physical qubit), define the LLR as the natural log of the ratio of "no error" to "error":
+
+$
+  L = ln (P(x=0) / P(x=1))
+$
+
+- $L > 0$: leans toward "no error" (0).
+- $L < 0$: leans toward "error" (1).
+- Larger $|L|$: higher confidence.
+
+== 2. Algorithm flow
+
+BP passes messages in two directions: variable to check $(v arrow c)$ and check to variable $(c arrow v)$.
+
+=== Step 0: Initialization
+
+Based on the physical channel error rate #text(fill: red)[$p$ (the prior probability)], initialize each data qubit $v_i$'s LLR.
+Assuming each qubit errs independently with probability $p$:
+
+$
+  L_i^((0)) = ln ((1-p)/p)
+$
+This is the algorithm's initial belief.
+
+=== Step 1: Check-node update ($c arrow v$)
+
+*Intuition*: check node $c_j$ tells variable $v_i$: "Based on the other variables $v_k$ I connect to, and my syndrome bit $s_j$ (0 or 1), I think your value should be..."
+
+This is a *parity-check* constraint. If the sum of the other errors is even and $s_j = 0$, then $v_i$ must be 0 (even parity), and so on.
+
+Using properties of $tanh$, the message $R_(j arrow i)$ is:
+
+$
+  R_(j arrow i) = 2 tanh^(-1) ( (-1)^(s_j) product_(k in N(j) backslash i) tanh(Q_(k arrow j) / 2) )
+$
+
+- $N(j) backslash i$: all variables connected to check $j$, *excluding* $i$ (to avoid feedback).
+- $(-1)^(s_j)$: the key term. If $s_j = 1$ (the check fired), it flips the sign and says "one of you is wrong."
+- $Q_(k arrow j)$: message from variable $k$ in the previous round.
+
+=== Step 2: Variable-node update ($v arrow c$)
+
+*Intuition*: variable node $v_i$ aggregates the messages from all neighboring checks and tells check $c_j$: "Combining the channel prior and the other checks, I believe my state is..."
+
+With LLRs, products become *sums*:
+
+$
+  Q_(i arrow j) = L_i^((0)) + sum_(k in M(i) backslash j) R_(k arrow i)
+$
+
+- $L_i^((0))$: initial channel LLR.
+- $M(i) backslash j$: all checks connected to $i$, *excluding* $j$.
+
+=== Step 3: Decision and termination
+
+After a number of iterations (alternating steps 1 and 2), or after reaching the maximum iteration count, compute each bit's *total posterior LLR*:
+
+$
+  L_i^("total") = L_i^((0)) + sum_(k in M(i)) R_(k arrow i)
+$
+(Note: this sum includes all neighbors.)
+
+*Hard decision*:
+- If $L_i^("total") > 0 arrow hat(e)_i = 0$ (no error).
+- If $L_i^("total") < 0 arrow hat(e)_i = 1$ (error).
+
+Finally, check whether the inferred error vector $hat(e)$ reproduces the syndrome:
+$
+  H dot hat(e)^T = s^T
+$
+
+If it does, decoding succeeds; otherwise decoding fails (or proceeds to post-processing such as OSD).
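+
+Here is a compact NumPy sketch of the four steps above (a flooding schedule; the clipping constant, iteration cap, and example parameters are implementation choices of mine, and the test matrix is the Tanner graph from Part I):
+
+```python
+import numpy as np
+
+def bp_decode(H, syndrome, p, max_iter=30):
+    """Sum-product BP over LLRs, following Steps 0-3 above.
+    Returns (hard decision, posterior LLRs, syndrome satisfied?)."""
+    m, n = H.shape
+    L0 = np.log((1 - p) / p)                  # Step 0: channel prior LLR
+    Q = np.where(H == 1, L0, 0.0)             # messages v -> c
+    R = np.zeros((m, n))                      # messages c -> v
+    sign = (-1.0) ** syndrome                 # the (-1)^(s_j) factors
+    e_hat, total = np.zeros(n, dtype=int), np.full(n, L0)
+    for _ in range(max_iter):
+        # Step 1: check-node update R_(j -> i)
+        T = np.where(H == 1, np.tanh(Q / 2), 1.0)
+        for j in range(m):
+            for i in np.nonzero(H[j])[0]:
+                prod = sign[j] * np.prod(T[j]) / T[j, i]   # exclude i itself
+                R[j, i] = 2 * np.arctanh(np.clip(prod, -1 + 1e-12, 1 - 1e-12))
+        # Steps 2-3: posterior LLRs, then variable-node messages
+        total = L0 + R.sum(axis=0)
+        Q = np.where(H == 1, total - R, 0.0)  # leave out each check's own R
+        e_hat = (total < 0).astype(int)       # hard decision
+        if np.array_equal(H @ e_hat % 2, syndrome):
+            return e_hat, total, True         # syndrome satisfied: stop early
+    return e_hat, total, False                # failed: hand off to OSD
+
+# The Tanner graph from Part I, with checks S2 and S3 fired:
+H = np.array([[1, 1, 1, 0, 0], [0, 1, 0, 1, 0], [0, 0, 1, 1, 1]])
+e_hat, _, ok = bp_decode(H, np.array([0, 1, 1]), p=0.05)
+print(e_hat, ok)  # [0 0 0 1 0] True -> blames d4, as expected
+```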
+
+== 3. Visual intuition: message passing
+
+To clarify the information flow, here is a simple local message-flow diagram.
+
+#figure(
+  caption: [BP message-passing: information exchange diagram],
+  kind: "image",
+  supplement: "Figure",
+  block(
+    height: 140pt,
+    width: 100%,
+    stroke: 0.5pt + gray.lighten(60%),
+    radius: 5pt,
+    inset: 10pt,
+    align(center + horizon)[
+      #box(width: 340pt, height: 120pt)[
+
+        // Coordinates
+        #let vy = 60pt    // baseline y
+        #let v1x = 60pt   // left variable x
+        #let cx = 170pt   // middle check x
+        #let v2x = 280pt  // right variable x (neighbor)
+
+        // 1. Structural edges (gray solid)
+        #place(line(start: (v1x, vy), end: (cx, vy), stroke: 2pt + gray.lighten(70%)))
+        #place(line(start: (cx, vy), end: (v2x, vy), stroke: 2pt + gray.lighten(70%)))
+
+        // 2. Nodes
+        // Left variable
+        #place(dx: v1x - 12pt, dy: vy - 12pt, circle(radius: 12pt, fill: white, stroke: 1.5pt + black))
+        #place(dx: v1x - 25pt, dy: vy - 35pt, text(weight: "bold")[$V_i$])
+
+        // Middle check
+        #place(dx: cx - 12pt, dy: vy - 12pt, rect(width: 24pt, height: 24pt, fill: luma(230), stroke: 1.5pt + black))
+        #place(dx: cx - 5pt, dy: vy - 35pt, text(weight: "bold")[$C_j$])
+
+        // Right variable
+        #place(dx: v2x - 12pt, dy: vy - 12pt, circle(radius: 12pt, fill: white, stroke: 1.5pt + black))
+        #place(dx: v2x - 10pt, dy: vy - 35pt, text(fill: gray)[$V_k$])
+
+        // 3. Message flows
+
+        // Message Q: variable -> check (blue, above, rightward)
+        #let q_start = v1x + 15pt
+        #let q_end = cx - 15pt
+        #let q_y = vy - 8pt
+
+        #place(line(start: (q_start, q_y), end: (q_end, q_y), stroke: (thickness: 1.5pt, paint: blue, dash: "dashed")))
+        // Blue arrowhead
+        #place(dx: q_end - 2pt, dy: q_y, polygon(fill: blue, (-4pt, 3pt), (-4pt, -3pt), (2pt, 0pt)))
+        #place(dx: 95pt, dy: vy - 25pt, text(size: 9pt, fill: blue)[$Q_(i -> j)$])
+
+        // Message R: check -> variable (red, below, leftward)
+        #let r_start = cx - 15pt
+        #let r_end = v1x + 15pt
+        #let r_y = vy + 8pt
+
+        #place(line(start: (r_start, r_y), end: (r_end, r_y), stroke: (thickness: 1.5pt, paint: red)))
+        // Red arrowhead
+        #place(dx: r_end + 2pt, dy: r_y, polygon(fill: red, (4pt, 3pt), (4pt, -3pt), (-2pt, 0pt)))
+        #place(dx: 95pt, dy: vy + 15pt, text(size: 9pt, fill: red)[$R_(j -> i)$])
+
+        // Additional input from the right neighbor (gray, leftward)
+        #let n_start = v2x - 15pt
+        #let n_end = cx + 15pt
+
+        #place(line(start: (n_start, vy), end: (n_end, vy), stroke: (thickness: 1.5pt, paint: gray, dash: "dotted")))
+        // Gray arrowhead
+        #place(dx: n_end + 2pt, dy: vy, polygon(fill: gray, (4pt, 3pt), (4pt, -3pt), (-2pt, 0pt)))
+        #place(dx: 215pt, dy: vy - 15pt, text(size: 8pt, fill: gray)[from neighbor $V_k$])
+
+        // 4. Bottom note
+        #place(dx: 120pt, dy: 100pt,
+          block(stroke: (left: 2pt + red), inset: 5pt, radius: 2pt, fill: luma(250))[
+            #text(size: 9pt)[$R_(j -> i)$ depends on the other $V_k$]
+          ]
+        )
+      ]
+    ]
+  )
+)
+
+This diagram shows BP message passing on a Tanner graph: variable nodes and check nodes exchange "information" along the edges to approximate posterior probabilities.
+
+- *Node meanings*:
+  - Circle nodes $V_i, V_k$: variable nodes (a bit or error variable; in QEC, the error random variable of a physical qubit).
+  - Square nodes $C_j$: check nodes (parity constraints; in quantum codes, stabilizer/syndrome constraints).
+
+- *Gray solid structural edges*: Tanner-graph edges (defined by the ones in $H$). BP messages travel only along these edges.
+
+- *Blue dashed message $Q_(i -> j)$ (variable $arrow$ check)*:
+  the belief/probability message from $V_i$ to $C_j$.
+  Key point: it is a *local estimate of $V_i$* that excludes feedback over the same edge (it does not reuse the message just received from $C_j$). Thus
+  $Q_(i -> j)$ equals $V_i$'s prior (channel information) plus the messages from the other checks (excluding $C_j$).
+
+- *Red solid message $R_(j -> i)$ (check $arrow$ variable)*:
+  the constraint feedback from $C_j$ to $V_i$ based on the parity condition.
+  It depends on the other variables connected to $C_j$ (shown as "from neighbor $V_k$"), so $R_(j -> i)$:
+  - aggregates all $Q_(k -> j)$ (for $k != i$),
+  - combines them with the syndrome/parity bit,
+  - returns a consistency constraint on $V_i$.
+
+- *Gray dotted line (from neighbor $V_k$)*:
+  emphasizes that $C_j$ needs input from its other neighbors when computing $R_(j -> i)$.
+
+- *Overall iteration*:
+  BP iteratively updates $Q$ and $R$ so that each variable node refines its posterior estimate. When the messages converge or the iteration limit is reached, we make the final decisions (hard or soft).
+
+(Note: in CSS/QLDPC decoding, the same message-passing structure runs on the factor graphs of $H_X$ and $H_Z$, with the syndromes as constraints.)
+
+
+== 4. The quantum BP dilemma: degeneracy
+
+While BP performs well for classical LDPC codes, in quantum error correction it faces a major issue: *degeneracy*.
+
+In classical decoding, we must find the exact error. In quantum decoding, many different errors correspond to the same logical state (e.g., $E$ and $E' = S_i E$ are equivalent).
+
+BP tends to look for the "most likely specific error" rather than the "most likely error class." This can cause hesitation or non-convergence at low signal-to-noise ratios. That is why we need *OSD (Ordered Statistics Decoding)* as a post-processing step to break the tie.
+
+= Part III: Ordered Statistics Decoding (OSD)
+
+BP outputs a "probability distribution" (soft information). In the following cases, BP alone is insufficient:
+1. *Non-convergence/oscillation*: on graphs with many short cycles, BP may loop and never settle.
+2. *Invalid syndrome*: the hard decision ($L > 0 arrow 0$) can yield an error vector $e$ that does not satisfy $H e^T = s^T$.
+
+*OSD (Ordered Statistics Decoding)* is a post-processing algorithm that combines soft decisions with linear algebra. Core idea: *rather than guessing blindly, trust the "most confident" bits and solve for the rest.*
+
+== 1. Core steps
+
+Assume BP outputs the final LLR vector $L = (L_1, L_2, ..., L_n)$.
+
+=== Step 1: Ordering
+Sort all physical bits by absolute LLR magnitude $|L_i|$ in *descending* order.
+- Larger $|L_i|$ means BP is more confident (whether "error" or "no error").
+- Smaller $|L_i|$ (near 0) means ambiguous and least reliable.
+
+Reorder the columns of $H$ accordingly to get a new matrix $H'$:
+$
+  H' = [ H_A | H_B ]
+$
+- $H_A$: columns of the *high-confidence* bits.
+- $H_B$: columns of the *low-confidence* bits.
+
+=== Step 2: Gaussian elimination and basis selection
+We want an error vector $e$ that satisfies the syndrome equation
+$H e^T = s^T$.
+
+Since $H$ is usually wide (more variables than equations), the system is underdetermined and has many solutions. OSD's strategy: *let the high-confidence bits decide first.*
+
+We perform *Gaussian elimination* on $H'$ to select a set of linearly independent pivot columns (a basis of the column space), reducing $H'$ to a near-identity form:
+$
+  tilde(H) = [ I | P ]
+$
+The column swaps are chosen so that, as far as possible, the pivots land on the *least* confident positions: the confident positions should stay free, because we already trust their values.
+
+=== Step 3: Solving
+We "lock" the free (non-pivot) positions to their BP hard decisions; for the high-confidence bits this usually means 0. The pivot positions are then determined uniquely, so that $H e^T = s^T$ holds exactly.
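+
+The following Python sketch mirrors these three steps (a minimal dense implementation; it assumes the syndrome is consistent with $H$, and it sorts columns least-reliable-first so the elimination naturally places its pivots on the untrusted bits; all names are mine):
+
+```python
+import numpy as np
+
+def osd0(H, syndrome, llr):
+    """Syndrome-domain OSD-0 sketch: lock the reliable bits to their BP hard
+    decisions, then solve the unreliable (pivot) bits over GF(2)."""
+    m, n = H.shape
+    hard = (llr < 0).astype(np.uint8)   # BP hard decisions
+    order = np.argsort(np.abs(llr))     # least reliable first -> pivots
+    A = H[:, order] % 2
+    b = syndrome.copy() % 2
+    pivots, r = [], 0                   # pivot columns (in sorted order)
+    for c in range(n):
+        hits = np.nonzero(A[r:, c])[0]
+        if len(hits) == 0:
+            continue                    # linearly dependent column: skip
+        A[[r, r + hits[0]]] = A[[r + hits[0], r]]  # bring pivot row up
+        b[[r, r + hits[0]]] = b[[r + hits[0], r]]
+        for rr in range(m):             # clear column c in all other rows
+            if rr != r and A[rr, c]:
+                A[rr] = (A[rr] + A[r]) % 2
+                b[rr] = (b[rr] + b[r]) % 2
+        pivots.append(c)
+        r += 1
+        if r == m:
+            break
+    free = [c for c in range(n) if c not in pivots]
+    e = np.zeros(n, dtype=np.uint8)
+    e[free] = hard[order][free]                      # lock the trusted bits
+    for row, c in enumerate(pivots):                 # back-substitute pivots
+        e[c] = (b[row] + A[row, free] @ e[free]) % 2
+    out = np.zeros(n, dtype=np.uint8)
+    out[order] = e                                   # undo the sorting
+    return out
+```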
+
+== 2. OSD order (OSD-0 vs OSD-k)
+
+The procedure above is *OSD-0* (order 0): a greedy algorithm that fully trusts the reliability ordering.
+But BP's confidence can be wrong.
+
+So we introduce *OSD-k*:
+- *Idea*: before settling, also try flipping the least trustworthy of the locked bits.
+- *Process*:
+  1. Pick the $k$ locked bits with the lowest confidence $|L_i|$.
+  2. Enumerate all $2^k$ flip combinations.
+  3. For each combination, re-solve for $e$ and compute its weight (equivalently, its probability).
+  4. Keep the solution with minimum weight (maximum probability).
+
+- *Cost*: complexity grows as $2^k$. Usually a small $k$ (0, 1, 2) already improves performance significantly.
+
+== 3. Diagram: from BP to OSD
+
+Below is a diagram showing how OSD uses BP's output.
+
+#figure(
+  caption: [OSD workflow diagram],
+  kind: "image",
+  supplement: "Figure",
+  block(
+    height: 180pt,
+    width: 100%,
+    stroke: 0.5pt + gray.lighten(60%),
+    radius: 5pt,
+    inset: 10pt,
+    align(center + horizon)[
+      #box(width: 360pt, height: 160pt)[
+
+        // 1. BP output
+        #place(dx: 20pt, dy: 10pt, text(weight: "bold")[1. BP output LLR])
+        // Bars showing LLR magnitudes
+        #let bar(x, h, col, label) = {
+          place(dx: x, dy: 60pt - h, rect(width: 15pt, height: h, fill: col))
+          place(dx: x + 2pt, dy: 65pt, text(size: 8pt)[#label])
+        }
+
+        // Bits 1, 2, 3, 4, 5
+        #bar(20pt, 40pt, blue.lighten(30%), "d1")   // high confidence
+        #bar(40pt, 10pt, gray.lighten(50%), "d2")   // low confidence
+        #bar(60pt, 35pt, blue.lighten(30%), "d3")
+        #bar(80pt, 5pt, gray.lighten(50%), "d4")    // very low
+        #bar(100pt, 45pt, blue.lighten(30%), "d5")
+
+        #place(dx: 20pt, dy: 80pt, text(size: 9pt, style: "italic")[Longer bars mean higher confidence])
+
+        // Arrow to the right
+        #place(dx: 130pt, dy: 50pt, text(size: 15pt)[$arrow.r$])
+
+        // 2. Ordering and basis selection
+        #place(dx: 160pt, dy: 10pt, text(weight: "bold")[2. Ordering and basis selection])
+
+        // Reordered matrix
+        #place(dx: 160pt, dy: 30pt, [
+          #rect(width: 80pt, height: 60pt, fill: white, stroke: 1pt + black)[
+            #grid(
+              columns: (1fr, 1fr),
+              align: center + horizon,
+              rect(width: 100%, height: 100%, fill: green.lighten(80%), stroke: none)[
+                #text(size: 8pt)[$H_("reliable")$ \ (locked)]
+              ],
+              rect(width: 100%, height: 100%, fill: red.lighten(80%), stroke: none)[
+                #text(size: 8pt)[$H_("others")$ \ (solved)]
+              ]
+            )
+          ]
+        ])
+        #place(dx: 160pt, dy: 95pt, text(size: 8pt)[d5, d1, d3 ... d2, d4])
+
+        // Arrow to the right
+        #place(dx: 250pt, dy: 50pt, text(size: 15pt)[$arrow.r$])
+
+        // 3. Linear solve
+        #place(dx: 280pt, dy: 10pt, text(weight: "bold")[3. Linear solve])
+
+        #place(dx: 280pt, dy: 40pt, block(width: 80pt)[
+          #text(size: 10pt)[
+            Solve: \
+            $H_"basis" dot e_"basis" = s$
+          ]
+        ])
+
+        #place(dx: 280pt, dy: 85pt,
+          rect(fill: luma(240), inset: 5pt, radius: 3pt, stroke: 0.5pt + black)[
+            #text(size: 9pt, weight: "bold")[Get a valid solution e]
+          ]
+        )
+      ]
+    ]
+  )
+)
+
+== 4. Summary: BP+OSD
+
+In QLDPC research, *BP+OSD* has become the de facto standard decoder configuration.
+
+1. *BP* exploits the sparse graph structure, using probabilistic information to converge near the true error.
+2. *OSD* uses linear algebra to enforce a valid solution and resolve degeneracy.
+
+This combination keeps BP's low complexity ($O(n)$ to $O(n log n)$) while greatly improving the logical error threshold, approaching maximum-likelihood decoding (MLD).
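+
+As a sketch of how the two stages plug together (this assumes the `bp_decode` and `osd0` functions from the earlier snippets are defined in the same module; real libraries expose an equivalent combined decoder):
+
+```python
+import numpy as np
+
+def bp_osd_decode(H, syndrome, p, max_iter=30):
+    """Run BP first; fall back to OSD-0 only if BP fails the syndrome check."""
+    e_hat, llr, ok = bp_decode(H, syndrome, p, max_iter)
+    if ok:
+        return e_hat               # BP alone already satisfies H e = s
+    return osd0(H, syndrome, llr)  # otherwise let OSD repair the estimate
+```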
+
+#include "bp_note_trans.typ"
diff --git a/note/bp_note.typ b/note/bp_note.typ
new file mode 100644
index 0000000..e6eed15
--- /dev/null
+++ b/note/bp_note.typ
@@ -0,0 +1,273 @@
+#set page(
+  paper: "a4",
+  margin: (x: 2cm, y: 2cm),
+)
+
+#set text(
+  font: ("Source Han Serif SC", "Linux Libertine"),
+  size: 11pt,
+  lang: "en"
+)
+
+#set par(
+  justify: true,
+  leading: 0.8em
+)
+
+#let note(title, body) = {
+  block(
+    fill: luma(90%),
+    stroke: (left: 4pt + orange),
+    inset: 12pt,
+    radius: (right: 4pt),
+    width: 100%,
+    [
+      #text(weight: "bold", fill: orange.darken(20%), size: 1.1em)[#title]
+      #v(0.5em)
+      #body
+    ]
+  )
+}
+= Note
+
+== 1. Storing quantum states on a classical computer
+
+Classical bits are definite, like `10101`, but a quantum state contains superposition and entanglement. There are two main ways to store one: a general brute-force method and a QEC-optimized efficient method.
+
+=== 1.1 General method: state vector -- exponential blowup
+
+This is the most intuitive but extremely expensive approach.
+For $n$ qubits, the system can be in a superposition of all $2^n$ basis states:
+
+$
+  |psi > = alpha_0 |00...0 > + alpha_1 |00...1 > + dots + alpha_(2^n - 1) |11...1 >
+$
+
+In classical memory, we must store *every* $alpha_i$.
+- Each $alpha_i$ is complex (usually two 64-bit floats, i.e. 16 bytes).
+- *Required memory*: $16 times 2^n$ bytes.
+
+#table(
+  columns: (1fr, 1fr, 2fr),
+  inset: 8pt,
+  align: horizon,
+  stroke: none,
+  table.header([*Number of qubits ($n$)*], [*Number of complex values*], [*Required memory*]),
+  table.hline(stroke: 0.5pt),
+  [10], [1,024], [16 KB (negligible)],
+  [30], [~1 billion], [16 GB (high-end PC limit)],
+  [50], [~1,125 trillion], [16 PB (supercomputer limit)],
+  [100], [$1.2 times 10^30$], [$tilde 2 times 10^31$ bytes (far beyond any conceivable hardware)]
+)
+
+#note("Conclusion")[
+  This method can only simulate small circuits of no more than 40-50 qubits and cannot simulate surface codes with thousands of qubits.
+]
+
+=== 1.2 QEC-specific method: stabilizer tableau -- efficient storage
+
+In quantum error correction research (especially with Pauli errors and Clifford gates), we rely on the *Gottesman-Knill theorem*.
+
+If the circuit contains only Clifford gates (H, CNOT, S, Pauli), we *do not need* to store wavefunction amplitudes; we only need to track *how the stabilizer operators evolve*.
+
+=== 1.3 Data structure: binary matrix (tableau)
+We only need to store an $n times 2n$ *binary matrix* of 0s and 1s.
+- No complex numbers.
+- Each row is a stabilizer generator.
+- The columns record, for each qubit $i$, whether the stabilizer acts there with an $X$ and/or a $Z$ component.
+
+Assume there are $n$ qubits. We use two binary bits $(x_i, z_i)$ to represent the Pauli operator on qubit $i$:
+- $I arrow (0, 0)$
+- $X arrow (1, 0)$
+- $Z arrow (0, 1)$
+- $Y arrow (1, 1)$ (because $Y = i X Z$, ignoring the global phase)
+
+#note("Binary representation of Pauli operators")[
+In this frame, a single-qubit Pauli operator can be written as $X^(x_i) Z^(z_i)$ with $x_i, z_i in {0, 1}$, which reproduces the mapping above.
+
+Therefore, a stabilizer operator on $n$ qubits,
+$S = P_1 times.circle P_2 times.circle dots.c times.circle P_n$,
+can be represented by a length-$2n$ binary vector:
+$(x_1, dots, x_n | z_1, dots, z_n)$.
+]
+#note("Meaning of the tableau dimensions")[
+
+- *Rows (about $n$)*: the number of stabilizer generators.
+- *Columns ($2n$)*: for each stabilizer,
+  - $n$ $X$ components,
+  - $n$ $Z$ components.
+
+So the tableau is an $n times 2n$ binary matrix overall.]
+
+=== 1.4 Example
+Let $n = 2$, with stabilizers $X_1 X_2$ and $Z_1 Z_2$.
+The stored tableau looks like:
+
+$
+  mat(
+    delim: "[",
+    "x1", "x2", "|", "z1", "z2", "r (phase)";
+    1, 1, "|", 0, 0, 0;
+    0, 0, "|", 1, 1, 0
+  )
+$
+- First row $(1,1,0,0) arrow X_1 X_2$
+- Second row $(0,0,1,1) arrow Z_1 Z_2$
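+
+A quick sanity check of this bookkeeping in plain Python (illustrative only; real tools such as `stim` use bit-packed tableaus and also track the phase column):
+
+```python
+import numpy as np
+
+# Symplectic (x|z) encoding: I->(0,0), X->(1,0), Z->(0,1), Y->(1,1)
+PAULI_XZ = {"I": (0, 0), "X": (1, 0), "Z": (0, 1), "Y": (1, 1)}
+
+def to_row(pauli_string):
+    """Encode e.g. 'XX' as the binary vector (x1..xn | z1..zn)."""
+    xz = [PAULI_XZ[p] for p in pauli_string]
+    return np.array([x for x, _ in xz] + [z for _, z in xz], dtype=np.uint8)
+
+tableau = np.vstack([to_row("XX"), to_row("ZZ")])  # stabilizers X1X2, Z1Z2
+print(tableau)
+# [[1 1 0 0]
+#  [0 0 1 1]]  <- the matrix from the example above (phase bits omitted)
+```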
+
+=== 1.5 Advantages
+- *Memory complexity*: $O(n^2)$. Even for $n = 10,000$, the binary matrix takes only a few tens of MB.
+- *Compute speed*: bitwise operations (XOR) are extremely fast.
+
+=== 1.6 Summary: which storage is used in BP decoding?
+
+In LDPC and surface-code decoding (BP+OSD), we typically assume a Pauli channel error model (only X, Y, Z errors, no small rotations).
+
+Therefore simulators (e.g. Python's `stim` library) use *method 2 (the stabilizer tableau)*.
+
+1. *Storage*: do not store $|psi >$; store only the current matrix of stabilizer generators.
+2. *Errors*: just flip binary bits in the tableau (0 to 1).
+3. *Syndrome*: computed by the matrix product $H dot e^T$, giving a classical binary string `10101...`.
+4. *BP input*: BP receives that classical `10101` syndrome and infers the underlying errors.
+
+#text(fill: blue)[*BACK*]
+
+== 2. Why must stabilizers commute?
+This is the physical foundation that makes QEC work, for two main reasons:
+
+=== 2.1 Possibility of simultaneous measurement
+In quantum mechanics, the Heisenberg uncertainty principle tells us:
+*only when two operators $A$ and $B$ commute (i.e. $[A, B] = A B - B A = 0$) can their measurement outcomes be determined simultaneously.*
+
+- *If they do not commute*: measuring $S_1$ disturbs $S_2$.
+  For example, if we first measure $S_1$ and get $+1$, then measure $S_2$, $S_1$ might flip back to $-1$ (or the system collapses to a state that is no longer an eigenstate of $S_1$).
+- *In QEC*: we need to extract all syndrome bits at once to diagnose errors. If the stabilizers "fight" each other, we cannot obtain a stable, consistent syndrome; the measurement itself would create new errors.
+
+=== 2.2 Existence of common eigenstates
+The logical space (code space) is defined as the common $+1$ eigenspace of all stabilizers:
+$
+  S_i |psi_L > = +1 |psi_L >, quad forall i
+$
+Linear algebra tells us that a set of operators has a common eigenbasis only if they mutually commute.
+If they do not commute, no state satisfies all the $S_i$ constraints, and the code space cannot even be defined.
+
+=== 2.3 Mathematical derivation: why does $H_X dot H_Z^T = 0$ imply commutation?
+
+This comes from Pauli algebra:
+- On the same qubit: $X Z = - Z X$ (anticommute, sign $-1$).
+- On different qubits: $X_i Z_j = Z_j X_i$ (commute).
+
+Suppose an X stabilizer $S_x$ and a Z stabilizer $S_z$ act on $n$ qubits. Whether they commute depends on how many positions they "collide" (both act non-trivially):
+$
+  S_x S_z = (-1)^k S_z S_x
+$
+where $k$ is the number of qubits where $S_x$ applies $X$ and $S_z$ applies $Z$ (the overlap count).
+
+- For $S_x S_z = S_z S_x$, we need $(-1)^k = 1$, i.e. *$k$ must be even*.
+- In binary matrix multiplication, the dot product of a row of $H_X$ with a column of $H_Z^T$ computes exactly this overlap count $k$ (mod 2).
+
+Therefore, requiring $H_X dot H_Z^T = 0 mod 2$ means every X check and every Z check overlap an even number of times, which guarantees physical commutation.
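+
+Numerically, the overlap count is nothing but a mod-2 dot product; using the vectors of the worked example in the note that follows:
+
+```python
+import numpy as np
+
+h_x = np.array([1, 1, 0, 0])  # S_x = X1 X2
+h_z = np.array([1, 1, 1, 1])  # S_z = Z1 Z2 Z3 Z4
+
+k = int(h_x @ h_z)            # overlap ("collision") count
+print(k, k % 2 == 0)          # 2 True -> even overlap, so the pair commutes
+```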
+
+#note("Commutation check")[
+
+Below is a minimal CSS example verifying that when $H_X dot H_Z^T = 0$ (mod 2), the corresponding X stabilizer and Z stabilizer indeed commute.
+
+Example setup ($n = 4$)
+
+Take one X-check row vector and one Z-check row vector:
+
+$h_x = [1, 1, 0, 0]$
+$h_z = [1, 1, 1, 1]$
+
+Their Pauli stabilizers are:
+
+$S_x = X_1 X_2$
+$S_z = Z_1 Z_2 Z_3 Z_4$
+
+Here "1" means the Pauli acts at that position ($X$ for $S_x$, $Z$ for $S_z$),
+and "0" means the identity $I$.
+
+Dot product (mod 2):
+
+$
+  h_x dot h_z^T
+  = (1 dot 1 + 1 dot 1 + 0 dot 1 + 0 dot 1) mod 2
+  = 2 mod 2
+  = 0
+$
+
+So this pair satisfies $H_X dot H_Z^T = 0$ (for this row/column pair).
+
+Counting the "collisions" $k$:
+
+A collision means $X$ (from $S_x$) and $Z$ (from $S_z$) act on the same qubit.
+In this example:
+
+- Qubit 1: $X_1$ and $Z_1$ collide (1 time)
+- Qubit 2: $X_2$ and $Z_2$ collide (1 more time)
+- Qubits 3, 4: $S_x$ acts as $I$, no collision
+
+So the collision count is $k = 2$ (even).
+
+Verifying commutation with Pauli algebra:
+
+Using $X Z = - Z X$ on the same qubit and commutation on different qubits:
+
+$
+  S_x S_z = (X_1 X_2)(Z_1 Z_2 Z_3 Z_4)
+$
+
+When reordering to group same-qubit factors, each swap of $X_i$ with $Z_i$ introduces a $-1$:
+
+- Qubit 1: $X_1 Z_1 = - Z_1 X_1$ contributes $-1$
+- Qubit 2: $X_2 Z_2 = - Z_2 X_2$ contributes another $-1$
+
+Thus the total sign is $(-1)^k = (-1)^2 = +1$, so:
+
+$
+  S_x S_z = (+1) S_z S_x = S_z S_x
+$
+
+They *commute*.
+]
+
+#text(fill: blue)[*BACK*]
+
+== 3. Scientific basis and sources of the prior
+
+*"Before we even compute, why are we allowed to assume every bit has a fixed error probability $p$? Is that scientific?"* In Bayesian statistics, this $p$ is called a *prior*. In quantum error correction, introducing it is not only scientific but necessary, for the following reasons:
+
+=== 3.1 Physical source: obtained through hardware calibration
+This $p$ is not a number we invent at decoding time; it comes from *measured experimental data*.
+- Before a quantum chip is used, experimentalists run benchmarks (e.g., *randomized benchmarking* or *gate set tomography*).
+- These tests yield the average gate fidelity (e.g., 99.9%).
+- From it we obtain a physical error rate $p approx 0.001$.
+- *Scientific basis*: it represents our *statistical knowledge* of the hardware's quality.
+
+=== 3.2 Mathematical necessity of Bayesian inference
+BP is essentially Bayesian inference. By Bayes' theorem:
+$
+  P("error" | "phenomenon") prop P("phenomenon" | "error") dot P("error")
+$
+- The *"phenomenon"* is the observed syndrome (which stabilizers fired).
+- The *"error"* is the error vector $e$ we want to infer.
+- *$P("error")$* is the prior probability $p$.
+
+If we omit $p$, we are effectively assuming "error" and "no error" are equally likely (i.e., $p = 0.5$), which makes $L = 0$. With no initial bias, the decoder struggles to converge, because it has no baseline for judging whether a strange syndrome comes from a rare complex error or a common simple one.
+
+=== 3.3 Why does the LLR formula look like that?
+The formula $L = ln((1-p)/p)$ is a *weighting system*:
+- If $p$ is small (good hardware, e.g. $10^(-3)$), $L approx ln(1000) approx 6.9$: high initial confidence.
+- If $p$ is large (poor hardware, e.g. $10^(-1)$), $L approx ln(9) approx 2.2$: lower initial confidence.
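+
+A two-line check of these numbers:
+
+```python
+import math
+
+for p in (1e-3, 1e-1):
+    print(p, round(math.log((1 - p) / p), 2))  # prior LLR ln((1-p)/p)
+# 0.001 6.91
+# 0.1 2.2
+```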
+
+This tells BP: *"unless the nearby checks (the evidence) strongly accuse this bit, you should, since $p$ is small, tend to believe it is innocent."*
+
+=== 3.4 Limitations
+Of course, a single-number $p$ model is not always realistic: it assumes errors are *i.i.d.* (independent and identically distributed).
+In real hardware, errors are often correlated (e.g., a cosmic ray can flip a whole region, or a two-qubit gate can make two bits fail together). Advanced decoders therefore initialize the LLRs from more detailed *correlated noise models* rather than a single $p$.
+
+#text(fill: blue)[*BACK*]