When multiple experts give conflicting advice about the same problem, most systems try to force artificial consensus or pick a single "winner." Contrakit measures exactly how much those perspectives actually contradict—in bits.
That measurement reveals something about disagreement itself. Some clashes come from noise you can average away with more data. Others are structural, built into the problem, irreducible. Contrakit distinguishes between these cases. When perspectives genuinely conflict, it returns a positive number of bits, $K(P)$.
That number quantifies the minimum information cost of forcing agreement where none exists. The cost appears in real systems: compression needs extra $K(P)$ bits per symbol, channels lose $K(P)$ bits of capacity, and trained models hit error floors that more data cannot remove.
```bash
pip install contrakit
```

For examples and development:

```bash
git clone https://github.com/off-by-some/contrakit.git
cd contrakit
poetry install
poetry run pytest -q
```

You need to drive across town and check three navigation apps for the same route. Waze says 27 minutes, Google Maps says 32 minutes, Apple Maps says 29 minutes. All three are using real traffic data, GPS signals, and historical patterns—but they give different answers. Why?
Each app uses different algorithms, data sources, and assumptions. Waze emphasizes current traffic reports from other drivers. Google Maps considers historical averages plus current conditions. Apple Maps might weight recent accidents differently. They're all valid approaches to the same question.
```python
from contrakit import Observatory

obs = Observatory.create(symbols=["<30min", "30-35min", ">35min"])
TravelTime = obs.concept("TravelTime")

# Waze: optimistic, focuses on current traffic flow
with obs.lens("Waze") as waze:
    waze.perspectives[TravelTime] = {"<30min": 0.7, "30-35min": 0.2, ">35min": 0.1}

# Google Maps: conservative, uses historical + current data
with obs.lens("Google") as google:
    google.perspectives[TravelTime] = {"<30min": 0.2, "30-35min": 0.6, ">35min": 0.2}

# Apple Maps: balanced approach
with obs.lens("Apple") as apple:
    apple.perspectives[TravelTime] = {"<30min": 0.4, "30-35min": 0.4, ">35min": 0.2}

behavior = (waze | google | apple).to_behavior()
print(f"Agreement: {behavior.alpha_star:.3f}")              # 0.965
print(f"Contradiction: {behavior.contradiction_bits:.3f}")  # 0.052 bits
```

The apps show high agreement (0.965) with just 0.052 bits of contradiction. Their different methodologies produce consistent results overall, requiring minimal additional information to reconcile their estimates.
Compare this to a contradiction of 0.7 bits, which would indicate the apps are using fundamentally incompatible data sources—you'd need to treat their predictions as separate time dimensions rather than reconciling them into a single "travel time" estimate.
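As a quick sanity check on those two figures, the contradiction measure appears to be the agreement coefficient expressed in bits, $K(P) = -\log_2 \alpha^\star$; that relation reproduces the numbers above. The snippet below assumes it rather than calling contrakit:

```python
import math

# Assuming K(P) = -log2(alpha*): 0.965 agreement gives ~0.051 bits,
# matching the 0.052 printed above (alpha* is rounded here).
alpha_star = 0.965
print(f"{-math.log2(alpha_star):.3f} bits")  # 0.051

# A contradiction of 0.7 bits would mean an agreement coefficient of
# only 2**-0.7, far harder to reconcile than the map apps.
print(f"{2 ** -0.7:.3f}")  # 0.616
```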
Contrakit can be used anywhere multiple valid approaches to the same question produce measurable disagreement. This repository includes examples from the following systems to demonstrate that versatility:
- Quantum systems let us measure "how quantum" a system really is, from Bell inequality violations (~0.012 bits) to logical paradoxes that classical physics can't explain (~0.132 bits), all in the same quantitative framework.
- Neural networks: contradiction predicts hallucination rates before training, with $K(P)$ setting irreducible error floors that persist even with infinite training data on structurally contradictory tasks, while the witness capacity $r$ determines whether an architecture can express epistemic uncertainty.
- Byzantine consensus algorithms become more efficient by measuring which messages actually need verification, reducing communication overhead while staying secure against attacks.
- Statistical paradoxes become detectable: contrakit quantifies when aggregated data contradicts subgroup patterns (Simpson's paradox), guiding researchers on when additional context variables are needed for valid conclusions (see the sketch after this list).
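As a sketch of the statistics use case, the snippet below reuses the Observatory API from the quickstart to compare an aggregated view of a hypothetical treatment study against its subgroup view, assuming two lenses can be combined with `|` just as three were above. The outcome labels and probabilities are invented for illustration; see `examples/statistics/` for the repository's actual Simpson's paradox example.

```python
from contrakit import Observatory

# Hypothetical treatment study where aggregated and subgroup views disagree
# (a Simpson's-paradox-style setup; numbers are made up for illustration).
obs = Observatory.create(symbols=["Improved", "NotImproved"])
Outcome = obs.concept("TreatmentOutcome")

# Aggregated over all patients, the treatment looks ineffective.
with obs.lens("Aggregated") as aggregated:
    aggregated.perspectives[Outcome] = {"Improved": 0.40, "NotImproved": 0.60}

# Within each stratum, the treatment looks beneficial.
with obs.lens("Subgroups") as subgroups:
    subgroups.perspectives[Outcome] = {"Improved": 0.70, "NotImproved": 0.30}

behavior = (aggregated | subgroups).to_behavior()
print(f"Agreement: {behavior.alpha_star:.3f}")
print(f"Contradiction: {behavior.contradiction_bits:.3f} bits")
```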
Shannon's framework handles randomness brilliantly within a single coherent context. Entropy tells you how many bits you need to encode outcomes when you have one unified model. Mutual information measures dependencies within that model. These tools assume you can eventually settle on a single story about what's happening.
This assumption traces back to Boole, who fixed propositions as true or false. Kolmogorov built probability on that logic. Shannon showed how such decisions travel as bits. None of these frameworks claimed the world was binary—they just assumed our records could be. One message, one symbol, one frame.
That inheritance runs deep. Modern databases, communication protocols, neural network outputs—all collapse observations to single values. It's not a flaw in those systems; it's the foundation they were built on.
Contrakit measures what happens when that assumption breaks down. When multiple valid observations cannot be reconciled into a single record, classical measures assign the disagreement a cost of zero. They price which outcome occurred within a framework, not whether frameworks can be reconciled at all. That's the gap contrakit quantifies.
Contrakit computes quantities that characterize disagreement across multiple levels:
Core Measurements
| Measure | Formula | Description |
|---|---|---|
| Agreement Coefficient | $\alpha^\star(P)$ | Measures how closely contradictory perspectives can be approximated by any single unified explanation. Ranges from 0 (complete incompatibility) to 1 (perfect agreement). |
| Contradiction Measure | $K(P) = -\log_2 \alpha^\star(P)$ | Converts agreement into bits: the minimum information needed per observation to maintain the fiction that perspectives agree. |
| Witness Vector | | Shows which contexts create the tension. Distributes evenly when all perspectives contribute equally, concentrates when specific contexts drive disagreement. |
Agreement Metrics
| Measure | Formula | Description |
|---|---|---|
| Bhattacharyya Coefficient | $\mathrm{BC}(p, q) = \sum_x \sqrt{p(x)\,q(x)}$ | Core agreement kernel measuring distributional overlap between perspectives. Ranges from 0 (no overlap) to 1 (identical distributions). |
| Hellinger Distance | $H(p, q) = \sqrt{1 - \mathrm{BC}(p, q)}$ | Geometric distance measure between probability distributions. Quantifies how far apart perspectives lie in probability space. |
| Total Variation Distance | $\mathrm{TV}(p, q) = \tfrac{1}{2}\sum_x \lvert p(x) - q(x) \rvert$ | Statistical separation measure bounding how distinguishable contradictory data is from unified explanations (computed explicitly in the sketch below). |
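For concreteness, here are those three metrics computed with their textbook definitions on the Waze and Google distributions from the quickstart. These helpers are standalone NumPy sketches for reference; contrakit's own implementations may differ in naming or API.

```python
import numpy as np

def bhattacharyya(p, q):
    """Overlap between two distributions: 1.0 when identical, 0.0 when disjoint."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(np.sqrt(p * q)))

def hellinger(p, q):
    """Geometric distance in probability space, derived from the Bhattacharyya coefficient."""
    return float(np.sqrt(1.0 - bhattacharyya(p, q)))

def total_variation(p, q):
    """Half the L1 distance: the largest gap in probability assigned to any event."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(0.5 * np.sum(np.abs(p - q)))

waze, google = [0.7, 0.2, 0.1], [0.2, 0.6, 0.2]
print(bhattacharyya(waze, google))    # ~0.86
print(hellinger(waze, google))        # ~0.37
print(total_variation(waze, google))  # 0.50
```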
Operational Bounds
| Measure | Formula | Description |
|---|---|---|
| Witness Capacity | $r$ | Architectural capacity for expressing epistemic uncertainty. |
| Type-II Error Exponent | | Hypothesis testing performance bound. Satisfies an exponent of at most $K(P)$ when the competing hypotheses represent contradictory perspectives. |
| Conditional Entropy | $H(X \mid C)$ | Context-dependent uncertainty. Contradiction adds $K(P)$ bits on top of this bound. |
Framework Concepts
| Concept | Notation | Description |
|---|---|---|
| Frame-Independent Behaviors | | Classical behaviors admitting unified explanations. Contradiction equals zero precisely when a behavior is frame-independent. |
| Context Weights | | Per-context contributions to overall disagreement. Optimal weights reveal which perspectives drive the contradiction. |
Real-World Costs
These aren't abstract numbers. The costs appear in real information processing tasks:
- Compression: When data comes from contradictory contexts, you pay an additional $K(P)$ bits per symbol beyond Shannon's entropy bound.
- Communication: Channels serving incompatible interpretations lose exactly $K(P)$ bits of capacity.
- Simulation: Variance scales exponentially, by at least $2^{2K(P)} - 1$, when approximating contradictory behavior.
- Neural Networks: Error floors appear at $1 - 2^{-K(P)}$ regardless of training data or architecture.
Every information processing task pays the same tax when perspectives contradict:
- Compression shows this most directly. When data comes from contradictory contexts, Shannon's entropy bound of $H(X|C)$ bits per symbol no longer suffices—you need exactly $H(X|C) + K(P)$ bits instead. That extra $K(P)$ isn't overhead from suboptimal coding; it's the structural cost of maintaining a single codebook across incompatible interpretations.
- Communication faces the mirror image: build a channel serving receivers with incompatible decoding schemes and your effective capacity drops from Shannon's limit by exactly $K(P)$ bits. The channel itself hasn't degraded—the loss comes from trying to serve contradictory interpretations simultaneously.
- Simulation gets exponentially worse. Approximate contradictory behavior using importance sampling from a unified model and your variance grows by at least $2^{2K(P)} - 1$. Hypothesis testing encounters fundamental detection limits—distinguishing competing hypotheses cannot exceed an error exponent of $K(P)$ when those hypotheses represent contradictory perspectives. (The sketch below plugs a measured $K(P)$ into these cost formulas.)
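The small calculator below plugs a measured $K(P)$ into these cost formulas. It is a standalone sketch: the entropy and capacity inputs (`base_entropy_bits`, `capacity_bits`) are illustrative placeholders rather than values produced by contrakit.

```python
def contradiction_costs(k_bits, base_entropy_bits, capacity_bits):
    """Tabulate the operational costs of K(P) bits of contradiction."""
    return {
        # Compression: H(X|C) + K(P) bits per symbol
        "bits_per_symbol": base_entropy_bits + k_bits,
        # Communication: effective capacity drops by K(P)
        "effective_capacity": capacity_bits - k_bits,
        # Simulation: variance inflation of at least 2^(2K) - 1
        "variance_inflation": 2 ** (2 * k_bits) - 1,
        # Learning: error floor of 1 - 2^(-K)
        "error_floor": 1 - 2 ** (-k_bits),
    }

# The map apps (~0.052 bits) versus a heavily contradictory task (0.7 bits).
for k in (0.052, 0.7):
    print(k, contradiction_costs(k, base_entropy_bits=1.5, capacity_bits=2.0))
```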
These costs emerge from the same source: the $K(P)$ bits of structural contradiction between the perspectives themselves, not any inefficiency in the tools that process them.
The repository contains working implementations across domains. Quantum examples compute contradiction measures for Bell violations, KCBS inequalities, and magic-square scenarios.
The hallucination experiments comprise 11 systematic studies tracing how contradiction bounds neural network performance across architectures, training regimes, and task structures. Byzantine consensus examples demonstrate adaptive overhead allocation using witness vectors to concentrate verification effort where disagreement actually occurs. Statistical examples resolve Simpson's paradox by computing the contradiction between aggregated and subgroup-level views of the same data.
Project Structure
```
examples/
├── quickstart/       # Core concepts and basic usage
├── quantum/          # Bell violations, KCBS, magic squares
├── hallucinations/   # 11 experiments on neural contradictions
├── consensus/        # Byzantine protocols with adaptive verification
├── statistics/       # Simpson's paradox and frame integration
└── run.py            # Execute all examples

docs/
├── paper/            # Mathematical theory (web and PDF)
├── api/              # Implementation reference
└── images/           # Figures and visualizations
```
Documentation Resources
| Resource | Content |
|---|---|
| Mathematical Paper (PDF) | Complete theory with proofs |
| Paper (Web) | Browsable format |
| Math Cheatsheet | Formula reference |
| API Cheatsheet | Implementation patterns |
| Axioms | Foundation principles |
| Theorems | Operational consequences |
Dual-licensed: MIT for code (LICENSE), CC BY 4.0 for documentation and figures (LICENSE-CC-BY-4.0).
```bibtex
@software{bridges2025contrakit,
  author  = {Bridges, Cassidy},
  title   = {Contrakit: A Python Library for Contradiction},
  year    = {2025},
  url     = {https://github.com/off-by-some/contrakit},
  license = {MIT, CC-BY-4.0}
}
```