Practical sheaf-theoretic tools for multimodal ML data transformations
ModalSheaf provides a practical, intuitive API for moving data between ML modalities (text, images, audio, embeddings, etc.) while tracking:
- Information loss during transformations
- Reversibility of transformations (isomorphisms vs lossy maps)
- Consistency when fusing multiple data sources
- Hierarchical structure (pixels → patches → images → videos)
Built on sheaf theory, but you don't need to know any math to use it.
```python
from modalsheaf import ModalityGraph, Modality, Transformation

# Define your modalities
graph = ModalityGraph()
graph.add_modality("image", shape=(224, 224, 3))
graph.add_modality("embedding", shape=(768,))
graph.add_modality("text", shape=None)  # variable length

# Register transformations (restriction maps);
# clip_image_encoder and clip_text_encoder are your own encoder functions
graph.add_transformation(
    source="image",
    target="embedding",
    func=clip_image_encoder,
    inverse=None,      # Not invertible!
    info_loss="high",  # Lossy transformation
)
graph.add_transformation(
    source="text",
    target="embedding",
    func=clip_text_encoder,
    inverse=None,
    info_loss="high",
)

# Check consistency between modalities
image_emb = graph.transform("image", "embedding", my_image)
text_emb = graph.transform("text", "embedding", my_caption)
consistency = graph.measure_consistency(
    {"image": my_image, "text": my_caption}
)
# Returns: {"score": 0.87, "H1": 0.13, "diagnosis": "minor inconsistency"}
```

Install with:

```bash
pip install modalsheaf
```

Think of each data type (image, text, audio) as a place where data can live.
Transformations (encoders, decoders) are roads connecting places. Some roads are:
- Two-way (invertible/isomorphism): You can go back and forth without losing anything (see the sketch after this list)
- One-way (lossy): Information is lost; you can't fully recover the original
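For instance, a plain pixel rescaling is a two-way road. Here is a minimal sketch of registering one, reusing the quick-start API; the "normalized" modality, the info_loss label, and the assumption that a registered inverse makes the reverse direction traversable are illustrative, not confirmed library behavior.

```python
# Sketch: an invertible (two-way) transformation, reusing the quick-start API.
graph.add_modality("normalized", shape=(224, 224, 3))

graph.add_transformation(
    source="image",
    target="normalized",
    func=lambda x: x / 255.0,     # forward: rescale pixels to [0, 1]
    inverse=lambda x: x * 255.0,  # backward: recover the original pixels
    info_loss="none",             # assumed label for a lossless map
)

# Round trip: forward then back recovers the original (up to float error),
# assuming the registered inverse makes normalized -> image traversable
restored = graph.transform(
    "normalized", "image",
    graph.transform("image", "normalized", my_image),
)
```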
When you have data from multiple sources about the same thing, do they agree?
- Image shows a cat, caption says "a dog" → Inconsistent
- Image shows a cat, caption says "a cat" → Consistent
See INTUITIVE_COHOMOLOGY.md for a full explanation, but briefly:
- H⁰ = "What everyone agrees on" — The global consensus
- H¹ = "Where disagreements hide" — Inconsistencies that can't be resolved
If H¹ = 0, your data is perfectly consistent. If H¹ ≠ 0, there's a conflict somewhere.
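In code, this is just the dict that measure_consistency returns in the quick start. A minimal sketch of acting on it, using only the key names shown in the example output above:

```python
# Sketch: reading the consistency report, assuming the return format shown
# in the quick start ({"score": ..., "H1": ..., "diagnosis": ...}).
result = graph.measure_consistency({"image": my_image, "text": my_caption})

if result["H1"] == 0:
    # H1 = 0: no unresolvable disagreement anywhere in the graph
    print("All sources agree.")
else:
    # H1 > 0: some conflict can't be explained away; read the diagnosis
    print(f"Conflict (H1 = {result['H1']}): {result['diagnosis']}")
```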
Features:
- Define custom modalities with shapes and dtypes
- Build modality graphs with transformations
- Automatic path finding between modalities
- Register forward and inverse transforms
- Track information loss (isomorphism, embedding, projection, lossy)
- Compose transformations automatically (see the sketch after this list)
- Measure consistency across modality graph
- Compute cohomology (H⁰, H¹) for data fusion
- Diagnose where inconsistencies occur
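Here is a sketch of that path finding and composition on a fresh graph with no direct image → embedding edge; the "patches" modality and the toy patchify / encode_patches functions are illustrative stand-ins, while the graph calls follow the quick-start API.

```python
import numpy as np

def patchify(img):
    """Toy stand-in: split a 224x224x3 image into 196 16x16x3 patches."""
    return img.reshape(14, 16, 14, 16, 3).swapaxes(1, 2).reshape(196, 16, 16, 3)

def encode_patches(patches):
    """Toy stand-in: flatten and mean-pool patches into a 768-dim vector."""
    return patches.reshape(196, -1).mean(axis=0)

graph.add_modality("patches", shape=(196, 16, 16, 3))
graph.add_transformation(
    source="image", target="patches",
    func=patchify, inverse=None, info_loss="low",  # assumed label
)
graph.add_transformation(
    source="patches", target="embedding",
    func=encode_patches, inverse=None, info_loss="high",
)

# No direct image -> embedding edge is registered: graph.transform finds
# the image -> patches -> embedding path and composes the two maps.
emb = graph.transform("image", "embedding", my_image)
```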
Advanced tools:
- Čech Cohomology: Rigorous computation of cohomology groups
- Persistent Cohomology: Handle noisy data, separate signal from noise
- Cocycle Conditions: Verify and repair calibration consistency (see the sketch below)
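Concretely, the cocycle condition says that for maps f: A → B, g: B → C and a direct map h: A → C, the composed path must agree with the direct one: g ∘ f = h on shared data. A standalone sketch of measuring the defect (this helper is illustrative, not a modalsheaf API):

```python
import numpy as np

def cocycle_defect(f, g, h, x):
    """Illustrative helper (not a modalsheaf API): size of the gap between
    the composed path g(f(x)) and the direct map h(x); zero means the
    cocycle condition holds at x."""
    return float(np.linalg.norm(np.asarray(g(f(x))) - np.asarray(h(x))))

# Example: two calibrations that should agree
f = lambda x: x * 2.0          # A -> B
g = lambda x: x + 1.0          # B -> C
h = lambda x: 2.0 * x + 1.0    # direct A -> C
print(cocycle_defect(f, g, h, np.ones(3)))  # 0.0 -> consistent
```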
Supported data types:
- Images (PIL, numpy, torch tensors)
- Text (strings, token IDs, embeddings)
- Audio (waveforms, spectrograms, embeddings)
- Video (frame sequences, temporal embeddings)
- Structured data (JSON, dataframes)
ML integrations:
- PyTorch transforms
- HuggingFace encoders (see the sketch after this list)
- OpenAI/Anthropic embeddings
- Custom encoders
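As one example of the HuggingFace route, here is a sketch wrapping a CLIP text encoder as the func argument from the quick start. The wrapper itself is illustrative; the model is chosen to match the 768-dim embedding modality above, and only the add_transformation call uses modalsheaf's shown API.

```python
# Sketch: wrapping a HuggingFace CLIP text encoder as a transformation func.
import torch
from transformers import CLIPModel, CLIPProcessor

# CLIP ViT-L/14 projects text to 768 dims, matching the embedding modality
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def clip_text_encoder(text):
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        return model.get_text_features(**inputs)[0].numpy()

graph.add_transformation(
    source="text",
    target="embedding",
    func=clip_text_encoder,
    inverse=None,
    info_loss="high",
)
```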
Documentation:
- Intuitive Guide to Cohomology — No math background required!
- Examples — Practical code examples
- Why Topology? — Motivation
- Sheaves Intuition — The key abstraction
- Advanced Cohomology — Rigorous computation
- Cocycles in Practice — Real-world examples
- Persistence Guide — Handling noisy data
Comparison with pysheaf:

| Feature | pysheaf | modalsheaf |
|---|---|---|
| Focus | General sheaf theory | ML modality transformations |
| API | Mathematical (cells, cofaces) | Practical (modalities, transforms) |
| Target users | Mathematicians | ML practitioners |
| Built-in modalities | None | Images, text, audio, video |
| ML integration | None | PyTorch, HuggingFace, etc. |
License: MIT
If you use ModalSheaf in your work, please cite:

```bibtex
@software{modalsheaf,
  title  = {ModalSheaf: Practical Sheaf-Theoretic Tools for Multimodal ML},
  author = {Lee, Michael Harrison},
  year   = {2024},
  url    = {https://github.com/MikeHLee/modalsheaf}
}
```