Transforms Markdown documents into hyperdimensional vectors and reconstructs them using dual HDC (hyperdimensional computing) encoding and an optional LLM reconstruction step.
Dual HDC Pipeline: Documents split into semantic content and positional structure, encoded as 10,000-dimensional vectors, then reconstructed via HDC unbinding.
- Content Vector: Encodes which words exist
- Position Vector: Encodes which positions are used
- Pair Vectors: HDC binding creates word-position associations
- Reconstruction: HDC unbinding, plus an optional LLM pass, recovers the original text
Key Innovation: A mathematical "handshake" (HDC binding) between words and positions enables exact order recovery, except where repeated words cause position confusion.
HDC Binding: pair_vector = word_vector * position_vector
HDC Unbinding: recovered_word = document_vector * position_vector
Vector Bundling: content_vector = sum(word_vectors)
Storage: int8 (±1) for maximum compression
See architecture_en.md for more details.
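The binding, bundling, and unbinding operations listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: the five-word vocabulary, the random seed, the `document_vector` built as a sum of pair vectors, and the helper names `random_bipolar` and `recover` are all assumptions made for the example.

```python
import numpy as np

DIM = 10_000  # dimensionality stated in this README
rng = np.random.default_rng(0)  # fixed seed, chosen only for the example


def random_bipolar(n):
    """Return n random ±1 vectors stored as int8 (hypothetical helper)."""
    return rng.choice(np.array([-1, 1], dtype=np.int8), size=(n, DIM))


# Hypothetical tiny vocabulary standing in for the shared dictionary.
words = ["hyper", "dimensional", "vectors", "encode", "text"]
word_vecs = dict(zip(words, random_bipolar(len(words))))
pos_vecs = random_bipolar(len(words))

# Binding: element-wise product of a word vector and a position vector.
pairs = np.stack([word_vecs[w] * pos_vecs[i] for i, w in enumerate(words)])

# Bundling: element-wise sum (widened to int32 so the sums do not overflow).
content_vector = np.stack(list(word_vecs.values())).sum(axis=0, dtype=np.int32)
document_vector = pairs.sum(axis=0, dtype=np.int32)  # assumed: sum of pair vectors


def recover(position):
    """Unbind one position, then pick the closest dictionary word."""
    noisy = document_vector * pos_vecs[position]  # unbinding gives a noisy word vector
    scores = {w: int(noisy @ v.astype(np.int32)) for w, v in word_vecs.items()}
    return max(scores, key=scores.get)


recovered = [recover(i) for i in range(len(words))]
print(recovered)
```

Because ±1 vectors are their own inverse under element-wise multiplication, multiplying by a position vector a second time cancels it. The result is still noisy, which is why the sketch ends with a nearest-word lookup against the dictionary.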
- Universal Dictionary: 20,000-word shared vocabulary
- Scalable: Linear O(n) performance
git clone https://github.com/Garletz/HDC-Markdown-Encoder-Reconstructor.git
pip install -r requirements.txt && pip install -e .
# Encode document
python cli.py --encode-dual "document.md" --config config.yaml
# Reconstruct from vectors
python cli.py --reconstruct-dual \
--content-vector "encoded_vectors/encoded_X_content.npy" \
--position-vector "encoded_vectors/encoded_X_position.npy" \
-o "output.md"

| Tokens | Encoding time | Storage | Reconstruction time | Accuracy |
|---|---|---|---|---|
| 8 | 0.8s | 240KB | 1.2s | 100% |
| 16 | 1.1s | 480KB | 1.8s | 100% |
| 50+ | 2.3s | 1.5MB | 3.1s | 100% |
encoded_vectors/
├── encoded_N_content.npy # Semantic information
├── encoded_N_position.npy # Structural information
└── encoded_N_pairs.npy # Word-position bindings
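The files above are ordinary NumPy arrays, so they can be inspected with `numpy.load`. The sketch below writes stand-in files to a temporary directory and reads them back. The index `0` in the file names, the array shapes (one vector for content and position, one row per token for pairs), and the `bipolar` helper are assumptions for illustration; the arrays produced by `cli.py` may be laid out differently.

```python
import tempfile
from pathlib import Path

import numpy as np

DIM = 10_000
N_TOKENS = 8  # hypothetical document length
rng = np.random.default_rng(1)


def bipolar(shape):
    """Random ±1 array stored as int8 (hypothetical helper)."""
    return rng.choice(np.array([-1, 1], dtype=np.int8), size=shape)


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "encoded_vectors"
    out_dir.mkdir()

    # Assumed shapes; the real encoder may store its arrays differently.
    arrays = {
        "encoded_0_content.npy": bipolar(DIM),
        "encoded_0_position.npy": bipolar(DIM),
        "encoded_0_pairs.npy": bipolar((N_TOKENS, DIM)),
    }
    for name, arr in arrays.items():
        np.save(out_dir / name, arr)

    # Load every .npy file back and record dtype, shape, and payload size.
    loaded = {p.name: np.load(p) for p in sorted(out_dir.glob("*.npy"))}
    summary = {
        name: (arr.dtype.name, arr.shape, arr.nbytes)
        for name, arr in loaded.items()
    }

for name, info in summary.items():
    print(name, info)
```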
- Repeated words may cause position confusion (in 5-10% of cases or more)
- Out-of-vocabulary tokens are skipped
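Both limitations can be reproduced in a small NumPy sketch. It shows one plausible way the confusion arises, not necessarily the exact mechanism in this project: when a word occurs twice, probing the document vector with that word returns a superposition of both positions. The four-word vocabulary, the token list, and the `position_scores` helper are assumptions for illustration.

```python
import numpy as np

DIM = 10_000
rng = np.random.default_rng(2)

# Hypothetical tiny stand-in for the shared dictionary.
vocab = ["the", "cat", "saw", "dog"]
word_vecs = {w: rng.choice([-1, 1], size=DIM) for w in vocab}

raw_tokens = ["the", "cat", "saw", "the", "zyzzyva", "dog"]

# Out-of-vocabulary tokens have no word vector, so they are skipped.
tokens = [t for t in raw_tokens if t in word_vecs]
pos_vecs = rng.choice([-1, 1], size=(len(tokens), DIM))

# Document vector: bundle (sum) of word-position bindings.
document_vector = sum(word_vecs[t] * pos_vecs[i] for i, t in enumerate(tokens))


def position_scores(word):
    """Unbind by word and score the result against every position vector."""
    probe = document_vector * word_vecs[word]
    return pos_vecs @ probe


the_scores = position_scores("the")
# "the" occurs twice, so two positions score equally high and a single
# best-match lookup cannot tell them apart.
top_two = sorted(np.argsort(the_scores)[-2:].tolist())
print(tokens)
print(top_two)
```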
- Enable ultra-light document transfer via semantic vector compression.
If sender and receiver share the same item memory (dictionary), the original text can be perfectly reconstructed from compact .npy vectors.
- This approach aims to enable a wide range of future use cases...
- Poneglyph ...
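The shared item memory mentioned above does not have to be transmitted if both sides can regenerate it deterministically. The sketch below derives each word vector from a hash of the word and a shared seed. This is an assumption about how such a dictionary could be built, not a description of how this project builds its own; `item_vector` and the `seed` parameter are hypothetical names.

```python
import hashlib

import numpy as np

DIM = 10_000


def item_vector(word, seed=0):
    """Derive a reproducible ±1 int8 vector for a word (hypothetical helper)."""
    digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return rng.choice(np.array([-1, 1], dtype=np.int8), size=DIM)


# Sender and receiver compute the vector independently and get the same result.
sender = item_vector("markdown")
receiver = item_vector("markdown")
other = item_vector("vector")

same = np.array_equal(sender, receiver)
# Normalized dot product: close to 0 for unrelated words.
similarity = float(sender.astype(np.int32) @ other.astype(np.int32)) / DIM
print(same, similarity)
```

Hashing each word separately means the vectors do not depend on vocabulary order, so two parties only need to agree on the seed and the hashing scheme.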
Currently experimental — concept in development.
Get in touch with the OpenDataHive team:
