
HDC Markdown Encoder & Reconstructor


Transforms Markdown documents into hyperdimensional vectors and reconstructs them using dual HDC encoding and an optional LLM reconstruction.

How It Works

Dual HDC Pipeline: Documents are split into semantic content and positional structure, encoded as 10,000-dimensional vectors, and then reconstructed via HDC unbinding.

  1. Content Vector: Encodes which words exist
  2. Position Vector: Encodes which positions are used
  3. Pair Vectors: HDC binding creates word-position associations
  4. Reconstruction: HDC unbinding + LLM recovers original text

Key Innovation: A mathematical "handshake" between words and positions enables exact order recovery (for documents without repeated words).
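The binding/unbinding "handshake" can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the project's actual API; the variable names and the 10,000-dimensional bipolar (±1) encoding follow the README.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10_000  # dimensionality stated in the README

# Random bipolar (+/-1) hypervectors for one word and one position slot.
word = rng.choice([-1, 1], size=DIM).astype(np.int8)
position = rng.choice([-1, 1], size=DIM).astype(np.int8)

# Binding: element-wise multiplication associates the word with its position.
pair = word * position

# Unbinding: multiplying by the same position vector recovers the word
# exactly, because position * position == 1 everywhere for bipolar vectors.
recovered = pair * position
assert np.array_equal(recovered, word)
```

Because binding is its own inverse for bipolar vectors, recovery at a known position is exact; ambiguity only arises when the same word is bound to several positions, which matches the limitation noted below.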

Algorithm

```
HDC Binding:     pair_vector    = word_vector * position_vector
HDC Unbinding:   recovered_word = document_vector * position_vector
Vector Bundling: content_vector = sum(word_vectors)
```

Storage uses int8 (±1) values for maximum compression; see architecture_en.md for details.
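Bundling and the cleanup step it implies can be sketched as follows. The tiny dictionary here is an assumption for illustration (the project uses a 20,000-word item memory), and the similarity threshold is a hypothetical choice, not a project constant.

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 10_000

# Hypothetical miniature item memory; the real dictionary has 20,000 words.
vocab = ["markdown", "vector", "encode", "decode", "hyper"]
item_memory = {w: rng.choice([-1, 1], size=DIM).astype(np.int8) for w in vocab}

# Bundling: summing word vectors yields a content vector that remains
# similar to each of its members.
doc_words = ["markdown", "vector", "encode"]
content_vector = np.sum([item_memory[w] for w in doc_words], axis=0,
                        dtype=np.int32)  # widen dtype to avoid int8 overflow

def contains(word: str, threshold: float = 0.2) -> bool:
    """Cleanup: a member word's normalized dot product is near 1,
    a non-member's is near 0 (threshold chosen for illustration)."""
    sim = content_vector @ item_memory[word].astype(np.int32) / DIM
    return sim > threshold

assert all(contains(w) for w in doc_words)
assert not contains("hyper")
```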

Key Features

  • Universal Dictionary: 20,000-word shared vocabulary
  • Scalable: Linear O(n) performance
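A shared dictionary only works if sender and receiver derive identical vectors without exchanging them. One way to sketch this is to seed each word's vector from a hash of the word itself; this is an assumption for illustration, and the project may seed its item memory differently.

```python
import hashlib
import numpy as np

DIM = 10_000

def word_vector(word: str) -> np.ndarray:
    # Derive a deterministic seed from the word (hashing is an assumption;
    # the project's dictionary generation may differ).
    seed = int.from_bytes(hashlib.sha256(word.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.choice([-1, 1], size=DIM).astype(np.int8)

# Sender and receiver compute identical vectors with no data exchange.
assert np.array_equal(word_vector("markdown"), word_vector("markdown"))
# Different words still get (near-)orthogonal random vectors.
assert not np.array_equal(word_vector("markdown"), word_vector("vector"))
```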

Installation

```bash
git clone https://github.com/Garletz/HDC-Markdown-Encoder-Reconstructor.git
cd HDC-Markdown-Encoder-Reconstructor
pip install -r requirements.txt && pip install -e .
```

CLI Usage

```bash
# Encode document
python cli.py --encode-dual "document.md" --config config.yaml

# Reconstruct from vectors
python cli.py --reconstruct-dual \
  --content-vector "encoded_vectors/encoded_X_content.npy" \
  --position-vector "encoded_vectors/encoded_X_position.npy" \
  -o "output.md"
```

Performance (documents without repeated words)

| Tokens | Encoding | Storage | Reconstruction | Accuracy |
|--------|----------|---------|----------------|----------|
| 8      | 0.8 s    | 240 KB  | 1.2 s          | 100%     |
| 16     | 1.1 s    | 480 KB  | 1.8 s          | 100%     |
| 50+    | 2.3 s    | 1.5 MB  | 3.1 s          | 100%     |

Output Files

```
encoded_vectors/
├── encoded_N_content.npy    # Semantic information
├── encoded_N_position.npy   # Structural information
└── encoded_N_pairs.npy      # Word-position bindings
```
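The int8 (±1) vectors map directly onto NumPy's `.npy` format, so a 10,000-dimensional vector costs about 10 KB on disk before any further compression. A minimal sketch of writing and reading such a file (the file name mirrors the listing above; the actual writer lives in the project code):

```python
import tempfile
from pathlib import Path

import numpy as np

DIM = 10_000
rng = np.random.default_rng(7)
content = rng.choice([-1, 1], size=DIM).astype(np.int8)  # ~10 KB of payload

# Hypothetical output path following the naming scheme in the listing.
out = Path(tempfile.mkdtemp()) / "encoded_1_content.npy"
np.save(out, content)

restored = np.load(out)
assert restored.dtype == np.int8
assert np.array_equal(restored, content)
```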

Limitations

  • Repeated words may cause position confusion (5-10% of cases, sometimes more)
  • Out-of-vocabulary tokens are skipped

License

  • ©OpenDataHive License (modified; see LICENSE)

Project Goal

  • Enable ultra-light document transfer via semantic vector compression.
    If sender and receiver share the same item memory (dictionary), the original text can be perfectly reconstructed from compact .npy vectors.
  • This approach aims to enable a wide range of future use cases...
  • Poneglyph ...

Currently experimental — concept in development.

📞 Contact

Get in touch with the OpenDataHive team:

Twitter Devpost WhatsApp

