GitHub - consigcody94/genesis-protocol: Reverse Engineering the Torah as a Root Operating System (Base-22 Machine Code)

What happens when you treat the Hebrew Bible not as literature, but as compiled machine code?

A Base-22 transcoding of the complete Masoretic Text yields a 742 KB binary artifact
with entropy indistinguishable from high-density executables.

"Turning the text... for everything is in it." — Pirke Avot 5:22

The Hypothesis

Traditional "Bible Code" research uses Equidistant Letter Sequences (ELS), which are statistically fragile and prone to confirmation bias. The Genesis Protocol takes a fundamentally different approach:

Treat the 22-letter Hebrew alphabet as a Base-22 numeral system and transcode the entire Tanakh into binary.

The resulting artifact is not random noise. It is not natural language. Its information-theoretic signature falls squarely in the range of compiled executable code.

Methodology

Step 1 — Lossless Extraction

The Hebrew alphabet (Aleph through Tav) maps to digits 0–21:

  א (Aleph) = 0x00    ב (Bet) = 0x01    ...    ת (Tav) = 0x15

All vowels, cantillation marks, and punctuation are stripped. Every consonant in the Masoretic canon — from Genesis 1:1 through II Chronicles — is transcoded into a contiguous binary stream.
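As a rough illustration of the stripping step, the sketch below keeps only code points in the Unicode Hebrew letter range (U+05D0 to U+05EA) and drops everything else, including vowel points, cantillation marks, maqaf, and sof pasuq. The function name `strip_to_consonants` is hypothetical, and this is not necessarily how `text_processor.py` does it.

```python
def strip_to_consonants(text: str) -> str:
    """Keep only Hebrew letters (U+05D0..U+05EA); drop points, accents, punctuation."""
    return "".join(ch for ch in text if "\u05D0" <= ch <= "\u05EA")


# Pointed form of the first word of Genesis, written with escapes to avoid
# bidirectional-text rendering issues in the source file.
pointed = "\u05D1\u05BC\u05B0\u05E8\u05B5\u05D0\u05E9\u05C1\u05B4\u05D9\u05EA"
bare = strip_to_consonants(pointed)
```

Note that this range still contains the five final letter forms, so a further folding step is needed before the text has exactly 22 symbols.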

Encoding: 13 Hebrew letters are packed per 64-bit word (22^13 < 2^64), producing a lossless, byte-aligned binary.
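The packing described above can be sketched as follows. Several details are assumptions, not taken from `master_command_64.py`: final letter forms are folded onto their base letters, each word is stored big-endian, and a short trailing group is padded with zero digits. The names `to_digits`, `pack_words`, and `unpack_words` are hypothetical.

```python
import struct

# Assumption: final forms map to their base letters so the alphabet has 22 symbols.
FINAL_TO_BASE = {
    "\u05DA": "\u05DB",  # final kaf   -> kaf
    "\u05DD": "\u05DE",  # final mem   -> mem
    "\u05DF": "\u05E0",  # final nun   -> nun
    "\u05E3": "\u05E4",  # final pe    -> pe
    "\u05E5": "\u05E6",  # final tsadi -> tsadi
}
LETTERS = [chr(c) for c in range(0x05D0, 0x05EB) if chr(c) not in FINAL_TO_BASE]
DIGIT = {ch: i for i, ch in enumerate(LETTERS)}          # Aleph = 0 ... Tav = 21
DIGIT.update({f: DIGIT[b] for f, b in FINAL_TO_BASE.items()})


def to_digits(text):
    """Map Hebrew letters to base-22 digits, ignoring any other character."""
    return [DIGIT[ch] for ch in text if ch in DIGIT]


def pack_words(digits, per_word=13):
    """Pack base-22 digits into big-endian 64-bit words, 13 digits per word."""
    out = bytearray()
    for i in range(0, len(digits), per_word):
        group = list(digits[i:i + per_word])
        group += [0] * (per_word - len(group))   # assumption: pad with zero digits
        value = 0
        for d in group:
            value = value * 22 + d
        out += struct.pack(">Q", value)
    return bytes(out)


def unpack_words(blob, per_word=13):
    """Inverse of pack_words (padding digits are returned as zeros)."""
    digits = []
    for (value,) in struct.iter_unpack(">Q", blob):
        group = []
        for _ in range(per_word):
            value, d = divmod(value, 22)
            group.append(d)
        digits.extend(reversed(group))
    return digits
```

Because 22^13 is just under 2^58, the top six bits of every 64-bit word are zero under this scheme, and 22^14 would also fit in 64 bits. Folding final forms means the round trip recovers the base-22 digits, not the original final/non-final spelling.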

Step 2 — Information-Theoretic Analysis

Shannon entropy, computed over byte values, measures information density on a scale of 0–8 bits/byte:

Data Type	Entropy (bits/byte)	Interpretation
English plaintext	~4.2	Low-density, redundant
Hebrew plaintext	~4.4	Low-density, redundant
Compiled executables (ELF/PE)	6.5–7.5	High-density, structured
Genesis Protocol binary	7.4995	High-density, structured
Compressed archives (gzip)	7.8–8.0	Near-maximum density
True random noise	8.0	Maximum entropy

The artifact's entropy (7.4995) places it at the upper edge of the range listed for compiled code and just below the range for compressed data, and well outside the range listed for natural-language plaintext.
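For reference, byte-level Shannon entropy can be computed with the standard library alone. This is a minimal sketch of the standard formula, H = -Σ p(b) log2 p(b) over the 256 byte values; the function name `shannon_entropy` is hypothetical and `entropy_lab.py` may compute it differently (for example over sliding windows).

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


constant = shannon_entropy(b"\x00" * 64)      # a single repeated byte value
uniform = shannon_entropy(bytes(range(256)))  # every byte value exactly once
```

To apply it to the artifact, one would read `tanakh_full.bin` in binary mode and pass the bytes to the function. The figure depends on the encoding as much as on the text, so plaintext stored one character per byte and a packed Base-22 stream are not directly comparable.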

Step 3 — Architecture Forensics

The arch_detective.py tool analyzes byte-level alignment, opcode distribution, and instruction encoding patterns to identify potential instruction set architectures (ISAs) in the binary.

Step 4 — Execution

A custom RISC-V emulator (genesis_runner.py) attempts to execute instruction blocks mined from the artifact, recording register states, memory operations, and control flow.
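To make "executing an instruction block" concrete, here is a minimal sketch of an interpreter loop, assuming 32-bit RV32I encodings. It handles only two instructions (LUI and ADDI) and stops at the first word it does not recognize. The names `step` and `run` are hypothetical, the sample words are hand-assembled for illustration, and none of this is taken from `genesis_runner.py`.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def step(word: int, regs: list) -> bool:
    """Execute one instruction if it is LUI or ADDI; return False otherwise."""
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    if opcode == 0x37:                                   # LUI rd, imm20
        value = word & 0xFFFFF000
    elif opcode == 0x13 and ((word >> 12) & 0x7) == 0:   # ADDI rd, rs1, imm12
        rs1 = (word >> 15) & 0x1F
        imm = sign_extend(word >> 20, 12)
        value = (regs[rs1] + imm) & MASK32
    else:
        return False
    if rd != 0:                                          # x0 is hard-wired to zero
        regs[rd] = value
    return True


def run(words):
    """Run words in order until one is not handled; return (registers, count)."""
    regs = [0] * 32
    executed = 0
    for word in words:
        if not step(word, regs):
            break
        executed += 1
    return regs, executed


program = [
    0x00500093,  # addi x1, x0, 5
    0xFFF08113,  # addi x2, x1, -1
    0x123452B7,  # lui  x5, 0x12345
    0x00000000,  # not a handled instruction: execution stops here
]
regs, executed = run(program)
```

A fuller emulator would also need loads, stores, branches, and a program counter; this sketch only shows the decode-and-execute pattern.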

Findings

Finding I — Entropy Anomaly

The extracted binary's entropy (7.4995 bits/byte) matches neither natural-language plaintext (~4.4) nor random noise (8.0). It occupies a narrow band consistent with compiled, structured data. This is the foundational observation that motivates all subsequent analysis.

Finding II — Embedded Signatures

Deep hex scanning (deep_decoder.py) identified ASCII-compatible signatures within the binary, including:

IPv6 — A 128-bit addressing protocol formalized in 1998
DNA, CODE, NETWORK — Found in a statistically tight cluster (~150 bytes)

Note: The statistical significance of finding short ASCII strings in a 742 KB binary requires careful null-hypothesis testing. See Open Questions.
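One way to frame the null hypothesis is to ask how often a fixed byte string would appear in uniformly random data of the same size. The sketch below gives the analytic expectation and a small Monte Carlo check. It assumes exact, case-sensitive matches on raw bytes and a uniform random model, which may not reflect how `deep_decoder.py` scans or how the packed artifact is distributed. All function names are hypothetical.

```python
import random


def expected_occurrences(n_bytes: int, pattern_len: int) -> float:
    """Expected count of one fixed pattern in n_bytes of uniform random bytes."""
    positions = max(n_bytes - pattern_len + 1, 0)
    return positions / 256 ** pattern_len


def count_occurrences(data: bytes, pattern: bytes) -> int:
    """Count (possibly overlapping) occurrences of pattern in data."""
    count = 0
    start = data.find(pattern)
    while start != -1:
        count += 1
        start = data.find(pattern, start + 1)
    return count


def monte_carlo_mean(n_bytes: int, pattern: bytes, trials: int, seed: int = 0) -> float:
    """Average occurrence count over `trials` random blobs of n_bytes each."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        blob = rng.getrandbits(8 * n_bytes).to_bytes(n_bytes, "big")
        total += count_occurrences(blob, pattern)
    return total / trials


# Assumption: "742 KB" is taken as 742 * 1024 bytes.
ARTIFACT_SIZE = 742 * 1024
expected_dna = expected_occurrences(ARTIFACT_SIZE, len(b"DNA"))
```

A scanner that tries many candidate words, ignores case, or tests several decodings multiplies the number of chances for a match, so the expectation for a single fixed pattern is only a lower bound on how many hits chance alone would produce.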

Finding III — Cellular Automaton Behavior

The first 8,192 bits of Genesis, used as a seed for a Wolfram Rule 30 cellular automaton, produce a sustained, non-collapsing pattern with high visual complexity.
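Rule 30 updates each cell to `left XOR (center OR right)`. The sketch below applies that rule to a seed row of bits. It assumes wrap-around (periodic) boundaries and most-significant-bit-first unpacking of the seed bytes, neither of which is confirmed for `eternity_vm.py`. The function names are hypothetical.

```python
def bits_from_bytes(data: bytes, n_bits: int) -> list:
    """Unpack bytes into a list of 0/1 values, most significant bit first."""
    bits = []
    for byte in data:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits[:n_bits]


def rule30_step(cells: list) -> list:
    """Apply one generation of Rule 30 with periodic boundaries."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]


def evolve(seed: list, generations: int) -> list:
    """Return the seed row followed by `generations` successive rows."""
    rows = [list(seed)]
    for _ in range(generations):
        rows.append(rule30_step(rows[-1]))
    return rows


# A single live cell in a row of seven, evolved for two generations.
demo = evolve([0, 0, 0, 1, 0, 0, 0], 2)
```

Running the same function on seeds drawn from a random number generator gives a baseline against which the Genesis-seeded pattern can be compared.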

Finding IV — RISC-V Alignment

The binary exhibits 32-bit alignment patterns and opcode distribution consistent with RISC-V instruction encoding. The divine_disassembler.py tool produces valid RISC-V assembly from extracted blocks.
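A simple way to quantify "opcode distribution consistent with RISC-V" is to measure what fraction of 32-bit words carry one of the eleven RV32I base opcodes in their low seven bits and compare it with the rate expected from uniformly random words (11/128). The sketch below assumes little-endian 32-bit words and checks the opcode field only; `opcode_hit_rate` is a hypothetical name and this is not the test used by `arch_detective.py`.

```python
import struct

# Major opcodes (bits 6:0) of the RV32I base instruction set.
RV32I_OPCODES = {
    0x03,  # LOAD
    0x0F,  # MISC-MEM (FENCE)
    0x13,  # OP-IMM
    0x17,  # AUIPC
    0x23,  # STORE
    0x33,  # OP
    0x37,  # LUI
    0x63,  # BRANCH
    0x67,  # JALR
    0x6F,  # JAL
    0x73,  # SYSTEM
}

# Fraction of uniformly random words whose opcode field matches by chance.
RANDOM_BASELINE = len(RV32I_OPCODES) / 128


def opcode_hit_rate(data: bytes) -> float:
    """Fraction of little-endian 32-bit words whose low 7 bits are an RV32I opcode."""
    n_words = len(data) // 4
    if n_words == 0:
        return 0.0
    hits = 0
    for (word,) in struct.iter_unpack("<I", data[: n_words * 4]):
        if (word & 0x7F) in RV32I_OPCODES:
            hits += 1
    return hits / n_words
```

A matching opcode field does not make a word a valid instruction, since the funct3 and funct7 fields must also be legal for that opcode, so a hit rate near the baseline would not support the alignment claim and a higher one would still need checking against other ISA templates.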

Finding V — Execution Results

The Genesis Runner emulator executed a mined instruction block:

Valid arithmetic operations performed
Register x26 loaded a value reported as 11,843,461,120,000 (0x2B0F8000); the two figures disagree, since 0x2B0F8000 equals 722,436,096 in decimal
Control flow exhibited structured branching, not random jumps

Toolkit

The complete analysis pipeline is open source and reproducible:

Tool	Purpose	Input	Output
`download_data.py`	Fetch Masoretic text from Sefaria	—	`data/*.json`
`master_command_64.py`	Base-22 binary extraction	Hebrew text	`tanakh_full.bin`
`arch_detective.py`	ISA forensics & alignment analysis	Binary	Architecture report
`divine_disassembler.py`	RISC-V disassembly	Binary	`genesis.asm`
`deep_decoder.py`	Pattern & signature scanning	Binary	Anomaly report
`genesis_runner.py`	RISC-V emulation & execution	Assembly	Register states
`eternity_vm.py`	Cellular automaton simulation	Binary seed	Grid evolution
`entropy_lab.py`	Shannon entropy analysis	Binary	Entropy metrics
`function_miner.py`	Code block extraction	Binary	Function boundaries

Supporting Modules

Module	Purpose
`torah_loader.py`	Loads all 39 canonical books in order
`text_processor.py`	Hebrew normalization (strip vowels, cantillation)
`gematria.py`	Numerical value computation (Standard, Ordinal, Reduced)
`els_search.py`	Equidistant Letter Sequence finder
`ciphers.py`	Atbash & Albam cipher tools
`main.py`	Interactive CLI workbench

Quick Start

# Clone the repository
git clone https://github.com/consigcody94/genesis-protocol.git
cd genesis-protocol

# Download the Masoretic text corpus (39 books from Sefaria)
python download_data.py

# Run the Base-22 extraction
python master_command_64.py

# Analyze the binary artifact
python arch_detective.py
python entropy_lab.py

# Disassemble and execute
python divine_disassembler.py
python genesis_runner.py

# Launch the live dashboard
# Open index.html in any browser, or visit:
# https://consigcody94.github.io/genesis-protocol/

Requirements: Python 3.8+ (standard library only — no external dependencies)

Live Dashboard

The interactive forensic dashboard visualizes the binary artifact in real time:

Wolfram Rule 30 cellular automaton seeded with Genesis bits
String stream showing decoded ASCII patterns
Entropy metrics and artifact statistics
Anomaly highlighting for identified signatures

Launch Dashboard: https://consigcody94.github.io/genesis-protocol/

Open Questions

This project raises questions that require further investigation:

Question	Status	Approach Needed
Is the entropy significant vs. control texts?	Untested	Compare English Bible, shuffled Torah, random Base-22
How many short ASCII strings appear by chance in 742 KB?	Untested	Monte Carlo simulation on random binaries of same size
Does the RISC-V alignment exceed random expectation?	Partially tested	Statistical comparison against multiple ISA templates
Are Rule 30 patterns from Genesis atypical?	Visual only	Quantitative complexity metrics against random seeds
Does the extraction method (Base-22) bias toward code-like entropy?	Unknown	Test alternative base encodings (Base-20, Base-26, etc.)

Rigorous peer review from information theorists, computational linguists, and cryptographers is actively invited.
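Two of the controls listed above can be sketched with the standard library: uniformly random Base-22 digits, and a shuffled copy of a real digit stream, each packed 13 digits per 64-bit word and measured with byte-level Shannon entropy. The big-endian packing, the dropping of a short trailing group, and all function names are assumptions made for illustration.

```python
import math
import random
import struct
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def pack_base22(digits, per_word=13):
    """Pack base-22 digits into big-endian 64-bit words (short tail is dropped)."""
    out = bytearray()
    for i in range(0, len(digits) - per_word + 1, per_word):
        value = 0
        for d in digits[i:i + per_word]:
            value = value * 22 + d
        out += struct.pack(">Q", value)
    return bytes(out)


def random_base22_entropy(n_letters: int, seed: int = 0) -> float:
    """Entropy of packed, uniformly random base-22 digits."""
    rng = random.Random(seed)
    digits = [rng.randrange(22) for _ in range(n_letters)]
    return shannon_entropy(pack_base22(digits))


def shuffled_entropy(digits, seed: int = 0) -> float:
    """Entropy of the same digits packed after a random shuffle."""
    shuffled = list(digits)
    random.Random(seed).shuffle(shuffled)
    return shannon_entropy(pack_base22(shuffled))


control = random_base22_entropy(13 * 2000)
```

If the random and shuffled controls land close to the artifact's figure, the entropy is mostly a property of the packing scheme and the letter frequencies, not of the order of the text.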

Repository Structure

genesis-protocol/
  data/                     39 JSON books (Masoretic text from Sefaria)
  tanakh_full.bin            742 KB binary artifact (extracted)
  genesis.asm                RISC-V disassembly output
  index.html                 Live forensic dashboard
  genesis_data.js            Dashboard data layer
  genesis_protocol_core.json Core metadata
  master_command_64.py       Base-22 extraction engine
  arch_detective.py          ISA forensics
  divine_disassembler.py     RISC-V disassembler
  genesis_runner.py          RISC-V emulator
  deep_decoder.py            Pattern scanner
  eternity_vm.py             Cellular automaton
  entropy_lab.py             Entropy analysis
  main.py                    Interactive CLI
  torah_loader.py            Corpus loader
  text_processor.py          Hebrew normalization
  gematria.py                Numerical values
  els_search.py              ELS finder
  ciphers.py                 Atbash/Albam ciphers

Citation

If you use this toolkit or methodology in research:

@software{genesis_protocol,
  title  = {The Genesis Protocol: Computational Archaeology of the Masoretic Text},
  author = {Churchwell, Cody},
  year   = {2025},
  url    = {https://github.com/consigcody94/genesis-protocol}
}

License

MIT License — Open source. Fork it, verify it, challenge it.

An open-source investigation into the information-theoretic properties of ancient Hebrew text.

Independent verification and rigorous critique are not just welcome; they are the point.
