A binary protocol code generator
Overview
- Parse a small DSL (.jaw) that describes binary message layouts
- Validate and compile into an intermediate type graph
- Generate reader/writer code for C++, Python, or Rust
- Endianness is assumed to be little-endian
Quickstart
- Build: cargo build (Rust 2024 edition)
- Run: cargo run -- <INPUT.jaw> --kind <cpp|python|rust> <OUTPUT>
  - Example (Python): cargo run -- assets/basic.jaw --kind python generated/python/basic.py
  - Example (C++): cargo run -- assets/basic.jaw --kind cpp generated/cpp/src/basic.hpp
  - Example (Rust): cargo run -- assets/basic.jaw --kind rust generated/rust/basic.rs
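The invocation pattern above can be wrapped in a small helper when a build script needs to generate several outputs. This is a minimal sketch that assumes only the CLI shape shown in the Quickstart; the name build_command is hypothetical and not part of the project, and the helper assembles the command without running it.

```python
SUPPORTED_KINDS = ("cpp", "python", "rust")  # values documented for --kind


def build_command(input_path, kind, output_path):
    """Return the argv list for: cargo run -- <INPUT.jaw> --kind <kind> <OUTPUT>.

    Hypothetical helper: it only builds the argument list.
    """
    if kind not in SUPPORTED_KINDS:
        raise ValueError(f"unsupported kind: {kind!r}")
    return ["cargo", "run", "--", input_path, "--kind", kind, output_path]


if __name__ == "__main__":
    argv = build_command("assets/basic.jaw", "python", "generated/python/basic.py")
    print(" ".join(argv))
```

Returning a list rather than a single string keeps paths with spaces intact if the result is later passed to subprocess.run.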
CLI
- --kind: selects the output generator. Supported: cpp, python, rust.
- Exit codes: non-zero on parse/validation/generation errors. Errors display spans using ariadne with helpful labels.
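Because failures are signalled through a non-zero exit code, a calling script can treat any non-zero result as an error. The sketch below is one way to do that; run_generator is a hypothetical name, and forwarding stderr assumes the diagnostics are written there, which the text above does not state.

```python
import subprocess
import sys


def run_generator(argv):
    """Run a generator command and raise if it exits with a non-zero code."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # Assumption: diagnostics arrive on stderr; they are forwarded unchanged.
        raise RuntimeError(
            f"generator failed with exit code {result.returncode}:\n{result.stderr}"
        )


if __name__ == "__main__":
    # Stand-in command so the sketch runs without cargo installed.
    try:
        run_generator([sys.executable, "-c", "import sys; sys.exit(2)"])
    except RuntimeError as err:
        print(err)
```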
DSL Summary
- Primitives: u8, u16, u32, u64, i8, i16, i32, i64, f32, f64
- Pack (POD struct): pack Name, then members written as - field : Type
  - Members must be POD (primitives, enums, packs, fixed arrays)
- Enum (integer base with optional default): enum Name : <primitive-int>, members written as - NAME = <int>, and an optional default written as = DEFAULT = <int>
- Bitfield (on integer base): bits Name : <primitive-int>, then entries written as - start[-end] field : <primitive|enum>
- Sequence (dynamic, decoded field by field): seq Name, with members written as - field : Type
- Variant (tagged union): variant Name : <primitive-int>, alternatives written as - <int> => Type, and an optional default written as = <int> => Type
  - Discriminant is the explicit <int> value
- Arrays:
  - Dynamic sequence element: {N * M} reads a count of type N, then M items (sequence field only)
  - Fixed dynamic-count: {I * M} reads a literal I items (sequence field only)
  - Fixed POD: [I * M] holds a literal I items; POD-only (usable in pack)
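To make the decoding rules above concrete, here is a hand-written Python sketch of three of them: counted arrays, bitfield ranges, and variant dispatch. It is not the generator's output, and all function names are hypothetical. It assumes little-endian data, that bit 0 is the least significant bit, and that a start-end range is inclusive. Variant defaults are left out because their runtime meaning is not spelled out above.

```python
import io
import struct


def read_u8(stream):
    return struct.unpack("<B", stream.read(1))[0]


def read_u16(stream):
    return struct.unpack("<H", stream.read(2))[0]


def read_counted(stream, read_count, read_item):
    """{N * M}: read a count of type N, then that many items of type M."""
    count = read_count(stream)
    return [read_item(stream) for _ in range(count)]


def read_fixed(stream, length, read_item):
    """{I * M} or [I * M]: read a literal number of items, with no count on the wire."""
    return [read_item(stream) for _ in range(length)]


def extract_bits(value, start, end=None):
    """start[-end]: take bits start..end (assumed inclusive, bit 0 = LSB)."""
    end = start if end is None else end
    width = end - start + 1
    return (value >> start) & ((1 << width) - 1)


def read_variant(stream, read_tag, alternatives):
    """variant: read the discriminant, then decode the matching alternative."""
    tag = read_tag(stream)
    if tag not in alternatives:
        raise ValueError(f"unknown discriminant {tag}")
    return tag, alternatives[tag](stream)


if __name__ == "__main__":
    data = io.BytesIO(bytes([2, 0x34, 0x12, 0x78, 0x56, 1, 0xAB]))
    print(read_counted(data, read_u8, read_u16))   # u8 count, then u16 items
    print(read_variant(data, read_u8, {1: read_u8}))
    print(extract_bits(0b10110100, 2, 5))
```

Passing the element readers as arguments mirrors how {N * M} is parameterised by both the count type and the item type.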
Lexical Notes
- Whitespace: spaces, tabs and \r are ignored; newlines are significant and tokenized as Newline.
- Comments: # ... to end-of-line are skipped (newline still tokenized).
- Numbers: only unsigned integer literals are lexed; a leading - is a separate token used by the parser when needed.
- Strings: "..." with escapes \\, \", \n, \t. Multiline and unknown escapes are errors.
- Symbols: : - = ( ) [ ] { } * =>.
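The lexical rules above can be illustrated with a small tokenizer. This is a Python sketch for illustration only, not a port of src/tokens.rs. Apart from Newline, the token kind names (Int, Str, Ident, Sym) are hypothetical, and the identifier rule (letters, digits, underscore) is an assumption that the notes above do not state.

```python
SYMBOLS = ["=>", ":", "-", "=", "(", ")", "[", "]", "{", "}", "*"]  # "=>" before "="
ESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}
DIGITS = "0123456789"


def tokenize(src):
    """Return a list of (kind, value) tuples following the lexical notes."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch in " \t\r":                      # ignored whitespace
            i += 1
        elif ch == "\n":                       # newlines are significant
            tokens.append(("Newline", None))
            i += 1
        elif ch == "#":                        # comment: stop before the newline
            while i < len(src) and src[i] != "\n":
                i += 1
        elif ch in DIGITS:                     # unsigned integer literal
            start = i
            while i < len(src) and src[i] in DIGITS:
                i += 1
            tokens.append(("Int", int(src[start:i])))
        elif ch == '"':
            i, text = read_string(src, i + 1)
            tokens.append(("Str", text))
        elif ch.isalpha() or ch == "_":        # assumed identifier rule
            start = i
            while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                i += 1
            tokens.append(("Ident", src[start:i]))
        else:
            for sym in SYMBOLS:
                if src.startswith(sym, i):
                    tokens.append(("Sym", sym))
                    i += len(sym)
                    break
            else:
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


def read_string(src, i):
    """Read a string body after the opening quote; return (next_index, text)."""
    out = []
    while i < len(src):
        ch = src[i]
        if ch == '"':
            return i + 1, "".join(out)
        if ch == "\n":
            raise ValueError("multiline strings are not allowed")
        if ch == "\\":
            i += 1
            if i >= len(src) or src[i] not in ESCAPES:
                raise ValueError("unknown escape")
            out.append(ESCAPES[src[i]])
        else:
            out.append(ch)
        i += 1
    raise ValueError("unterminated string")
```

Note that a negative number such as -5 comes out as two tokens, a - symbol followed by an unsigned integer, matching the rule that the parser handles the sign.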
Generated Code
- Python: reader/writer helpers and simple data structures. See generated/python/examples.
- C++: header-only types and readers/writers.
- Rust: a module with types plus read_<Type>/write_<Type> functions using std::io::Read/Write.
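As a rough picture of what a reader/writer pair looks like, the following hand-written Python sketch round-trips a hypothetical pack named Point with two i32 fields. The function names borrow the read_<Type>/write_<Type> pattern described for Rust; the actual generated Python code may be organised differently.

```python
import io
import struct
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical pack with fields x : i32 and y : i32."""
    x: int
    y: int


POINT_LAYOUT = struct.Struct("<ii")  # little-endian, two i32 fields, no padding


def write_Point(stream, value):
    stream.write(POINT_LAYOUT.pack(value.x, value.y))


def read_Point(stream):
    data = stream.read(POINT_LAYOUT.size)
    if len(data) != POINT_LAYOUT.size:
        raise EOFError("not enough bytes for Point")
    return Point(*POINT_LAYOUT.unpack(data))


if __name__ == "__main__":
    buf = io.BytesIO()
    write_Point(buf, Point(3, -4))
    buf.seek(0)
    print(read_Point(buf))
```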
Examples
- See assets/basic.jaw and generated outputs under generated/ for reference.
Developing
- Run tests: cargo test
- Project layout:
  - src/tokens.rs: tokenizer (lexer) and token definitions
  - src/tokenreader.rs: small helper for parser token consumption
  - src/module.rs: parser, validation, and type graph
  - src/codegen/: code generators
  - assets/: example .jaw sources
- Style: keep changes focused and small; prefer helpful error messages with spans.
Notes
- Aliases and imports (alias, use/as) are reserved but not implemented yet.
- Endianness is little-endian in the Python generator; C++ expects the Reader/Writer to define endianness.
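For readers less familiar with byte order, this short sketch shows what little-endian means for a single u32, using Python's standard struct module. It is only an illustration of the encoding, not code taken from the generator.

```python
import struct

value = 0x12345678
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

print(little.hex())
print(big.hex())
```

A reader that assumes the wrong byte order decodes the same four bytes to a different number, which is why the C++ Reader/Writer has to agree with the producer of the data.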