[WIP] [Tinkering] A lossless parser #199
Draft
Currently there are two ways to parse text in jomini:
- `text::TokenReader`: an implementation focused on performant, streaming deserialization of save files. Very little interpretation of the data is performed at this level. Analogous to a lexer.
- `TextTape`: requires all the input up front, and while it parses into a linear tape, it attempts to assign semantic meaning: object/array detection and their bounds, skipping empty objects, etc. These semantics form the base of the text mid-level API (`ObjectReader`), where one can traverse and recurse into objects and arrays. This mid-level API is even responsible for converting the text into JSON.

`TextTape` came first, and much time was spent making it performant for save files. But now that all the save file parsers (or at least the performance-sensitive ones) are based solely on lexing the input rather than parsing it, `TextTape` occupies an odd spot. `TextTape` is still lossy, with an API that makes mutation either cumbersome or impossible.

What if a new parser was created with the intent of lossless parsing, one that offered a more ergonomic API for transformation and was resilient in the face of errors? What would it look like? And importantly, can it still be made fast enough to supplant all of `TextTape`'s use cases?

Starting with the lexer, we need to capture trivia such as whitespace and comments, such that one can print out the stream of tokens and receive the exact input back. I imagine it looking like something from logos.
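To make the round-trip property concrete, here is a minimal hand-rolled sketch of such a lexer (the real thing would likely use logos derive macros; the token names and the tolerant handling of unterminated strings are my guesses, not the PR's design):

```rust
// Hypothetical token set for a lossless lexer. Every byte of input belongs
// to exactly one token, so concatenating the token slices reproduces the
// input verbatim -- including trivia.
#[derive(Debug, PartialEq)]
enum Token {
    Whitespace, // runs of spaces, tabs, newlines (trivia)
    Comment,    // `#` to end of line (trivia)
    Open,       // `{`
    Close,      // `}`
    Operator,   // `=`, `<`, `>=`, `!=`, ...
    Quoted,     // `"..."`, quotes kept in the slice
    Unquoted,   // bare identifiers, numbers, dates
}

fn lex(input: &str) -> Vec<(Token, &str)> {
    let bytes = input.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let tok = match bytes[i] {
            b' ' | b'\t' | b'\r' | b'\n' => {
                while i < bytes.len() && bytes[i].is_ascii_whitespace() { i += 1; }
                Token::Whitespace
            }
            b'#' => {
                while i < bytes.len() && bytes[i] != b'\n' { i += 1; }
                Token::Comment
            }
            b'{' => { i += 1; Token::Open }
            b'}' => { i += 1; Token::Close }
            b'=' | b'<' | b'>' | b'!' | b'?' => {
                i += 1;
                if i < bytes.len() && bytes[i] == b'=' { i += 1; }
                Token::Operator
            }
            b'"' => {
                i += 1;
                while i < bytes.len() && bytes[i] != b'"' { i += 1; }
                i = (i + 1).min(bytes.len()); // tolerate an unterminated string
                Token::Quoted
            }
            _ => {
                while i < bytes.len()
                    && !bytes[i].is_ascii_whitespace()
                    && !matches!(bytes[i], b'{' | b'}' | b'=' | b'<' | b'>' | b'#' | b'"')
                { i += 1; }
                Token::Unquoted
            }
        };
        tokens.push((tok, &input[start..i]));
    }
    tokens
}

fn main() {
    let input = "core = { a=1 } # comment\n";
    // Lossless: the token slices concatenate back to the exact input.
    let round_trip: String = lex(input).iter().map(|(_, s)| *s).collect();
    assert_eq!(round_trip, input);
}
```

The key invariant is that trivia is a first-class token rather than something skipped between tokens.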
Then, from the stream of tokens, we construct a concrete syntax tree with rowan.
I'm imagining the non-terminal nodes to be:

- `{ ... }`
- `[ ... ]`

What other nodes am I missing?
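One plausible (purely speculative) `SyntaxKind` set, sketched below: terminals mirror the lexer tokens, and beyond the two non-terminals above a tree likely also wants a root node, a per-entry node, and an error node. The names are my guesses, not the PR's:

```rust
// rowan identifies nodes and tokens by a raw u16 kind, so a fieldless enum
// with a u16 conversion is the usual pattern.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
enum SyntaxKind {
    // terminals (trivia included -- the tree keeps it)
    Whitespace,
    Comment,
    OpenBrace,
    CloseBrace,
    Operator,
    Quoted,
    Unquoted,
    // non-terminals
    Root,     // the whole file
    Object,   // `{ key = value ... }`
    Array,    // `{ value value ... }`
    Property, // a single `key <op> value` entry
    Error,    // an unparseable span, kept verbatim for resilience
}

impl From<SyntaxKind> for u16 {
    fn from(kind: SyntaxKind) -> u16 {
        kind as u16
    }
}

fn main() {
    assert_eq!(u16::from(SyntaxKind::Whitespace), 0);
    assert_eq!(u16::from(SyntaxKind::Error), 11);
}
```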
From here I'm murky on implementations, but I want to satisfy some notion of backwards compatibility with more features:
Use cases like JSON output and serde deserialization (where the current test suite should be maintained) do not care about trivia like whitespace and comments. Should there be an API that exposes a trivia-less structure? (It can still be rooted in trivia, just exposing a filter on top.)
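The "filter on top" idea could be as simple as a predicate over kinds, sketched here with stand-in types (names hypothetical):

```rust
// The underlying stream stays lossless; semantic consumers (JSON, serde)
// iterate a filtered view that hides trivia without discarding it.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind { Whitespace, Comment, Scalar, Operator, Open, Close }

fn is_trivia(kind: Kind) -> bool {
    matches!(kind, Kind::Whitespace | Kind::Comment)
}

fn main() {
    let stream = [
        Kind::Scalar, Kind::Whitespace, Kind::Operator,
        Kind::Comment, Kind::Scalar,
    ];
    // Same underlying data, trivia invisible to this consumer.
    let semantic: Vec<Kind> = stream.iter().copied().filter(|k| !is_trivia(*k)).collect();
    assert_eq!(semantic, vec![Kind::Scalar, Kind::Operator, Kind::Scalar]);
}
```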
Conversion to a DOM. Something equivalent to `serde_json::Value`, but more focused on lossless data (e.g., it is important to distinguish quoted from unquoted values).
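A hypothetical shape for such a DOM, to illustrate the quoted/unquoted distinction (the variant names and the operator-carrying object entries are my speculation):

```rust
// In the spirit of serde_json::Value, but preserving distinctions JSON erases.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Quoted(String),   // `"1444.11.11"`: the quotes are significant
    Unquoted(String), // `1444.11.11`: raw text, not eagerly typed
    Array(Vec<Value>),
    // (key, operator, value) triples; duplicate keys are legal and kept in order
    Object(Vec<(String, String, Value)>),
}

fn main() {
    // The same bytes, quoted vs unquoted, must not compare equal.
    assert_ne!(
        Value::Quoted("1444.11.11".into()),
        Value::Unquoted("1444.11.11".into())
    );
    let obj = Value::Object(vec![
        ("name".into(), "=".into(), Value::Quoted("Rome".into())),
        ("id".into(), "=".into(), Value::Unquoted("42".into())),
    ]);
    if let Value::Object(entries) = &obj {
        assert_eq!(entries.len(), 2);
    }
}
```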
Transformations. Two example transformation use cases should be easy to code up, and they should be lossless with respect to trivia. One is replacing the exists operator with the equals operator. The other is interpolation, which takes all the variable references found in assignments and expressions, resolves them to their values, and computes the expression.
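For concreteness, a hypothetical before/after (the syntax and variable names here are illustrative only, not taken from the original snippet):

```
# before: a variable definition and an expression referencing it
@half = 0.5
cost = @[half * 100]
```

```
# after interpolation: the reference is resolved and the expression computed
@half = 0.5
cost = 50
```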
These transformations need to return the same interface that allows one to serialize as JSON, deserialize, convert into a DOM, or apply more transformations. How should these transformations be structured, not just for these two examples, but for future use cases like a formatter that may want to preserve comment trivia but discard other trivia?
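One possible shape, purely speculative: transformations consume and produce the same lossless tree type, so they compose, and every downstream consumer hangs off that one type. Sketched here with a string stand-in for the real tree, and `?=` assumed as the exists operator's spelling:

```rust
// Stand-in for the real lossless tree; a real impl would rewrite nodes,
// not text, but the interface shape is the point here.
struct Cst(String);

trait Transform {
    fn apply(&self, cst: Cst) -> Cst;
}

// Replaces the (assumed) exists operator `?=` with `=`.
struct ReplaceExists;

impl Transform for ReplaceExists {
    fn apply(&self, cst: Cst) -> Cst {
        Cst(cst.0.replace("?=", "="))
    }
}

fn main() {
    let out = ReplaceExists.apply(Cst("a ?= b # keep me".into()));
    // Trivia (the comment) survives the transformation.
    assert_eq!(out.0, "a = b # keep me");
}
```

Because `apply` returns the same `Cst`, a formatter or interpolation pass would slot in as just another `Transform`.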
To be resilient to errors, it seems like a good idea to allow a partial tree, or one containing error nodes, so that one can still partially process it. There'll always be new syntax.
With all of these features, it seems reasonable that someone could build an LSP on top (some sort of incremental parsing seems desirable).
For reference: