This repository contains scripts for decoding tokenized Commodore 64 BASIC program files and converting them into ASCII lexical tokens. The parser includes a syntax tagger that classifies each token by its functional role in the BASIC language. The tagset used for classification is available in two formats: a human-readable Markdown table and a machine-readable JSON file.
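As a rough illustration of token chunking and context-aware tagging, here is a minimal sketch that works on an already decoded, upper-case line. The tag names (`COMMAND`, `ASSIGN`, `COMPARE`, ...), the deliberately tiny keyword sets, and the `tag_line` / `is_assignment` helpers are hypothetical. They do not reproduce the repository's tagger or its tagset, which is defined in the Markdown and JSON files.

```python
import re

# Hypothetical, deliberately tiny keyword sets; the real tagset is richer.
COMMANDS = {"PRINT", "GOTO", "IF", "THEN", "FOR", "TO", "NEXT", "LET", "REM", "DATA"}
FUNCTIONS = {"CHR$", "LEFT$", "LEN", "INT", "RND"}
KEYWORDS = sorted(COMMANDS | FUNCTIONS, key=len, reverse=True)  # longest match first

# Ordered alternatives: re tries them left to right at each position.
TOKEN_RE = re.compile("|".join([
    r'"[^"]*"?',                         # string literal (closing quote optional)
    "|".join(map(re.escape, KEYWORDS)),  # keywords
    r"<=|>=|<>",                         # multi-character comparison operators
    r"\d+(?:\.\d*)?",                    # numeric literal
    r"[A-Z][A-Z0-9]*[$%]?",              # variable name with optional $ or % suffix
    r"\S",                               # any other single character
]))


def is_assignment(stmt: list[tuple[str, str]]) -> bool:
    """True if the statement so far is `VAR`, `LET VAR` or `FOR VAR`."""
    if len(stmt) == 1:
        return stmt[0][1] == "VARIABLE"
    return len(stmt) == 2 and stmt[0][0] in ("LET", "FOR") and stmt[1][1] == "VARIABLE"


def tag_line(line: str) -> list[tuple[str, str]]:
    """Split a decoded BASIC line into tokens and attach a hypothetical tag to each."""
    tagged: list[tuple[str, str]] = []
    stmt: list[tuple[str, str]] = []  # tokens of the current statement (context)
    for tok in TOKEN_RE.findall(line):
        if not tagged and tok.isdigit():
            tag = "LINENUM"
        elif tok.startswith('"'):
            tag = "STRING"
        elif tok in COMMANDS:
            tag = "COMMAND"
        elif tok in FUNCTIONS:
            tag = "FUNCTION"
        elif tok[0].isdigit():
            tag = "NUMBER"
        elif tok[0].isalpha():
            tag = "VARIABLE"
        elif tok == "=":
            # Same character, two roles: assignment at statement start, else comparison.
            tag = "ASSIGN" if is_assignment(stmt) else "COMPARE"
        elif tok in ("<", ">", "<=", ">=", "<>"):
            tag = "COMPARE"
        elif tok in "+-*/^":
            tag = "ARITH"
        elif tok == ":":
            tag = "SEPARATOR"
        else:
            tag = "PUNCT"
        tagged.append((tok, tag))
        if tok in (":", "THEN"):
            stmt = []  # a new statement starts after ':' or THEN
        elif tag != "LINENUM":
            stmt.append((tok, tag))
    return tagged


print(tag_line("10 IF A=1 THEN B=2"))
```

The `=` rule shows why tagging needs context. The same character is a comparison in `IF A=1` but an assignment in `B=2` after `THEN`.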
- Stateful parsing of tokenized C64 BASIC files
- PETSCII to ASCII conversion with proper encoding handling
- Syntax tagging system for token classification
- Context-aware disambiguation (variables, operators, commands)
- Token chunking for multi-byte sequences (e.g., `<=`, variable names)
- Assembly detection in `DATA` statements
- Excel export for token analysis
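For orientation, a tokenized C64 BASIC program begins with a 2-byte little-endian load address (usually `$0801`). Each line then consists of a 2-byte link to the next line, a 2-byte line number, the tokenized body, and a terminating `0x00` byte. A zero link ends the program. BASIC V2 keyword tokens occupy `0x80` (`END`) to `0xCB` (`GO`). The hypothetical `detokenize` function below is a minimal sketch of stateful decoding, not the repository's parser. It tracks quote state, since bytes inside a string literal are never keywords. As an assumption of this sketch, it also treats everything after `REM` as literal text. Printable PETSCII bytes from `0x20` to `0x5D` pass through as ASCII, except `0x5C` (the pound sign). All other bytes become `{$XX}` placeholders.

```python
# Hypothetical sketch of C64 BASIC V2 detokenization; not the repository's parser.

# Keywords in token order, starting at 0x80 (END) and ending at 0xCB (GO).
KEYWORDS = (
    "END FOR NEXT DATA INPUT# INPUT DIM READ LET GOTO RUN IF RESTORE GOSUB "
    "RETURN REM STOP ON WAIT LOAD SAVE VERIFY DEF POKE PRINT# PRINT CONT LIST "
    "CLR CMD SYS OPEN CLOSE GET NEW TAB( TO FN SPC( THEN NOT STEP + - * / ^ "
    "AND OR > = < SGN INT ABS USR FRE POS SQR RND LOG EXP COS SIN TAN ATN PEEK "
    "LEN STR$ VAL ASC CHR$ LEFT$ RIGHT$ MID$ GO"
).split()
TOKENS = {0x80 + i: word for i, word in enumerate(KEYWORDS)}
REM_TOKEN = 0x8F
QUOTE = 0x22


def petscii_char(b: int) -> str:
    """Map one PETSCII byte to ASCII; unmapped bytes become {$XX} placeholders."""
    if 0x20 <= b <= 0x5B or b == 0x5D:
        return chr(b)
    return f"{{${b:02X}}}"


def detokenize(data: bytes) -> list[str]:
    """Decode a tokenized BASIC program (PRG bytes) into text lines."""
    pos = 2  # skip the 2-byte load address
    lines = []
    while pos + 1 < len(data):
        link = int.from_bytes(data[pos:pos + 2], "little")
        if link == 0:  # a zero link marks the end of the program
            break
        line_no = int.from_bytes(data[pos + 2:pos + 4], "little")
        pos += 4
        parts: list[str] = []
        in_quotes = in_rem = False
        while pos < len(data) and data[pos] != 0:
            b = data[pos]
            if b == QUOTE:  # a double quote toggles string mode
                in_quotes = not in_quotes
                parts.append('"')
            elif b in TOKENS and not (in_quotes or in_rem):
                parts.append(TOKENS[b])
                in_rem = b == REM_TOKEN  # assumption: REM body stays literal
            else:
                parts.append(petscii_char(b))
            pos += 1
        pos += 1  # skip the 0x00 line terminator
        lines.append(f"{line_no} {''.join(parts)}")
    return lines


prg = bytes([
    0x01, 0x08,                               # load address $0801
    0x0F, 0x08, 0x0A, 0x00,                   # link to $080F, line 10
    0x99, 0x20, 0x22, *b"HELLO", 0x22, 0x00,  # PRINT "HELLO"
    0x18, 0x08, 0x14, 0x00,                   # link to $0818, line 20
    0x89, 0x20, 0x31, 0x30, 0x00,             # GOTO 10
    0x00, 0x00,                               # end-of-program link
])
print(detokenize(prg))
```

The decoder walks byte by byte with a small amount of state rather than using a lookup per byte. A byte such as `0x99` is the `PRINT` keyword in code but must stay literal inside a string.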
- Python 3.12+
- Required packages: `pandas`, `tqdm`, `openpyxl`
Run the parser with the `main.py` script to decode the files in the `examples/` directory:

```bash
python main.py
```

This will:
- Read tokenized BASIC files from `examples/encoded/`
- Save decoded ASCII files to `examples/decoded/` (with a `.bas` extension)
- Generate token analysis tables in `examples/tables/` (Excel format)
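The directory handling in the first two steps can be approximated with `pathlib` alone. The `decode_directory` function below is hypothetical and is not taken from `main.py`. It receives the decoder as a parameter, so any bytes-to-text function can be plugged in. It does not produce the Excel tables.

```python
from pathlib import Path
from typing import Callable


def decode_directory(
    encoded_dir: Path,
    decoded_dir: Path,
    decode: Callable[[bytes], str],
) -> list[Path]:
    """Decode every file in encoded_dir and write it to decoded_dir with a .bas suffix."""
    decoded_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for src in sorted(encoded_dir.iterdir()):
        if not src.is_file():
            continue  # skip sub-directories
        # with_suffix replaces an existing suffix (e.g. .prg) or adds one if missing
        dst = decoded_dir / src.with_suffix(".bas").name
        # the decoder is expected to emit plain ASCII text
        dst.write_text(decode(src.read_bytes()), encoding="ascii")
        written.append(dst)
    return written
```

For example, `decode_directory(Path("examples/encoded"), Path("examples/decoded"), my_decoder)` would mirror the layout listed above, where `my_decoder` stands for any hypothetical function that turns program bytes into text.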
To cite BASICParser, please use the following BibTeX entry:

```bibtex
@software{wagner2025basicparser,
  author = {Wagner, Julian Severin},
  title = {BASICParser},
  version = {1.0},
  year = {2025},
  url = {https://github.com/SojaSurfer/BASICParser},
  note = {Software}
}
```