BASICParser is a Python tool for parsing tokenized Commodore 64 BASIC source files into ASCII format with comprehensive syntax tagging and lexical analysis capabilities.

BASICParser

This repository contains scripts for decoding Commodore 64 BASIC source files and converting them into lexical tokens in ASCII format. The parser includes a syntax tagger that classifies tokens according to their functional properties in the BASIC language. The tagset used for classification is available in two formats: as a human-readable markdown table and as a machine-readable JSON file.
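To illustrate what chunking and tagging mean in practice, here is a minimal sketch that splits an already decoded BASIC line into tokens and assigns each a coarse tag. The tag names, the `KEYWORDS` subset, and the `tag_tokens` function are hypothetical and chosen for illustration only; the project's actual tagset is defined in its markdown and JSON files, and its tagger may work quite differently.

```python
import re

# Hypothetical tag names; the real tagset lives in the repository's tagset files.
TOKEN_PATTERN = re.compile(r'''
    (?P<string>"[^"]*"?)                   # quoted string, closing quote optional
  | (?P<number>\d+(?:\.\d*)?|\.\d+)        # integer or decimal literal
  | (?P<variable>[A-Z][A-Z0-9]*[$%]?)      # identifier, optional type suffix
  | (?P<operator><=|>=|<>|[-+*/^=<>])      # two-character operators are tried first
  | (?P<punctuation>[():;,])
''', re.VERBOSE)

# Small illustrative subset of BASIC keywords, not the full language.
KEYWORDS = {"PRINT", "IF", "THEN", "GOTO", "FOR", "TO", "NEXT", "LET"}


def tag_tokens(line: str) -> list[tuple[str, str]]:
    """Chunk a decoded BASIC line into (text, tag) pairs."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(line):
        text, tag = match.group(), match.lastgroup
        # An identifier that is a known keyword is retagged as a keyword.
        if tag == "variable" and text in KEYWORDS:
            tag = "keyword"
        tokens.append((text, tag))
    return tokens


print(tag_tokens('IF A<=10 THEN PRINT "OK"'))
```

Note that `<=` comes out as a single operator token rather than two, which is the kind of multi-character chunking the parser description refers to.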

Features

  • Stateful parsing of tokenized C64 BASIC files
  • PETSCII to ASCII conversion with proper encoding handling
  • Syntax tagging system for token classification
  • Context-aware disambiguation (variables, operators, commands)
  • Token chunking for multi-byte sequences (e.g., <=, variable names)
  • Assembly detection in DATA statements
  • Excel export for token analysis
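As a rough illustration of stateful decoding, the sketch below walks the standard C64 program layout: a two-byte load address, then lines consisting of a two-byte pointer to the next line, a two-byte line number, and a zero-terminated body in which bytes of 0x80 and above stand for keywords outside of quoted strings. The `detokenize` function and the deliberately partial `KEYWORDS` table are assumptions made for this example and are not taken from the repository. Bytes the sketch does not handle are emitted as `{$XX}` placeholders rather than converted from PETSCII, and REM and DATA bodies get no special treatment.

```python
import struct

# Partial C64 BASIC V2 keyword table; a complete decoder needs every token.
KEYWORDS = {
    0x80: "END", 0x81: "FOR", 0x82: "NEXT", 0x89: "GOTO",
    0x8B: "IF", 0x8F: "REM", 0x99: "PRINT", 0xA7: "THEN",
}


def detokenize(data: bytes) -> list[str]:
    """Decode a tokenized program into a list of ASCII source lines."""
    pos = 2  # skip the two-byte load address
    lines = []
    while pos + 4 <= len(data):
        next_ptr, line_no = struct.unpack_from("<HH", data, pos)
        if next_ptr == 0:  # a zero link pointer marks the end of the program
            break
        pos += 4
        parts = []
        in_quotes = False  # state: token bytes inside strings are literal
        while pos < len(data) and data[pos] != 0:
            byte = data[pos]
            if byte == 0x22:
                in_quotes = not in_quotes
            if byte >= 0x80 and not in_quotes:
                parts.append(KEYWORDS.get(byte, f"{{${byte:02X}}}"))
            elif 0x20 <= byte <= 0x5A:
                parts.append(chr(byte))  # this PETSCII range matches ASCII
            else:
                parts.append(f"{{${byte:02X}}}")  # unhandled byte
            pos += 1
        pos += 1  # skip the zero byte that terminates the line
        lines.append(f"{line_no} {''.join(parts)}")
    return lines


# Hand-assembled program: 10 PRINT "HI" / 20 GOTO 10
sample = bytes([
    0x01, 0x08,                                      # load address $0801
    0x0C, 0x08, 0x0A, 0x00,                          # link, line number 10
    0x99, 0x20, 0x22, 0x48, 0x49, 0x22, 0x00,        # PRINT "HI"
    0x15, 0x08, 0x14, 0x00,                          # link, line number 20
    0x89, 0x20, 0x31, 0x30, 0x00,                    # GOTO 10
    0x00, 0x00,                                      # end of program
])
print("\n".join(detokenize(sample)))
```

The `in_quotes` flag is the stateful part: the same byte value is a keyword token outside a string and a literal character inside one.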

Getting Started

Prerequisites

  • Python 3.12+
  • Required packages: pandas, tqdm, openpyxl

Example Use

Run the parser with the main.py script to decode files in the examples directory:

python main.py

This will:

  • Read tokenized BASIC files from examples/encoded/
  • Save decoded ASCII files to examples/decoded/ (with .bas extension)
  • Generate token analysis tables in examples/tables/ (Excel format)
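The directory flow above can be pictured with the following sketch, which maps every file in an input directory to a `.bas` file in an output directory. It is not the repository's `main.py`: the `decode_directory` function and its `decode` callback are hypothetical names, the Excel table step is left out, and the real script may organise this differently.

```python
from pathlib import Path
from typing import Callable


def decode_directory(
    encoded_dir: Path,
    decoded_dir: Path,
    decode: Callable[[bytes], str],
) -> list[Path]:
    """Decode every file in encoded_dir and write it as .bas to decoded_dir."""
    decoded_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(encoded_dir.iterdir()):
        if not source.is_file():
            continue
        target = decoded_dir / f"{source.stem}.bas"
        # Characters that cannot be encoded as ASCII are replaced, not fatal.
        target.write_text(decode(source.read_bytes()), encoding="ascii", errors="replace")
        written.append(target)
    return written
```

A call such as `decode_directory(Path("examples/encoded"), Path("examples/decoded"), my_decoder)` would mirror the layout listed above, with `my_decoder` standing in for whatever function turns tokenized bytes into text.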

Citation

@software{wagner2025basicparser,
  author = {Wagner, Julian Severin},
  title = {BASICParser},
  version = {1.0},
  year = {2025},
  url = {https://github.com/SojaSurfer/BASICParser},
  note = {Software}
}
