Skip to content

WilliamWang84/orderbookprocessor

Repository files navigation

Order Book Feed Processor

A high-performance C++ implementation for processing binary order book message streams and generating price depth snapshots.

Overview

This project processes binary .bin files containing order book messages (Add, Update, Delete, Execute) and outputs price depth snapshots showing the top N price levels whenever visible changes occur.

Key Features

  • Efficient Binary Processing: Handles large binary files (1MB-2MB+) with optimal memory usage
  • Multi-Symbol Support: Processes multiple trading symbols simultaneously
  • Change Detection: Only outputs snapshots when the top N levels change
  • Price Aggregation: Aggregates order volumes at each price level
  • Little-Endian Binary Format: Correctly reads all message types with proper byte ordering

Project Structure

.
├── orderbook_processor.cpp    # Main order book processing engine
├── stream_generator.cpp        # Test data generator
├── output1.log                 # Expected output for validation
├── Makefile                    # Build automation
├── build_and_test.sh          # Comprehensive test script
└── README.md                   # This file

Building the Project

Prerequisites

  • C++17 compatible compiler (g++ 7.0+ or clang++ 5.0+)
  • Make (optional, for using Makefile)

Using Makefile (Recommended)

# Build all executables
make

# Build and run basic test
make test

# Run benchmarks on large files
make benchmark

# Validate output against expected
make validate

# Clean build artifacts
make clean

Manual Compilation

# Create build directory
mkdir -p build

# Compile processor
g++ -std=c++17 -O3 -Wall -Wextra -o build/orderbook_processor orderbook_processor.cpp

# Compile test generator
g++ -std=c++17 -O3 -Wall -Wextra -o build/stream_generator stream_generator.cpp

Usage

Processing Stream Files

# Basic usage
cat input.bin | ./build/orderbook_processor <depth_levels>

# Example with 5 levels
cat input_5lvls.bin | ./build/orderbook_processor 5

# Example with 10 levels
cat large_test_10lvls.bin | ./build/orderbook_processor 10

Generating Test Data

# Generate small test file matching expected output
./build/stream_generator input.bin

# Generate 1MB test file
./build/stream_generator large_1mb.bin 1

# Generate 2MB test file
./build/stream_generator large_2mb.bin 2

# Generate 10MB test file
./build/stream_generator huge_10mb.bin 10

Running Comprehensive Tests

# Run all tests including validation and performance tests
chmod +x build_and_test.sh
./build_and_test.sh

Binary Message Format

Message Header (8 bytes)

  • Sequence Number (4 bytes): uint32_t, little-endian
  • Message Size (4 bytes): uint32_t, little-endian

Message Types

Add Order (32 bytes)

  • Message Type: 'A'
  • Symbol: 3 bytes (alpha)
  • Order ID: 8 bytes (uint64_t)
  • Side: 1 byte ('B' = Bid, 'S' = Ask)
  • Reserved: 3 bytes
  • Size: 8 bytes (uint64_t)
  • Price: 4 bytes (int32_t, fixed 4 decimals)
  • Reserved: 4 bytes

Update Order (32 bytes)

  • Message Type: 'U'
  • Same structure as Add Order

Delete Order (16 bytes)

  • Message Type: 'D'
  • Symbol: 3 bytes
  • Order ID: 8 bytes
  • Side: 1 byte
  • Reserved: 3 bytes

Execute Order (24 bytes)

  • Message Type: 'E'
  • Symbol: 3 bytes
  • Order ID: 8 bytes
  • Side: 1 byte
  • Reserved: 3 bytes
  • Traded Quantity: 8 bytes (uint64_t)

Output Format

Each snapshot is printed as a single line:

sequenceNo, symbol, [(bidPrice1, bidVolume1), ...], [(askPrice1, askVolume1), ...]

Example:

1, SB0, [(..., ...)], []
2, SB0, [(..., ...), (..., ...)], []
3, SB0, [(..., ...), (..., ...)], []

Key Rules:

  • Bids sorted by price descending (highest first)
  • Asks sorted by price ascending (lowest first)
  • Volumes aggregated at each price level
  • Only printed when top N levels change

Architecture & Design

Core Components

  1. OrderBook Class

    • Maintains orders by Order ID
    • Aggregates volume by price level
    • Separate bid/ask maps with appropriate sorting
  2. OrderBookProcessor Class

    • Routes messages to appropriate handlers
    • Manages multiple symbol order books
    • Detects and outputs snapshot changes
  3. Snapshot Change Detection

    • Compares current top N levels with last snapshot
    • Only outputs when changes occur in visible levels

Performance Optimizations

  • O(log n) operations: Uses std::map for efficient price level management
  • Streaming I/O: Processes data directly from stdin without loading entire file
  • Minimal allocations: Reuses data structures where possible
  • Efficient aggregation: Maintains running totals for each price level

Memory Complexity

  • Per Order: ~40 bytes (Order struct + map overhead)
  • Per Price Level: ~24 bytes (map entry)
  • Total: O(N × M) where N = number of active orders, M = number of symbols

For a 2MB file with ~50,000 messages, typical memory usage is 5-10MB.

Testing Strategy

1. Functional Tests

  • Small test file: Validates against expected output (output.log)
  • Message types: Tests all message types (Add, Update, Delete, Execute)
  • Multiple symbols: Ensures symbol isolation

2. Performance Tests

  • 1MB file: ~25,000 messages, validates processing speed
  • 2MB file: ~50,000 messages, tests memory efficiency
  • Large depth: Tests with depth=3, 5, 10, 20

3. Edge Cases

  • Orders with zero volume
  • Price levels that disappear
  • Crossed markets (bid > ask)
  • Empty order books

Performance Benchmarks

On a typical modern system (3.0GHz CPU):

File Size Messages Depth Processing Time
50KB 1,000 5 ~5ms
1MB 25,000 5 ~50ms
2MB 50,000 10 ~100ms

Throughput: ~500,000 messages/second

Validation

Expected output for the sample test is provided in output.log. The validation process:

# Generate test file
./build/stream_generator input.bin

# Run processor
cat input.bin | ./build/orderbook_processor 5 > actual.log

# Compare with expected
diff -w output.log actual.log

Troubleshooting

Issue: Different output than expected

  • Check: Ensure using little-endian system or byte-swap if needed
  • Verify: Test generator creates correct binary format
  • Debug: Enable debug builds with make debug

Issue: Performance slower than expected

  • Check: Compiler optimizations enabled (-O3)
  • Profile: Use profiling tools (gprof, perf)
  • Monitor: Watch memory usage with valgrind

Issue: Segmentation fault

  • Check: Input file format is correct
  • Verify: File isn't truncated or corrupted
  • Debug: Run with AddressSanitizer: g++ -fsanitize=address

Extensions & Future Work

Potential improvements:

  1. Threading: Parallel processing for multiple symbols
  2. Memory Pool: Custom allocator for Order structs
  3. SIMD: Vectorized price level aggregation
  4. Compression: Support compressed input streams
  5. Real-time: Direct network socket input

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published