MarketPulse: Low-Latency Market Data Ingestion & Replay System

A high-performance C++ market data processing system with real-time streaming, recording, replay capabilities, and a modern web-based monitoring dashboard.

🚀 Features

Core Functionality

  • High-throughput ingestion: 150k+ messages/sec (L1/L2/Trade data)
  • Binary protocol: Compact, CRC-validated frames for data integrity
  • Custom TCP pub-sub: Topic-based routing with backpressure handling
  • Memory-mapped recording: Efficient persistent storage with indexing
  • Deterministic replay: 1x to 100x speed with precise timing
  • Real-time metrics: Latency histograms, throughput, queue depths

Architecture

  • Multi-threaded pipeline: Lock-free queues, work-stealing pools
  • Per-symbol ordering: Parallel processing while maintaining sequence
  • Configurable rates: Mock feed with burst simulation
  • WebSocket metrics: Real-time dashboard updates
  • RESTful control: Start/stop feeds, manage replay sessions

Performance Targets

  • Latency: P50 < 2ms, P99 < 10ms (ingest→publish)
  • Throughput: ≥150k L1 msg/s, ≥60k L2 msg/s on 8-core system
  • Replay: Sustained 100x without backpressure violations

๐Ÿ— Architecture Overview

┌─────────────┐    ┌──────────────┐    ┌──────────────┐
│ Mock Feed   │───▶│ Normalizer   │───▶│ Publisher    │
│ (Poisson)   │    │ (N threads)  │    │ (TCP server) │
└─────────────┘    └──────────────┘    └──────────────┘
                          │                    │
                          ▼                    │
                   ┌──────────────┐            │
                   │ Recorder     │            │
                   │ (mmap files) │            │
                   └──────────────┘            │
                          │                    │
                          ▼                    │
                   ┌──────────────┐            │
                   │ Replayer     │────────────┘
                   │ (rate ctrl)  │
                   └──────────────┘

Design Decisions & Trade-offs

Why Boost.Asio instead of raw epoll/kqueue?

  • Provides portable async I/O without sacrificing performance
  • Allows focus on pipeline and backpressure logic rather than OS-specific details
  • Still exposes low-level control over buffers and scheduling

Why lock-free queues (moodycamel) over mutex-based queues?

  • Avoids contention under high message rates
  • Predictable latency under bursty traffic
  • Reduced tail latency compared to std::queue + mutex
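The queue actually used is moodycamel::ConcurrentQueue (a multi-producer/multi-consumer design); as an illustration of the lock-free idea only, here is a minimal single-producer/single-consumer ring buffer built on `std::atomic`:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal SPSC lock-free ring buffer. Illustrative sketch only; the project
// itself uses moodycamel::ConcurrentQueue, which is MPMC and more general.
template <typename T, std::size_t N>
class SpscRing {
  static_assert((N & (N - 1)) == 0, "N must be a power of two");
 public:
  bool push(const T& v) {
    auto head = head_.load(std::memory_order_relaxed);
    auto next = (head + 1) & (N - 1);
    if (next == tail_.load(std::memory_order_acquire)) return false;  // full
    buf_[head] = v;
    head_.store(next, std::memory_order_release);  // publish to consumer
    return true;
  }
  std::optional<T> pop() {
    auto tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
    T v = buf_[tail];
    tail_.store((tail + 1) & (N - 1), std::memory_order_release);  // free slot
    return v;
  }
 private:
  std::array<T, N> buf_{};
  std::atomic<std::size_t> head_{0}, tail_{0};
};
```

Because producer and consumer each write only their own index, no mutex is taken on the hot path, which is what keeps tail latency flat under bursts.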

Why memory-mapped files for recording?

  • Sequential append with minimal syscall overhead
  • OS page cache handles buffering efficiently
  • Enables zero-copy reads during replay
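A hypothetical sketch of that append path (POSIX; the recorder's real types and names are not shown in this README): pre-size the file, map it once, and each append becomes a plain memcpy into the mapping, with `msync()` as the periodic flush.

```cpp
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical recorder append path: sequential writes go through a single
// mmap, so there is no write() syscall per message; the OS page cache
// buffers dirty pages and msync() forces them to disk.
struct MmapAppender {
  int fd = -1;
  uint8_t* base = nullptr;
  size_t cap = 0, off = 0;

  bool open_file(const char* path, size_t capacity) {
    fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    if (::ftruncate(fd, (off_t)capacity) != 0) return false;   // pre-size
    void* p = ::mmap(nullptr, capacity, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return false;
    base = static_cast<uint8_t*>(p);
    cap = capacity;
    return true;
  }
  bool append(const void* data, size_t len) {
    if (off + len > cap) return false;   // caller would roll to a new file
    std::memcpy(base + off, data, len);  // no per-message syscall
    off += len;
    return true;
  }
  void flush() { ::msync(base, off, MS_SYNC); }  // periodic durability point
  void close_file() {
    if (base) ::munmap(base, cap);
    if (fd >= 0) ::close(fd);
  }
};
```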

Why custom binary protocol instead of Protobuf/FlatBuffers?

  • Fixed-size headers enable faster parsing
  • No schema negotiation overhead
  • Explicit control over alignment and endianness

🛠 Technology Stack

Backend (C++)

  • Language: C++20 with CMake build system
  • Networking: Boost.Asio for async I/O
  • Concurrency: moodycamel::ConcurrentQueue (lock-free)
  • Logging: spdlog (async mode)
  • Serialization: Custom binary protocol
  • Storage: Memory-mapped files with indexing
  • Metrics: Custom histograms + Prometheus export

Frontend (Next.js)

  • Framework: Next.js 14 with TypeScript
  • Styling: Tailwind CSS with custom components
  • Charts: Recharts for real-time visualization
  • WebSocket: Live metrics streaming
  • Icons: Lucide React

Infrastructure

  • Containerization: Docker + Docker Compose
  • Monitoring: Prometheus + Grafana
  • Optional Analytics: ClickHouse integration
  • Development: Hot reload, health checks

๐Ÿ“ Project Structure

md-system-cpp/
├── src/                          # C++ source code
│   ├── common/                   # Shared utilities
│   │   ├── frame.{hpp,cpp}       # Binary protocol
│   │   ├── symbol_registry.{hpp,cpp}
│   │   ├── config.{hpp,cpp}
│   │   └── metrics.{hpp,cpp}
│   ├── feed/                     # Data connectors
│   │   └── mock_feed.{hpp,cpp}
│   ├── normalize/                # Event processing
│   ├── publisher/                # TCP pub-sub server
│   ├── recorder/                 # Persistent storage
│   ├── replay/                   # Historical playback
│   ├── ctrl/                     # REST/WebSocket API
│   └── main_core.cpp             # Application entry
├── ui/                           # Next.js dashboard
│   ├── app/                      # App router pages
│   │   ├── components/           # Reusable UI components
│   │   ├── hooks/                # Custom React hooks
│   │   └── (pages)/              # Route pages
├── infra/                        # Infrastructure configs
│   ├── prometheus/
│   ├── grafana/
│   └── clickhouse/
├── CMakeLists.txt                # Build configuration
├── docker-compose.yml            # Full stack deployment
└── config.json                   # Runtime configuration

📊 Binary Protocol

Frame Header (Little-Endian)

struct FrameHeader {
  uint32_t magic;     // 0x4D444146 ('MDAF')
  uint16_t version;   // 1
  uint16_t msg_type;  // 1=L1, 2=L2, 3=Trade, 4=Heartbeat
  uint32_t body_len;  // bytes of body
  uint32_t crc32;     // CRC32 of body
};
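A receiver accepts a frame only if the magic, version, and body CRC all check out. The repository's actual CRC routine is not shown; this sketch uses the standard CRC-32 (IEEE 802.3, reflected, polynomial 0xEDB88320):

```cpp
#include <cstddef>
#include <cstdint>

// Standard bitwise CRC-32 (IEEE, reflected, poly 0xEDB88320) over the frame
// body. Sketch only; the project's real implementation is likely table-driven.
inline uint32_t crc32_ieee(const uint8_t* data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int b = 0; b < 8; ++b)
      crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
  }
  return ~crc;
}

// Validation as implied by the header layout above:
inline bool frame_ok(uint32_t magic, uint16_t version,
                     const uint8_t* body, uint32_t body_len, uint32_t crc) {
  return magic == 0x4D444146u && version == 1 &&
         crc32_ieee(body, body_len) == crc;
}
```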

Message Bodies

struct L1Body {
  uint64_t ts_ns;
  uint32_t symbol_id;
  int64_t  bid_px, ask_px;     // scaled 1e-8
  uint64_t bid_sz, ask_sz;     // scaled 1e-8
  uint64_t seq;
};

struct L2Body {
  uint64_t ts_ns;
  uint32_t symbol_id;
  uint8_t  side;        // 0=Bid, 1=Ask
  uint8_t  action;      // 0=Insert, 1=Update, 2=Delete
  uint16_t level;       // 0=best
  int64_t  price;       // scaled 1e-8
  uint64_t size;        // scaled 1e-8
  uint64_t seq;
};
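Prices and sizes travel as 64-bit integers scaled by 1e-8, which keeps the wire format free of floating-point drift. A quick encode/decode sketch (helper names are illustrative, not from the codebase):

```cpp
#include <cmath>
#include <cstdint>

// Fixed-point conversion for the scaled-1e-8 fields above.
// E.g. a price of 42650.12345678 travels as the int64 4265012345678.
inline int64_t to_fixed(double px)   { return (int64_t)std::llround(px * 1e8); }
inline double  from_fixed(int64_t v) { return (double)v / 1e8; }
```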

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • C++20 compatible compiler (for local builds)
  • Node.js 18+ (for UI development)

Running the Complete System

  1. Clone and start all services:

    git clone <repository>
    cd md-system-cpp
    docker-compose up -d
  2. Access the interfaces:

  3. Start the mock feed:

    curl -X POST http://localhost:8080/feeds/start \
      -H "Content-Type: application/json" \
      -d '{"action":"start","l1_rate":50000,"l2_rate":30000,"trade_rate":5000}'

Local Development

  1. Build C++ core:

    mkdir build && cd build
    cmake .. -DCMAKE_BUILD_TYPE=Debug
    make -j$(nproc)
    ./md_core_main ../config.json
  2. Run UI in development mode:

    cd ui
    npm install
    npm run dev

🎮 Usage Examples

Subscribe to Live Data

# Connect to TCP pub-sub (port 9100)
telnet localhost 9100

# Send subscription (JSON control messages)
{"op":"auth","token":"devtoken123"}
{"op":"subscribe","topics":["l1.BTCUSDT","trade.*"],"lossless":false}

# Receive binary frames...

Start Replay Session

curl -X POST http://localhost:8080/replay/start \
  -H "Content-Type: application/json" \
  -d '{
    "action":"start",
    "from_ts_ns":1640995200000000000,
    "to_ts_ns":1640998800000000000,
    "rate":10.0,
    "topics":["l1.*"]
  }'
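The "rate" field presumably scales the gaps between recorded timestamps, so rate 10.0 plays an hour of data in six minutes while preserving relative spacing. A hypothetical pacing helper (not from the codebase):

```cpp
#include <chrono>
#include <cstdint>

// Deterministic replay pacing sketch: the delay before emitting the next
// message is the recorded inter-message gap divided by the replay rate.
inline std::chrono::nanoseconds replay_delay(uint64_t prev_ts_ns,
                                             uint64_t next_ts_ns,
                                             double rate) {
  uint64_t gap = next_ts_ns - prev_ts_ns;
  return std::chrono::nanoseconds((uint64_t)((double)gap / rate));
}
```

At rate 1.0 the delay equals the recorded gap; at 100x each gap shrinks a hundredfold, which is why sustained 100x replay stresses the backpressure path.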

Monitor via WebSocket

const ws = new WebSocket('ws://localhost:8081/ws/metrics');
ws.onmessage = (event) => {
  const metrics = JSON.parse(event.data);
  console.log('P99 latency:', metrics.histograms.normalize_event_ns.p99, 'ns');
};

📈 Performance Monitoring

Key Metrics

  • Throughput: Messages processed per second by type
  • Latency: P50/P95/P99 processing latencies
  • Queue Depths: Publisher and recorder backlogs
  • Connections: Active TCP subscribers
  • Errors: Processing failures and drops

Grafana Dashboards

Pre-configured dashboards include:

  • System Overview (throughput, latency, connections)
  • Pipeline Health (queue depths, error rates)
  • Performance Analysis (latency percentiles, burst handling)

🔧 Configuration

Runtime Configuration (config.json)

{
  "network": {
    "pubsub_port": 9100,
    "ctrl_http_port": 8080,
    "ws_metrics_port": 8081
  },
  "storage": {
    "dir": "./data",
    "roll_bytes": 2147483648,
    "index_interval": 10000
  },
  "pipeline": {
    "publisher_lanes": 8,
    "normalizer_threads": 4,
    "recorder_fsync_ms": 50
  },
  "feeds": {
    "default_symbols": ["BTCUSDT", "ETHUSDT", "SOLUSDT"],
    "mock_enabled": true
  }
}

Failure Handling & Guarantees

Backpressure

  • Publisher applies bounded queues per topic
  • When queues are full, producers slow down instead of dropping data
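A minimal sketch of that policy, assuming a classic bounded blocking queue (the publisher's actual per-topic queue implementation is not shown here):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Backpressure sketch: push() blocks the producer when the queue is at
// capacity, so a slow consumer throttles upstream instead of causing drops.
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(size_t cap) : cap_(cap) {}
  void push(T v) {
    std::unique_lock<std::mutex> lk(m_);
    not_full_.wait(lk, [&] { return q_.size() < cap_; });  // producer stalls here
    q_.push_back(std::move(v));
    not_empty_.notify_one();
  }
  T pop() {
    std::unique_lock<std::mutex> lk(m_);
    not_empty_.wait(lk, [&] { return !q_.empty(); });
    T v = std::move(q_.front());
    q_.pop_front();
    not_full_.notify_one();  // wake a blocked producer
    return v;
  }
 private:
  size_t cap_;
  std::mutex m_;
  std::condition_variable not_full_, not_empty_;
  std::deque<T> q_;
};
```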

Crash Recovery

  • Memory-mapped files are flushed at configurable intervals
  • On restart, the recorder resumes from the last valid index entry
  • Partial frames are detected via CRC and discarded

Data Integrity

  • CRC32 validation on every frame
  • Corrupted frames are logged and skipped

Known Limitations

  • Single-node deployment (no replication)
  • TCP-based pub-sub assumes reliable connections
  • No exactly-once delivery across process restarts

🧪 Testing & Benchmarking

Unit Tests

cd build
make test
./tests/unit_tests

Performance Benchmarks

./benchmarks/pipeline_bench
# Expected: >150k msg/s throughput, <10ms P99 latency

Chaos Testing

# Test file integrity after crashes
./scripts/chaos_test.sh

Benchmarking Methodology

  • Benchmarks executed on an 8-core x86_64 machine
  • CPU pinning enabled to reduce scheduler noise
  • Latencies measured from ingest → publish using monotonic clocks
  • P99 computed over 60-second steady-state windows
  • Burst tests simulate Poisson arrivals with configurable rates
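The monotonic-clock measurement can be sketched as stamping each message at ingest and differencing at publish; `std::chrono::steady_clock` is immune to wall-clock adjustments (NTP steps, DST) that would corrupt latency samples:

```cpp
#include <chrono>
#include <cstdint>

// Monotonic timestamp in nanoseconds, as used for ingest→publish latency.
// Sketch only; the codebase's actual clock helper is not shown.
inline uint64_t now_ns() {
  return (uint64_t)std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now().time_since_epoch()).count();
}
// At ingest:   msg.ingest_ts_ns = now_ns();
// At publish:  uint64_t latency_ns = now_ns() - msg.ingest_ts_ns;
```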

📚 Documentation

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make changes with tests
  4. Ensure benchmarks pass
  5. Submit a pull request

Development Standards

  • C++20 modern idioms
  • Lock-free where possible
  • Comprehensive error handling
  • Performance-first design
  • Clean architecture principles

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Built for ultra-low latency market data processing with production-grade reliability.
