Skip to content

[RFC-0002] Streaming Execution with Apache Flink #1

@Chesterguan

Description

@Chesterguan

Summary

Implement streaming execution backend for PSDL using Apache Flink (PyFlink) as proposed in RFC-0002.

RFC Reference

See rfcs/0002-streaming-execution.md for full specification.

Current Status: ✅ Phase 1 Complete

Phase 1: Core Streaming (v0.2.0) - COMPLETE

  • Project structure (src/psdl/runtimes/streaming/)
  • PSDL to Flink compiler infrastructure
  • Operator compilation: delta, slope, ema, last, min, max, count, sma
  • Configurable slide intervals
  • KeyedStream by patient_id
  • Logic join functions (CoProcessFunction)
  • Basic streaming models and evaluator
  • Unit tests (test_streaming.py - all passing)

Phase 2: Production Readiness (v0.3.x) - PLANNED

  • Kafka source/sink connectors
  • FHIR subscription source
  • Trigger/CEP compilation
  • State TTL management
  • Checkpointing configuration
  • Monitoring/metrics
  • Kubernetes deployment
  • Documentation

Phase 3: Advanced Features (v0.4.x) - FUTURE

  • Savepoint management
  • Schema evolution support
  • Multi-patient logic (outbreak detection)
  • Performance benchmarks

Key Design Decisions

  1. Runtime: PyFlink (consistency with Python codebase)
  2. Slide intervals: Configurable per operator
  3. Late data: Default max_lateness: 5m
  4. Architecture: Follows RFC-0003 modular structure

Related

  • RFC-0003: Architecture Refactor (defines runtime structure)
  • RFC-0005: PSDL v0.3 Architecture (current spec)

Metadata

Metadata

Assignees

Labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions