
PSDL

Patient Scenario Definition Language

An Open Standard for Clinical Logic, Real-Time Monitoring & AI Integration

Supports Python 3.8-3.12.

What SQL became for data queries, ONNX for ML models, and GraphQL for APIs —
PSDL is becoming the semantic layer for clinical AI.

📄 Read the Whitepaper: English · 简体中文 · Español · Français · 日本語


Accountable Clinical AI — Traceable by Design

Clinical AI doesn't fail because models are weak.
It fails because decisions cannot be traced.

PSDL makes every clinical decision accountable:

  • WHO wrote this logic?
  • WHY does it matter?
  • WHAT evidence supports it?

These questions are answered by the required audit fields: audit.intent, audit.rationale, audit.provenance.

This is not optional. This is what makes PSDL regulatory-ready (FDA, EU MDR).


Try It Now (No Setup Required)

Run PSDL in your browser with Google Colab - zero installation, real clinical data:

| Notebook | Data | Description |
|---|---|---|
| Open In Colab | Synthetic | Quick demo with generated patient data (2 min) |
| Open In Colab | MIMIC-IV Demo | 100 real ICU patients, ICD diagnoses |
| Open In Colab | PhysioNet Sepsis | 40,000+ patients with labeled sepsis |

The Problem

Despite significant advances in clinical AI and machine learning, real-time decision support in healthcare remains fragmented, non-portable, non-reproducible, and exceptionally difficult to audit or regulate.

[Figure: PSDL Value Proposition, Before and After. PSDL bridges the determinism, audit, and portability gaps in clinical research.]

What is PSDL?

PSDL (Patient Scenario Definition Language) is a declarative, vendor-neutral language for expressing clinical scenarios. It provides a structured way to define:

| Component | Description |
|---|---|
| Signals | Time-series clinical data bindings (labs, vitals, etc.) |
| Trends | Temporal computations over signals (deltas, slopes, averages) |
| Logic | Boolean algebra combining trends into clinical states |
| Audit | Required traceability (intent, rationale, provenance) |
| Population | Criteria for which patients a scenario applies to |
| State | Optional state machine for tracking clinical progression |

How PSDL Works
[Figure: How PSDL Works - Syntax vs Semantics vs Runtime]

Quick Example

# Detect early kidney injury (v0.3 syntax)
scenario: AKI_Early_Detection
version: "0.3.0"

audit:
  intent: "Detect early acute kidney injury using creatinine trends"
  rationale: "Early AKI detection enables timely intervention"
  provenance: "KDIGO Clinical Practice Guideline for AKI (2012)"

signals:
  Cr:
    ref: creatinine        # v0.3: 'ref' instead of 'source'
    concept_id: 3016723    # OMOP concept
    unit: mg/dL

trends:
  # v0.3: Trends produce numeric values only
  cr_delta_6h:
    expr: delta(Cr, 6h)
    description: "Creatinine change over 6 hours"

  cr_current:
    expr: last(Cr)
    description: "Current creatinine value"

logic:
  # v0.3: Comparisons belong in logic layer
  cr_rising:
    when: cr_delta_6h > 0.3
    description: "Rising creatinine (> 0.3 mg/dL in 6h)"

  cr_high:
    when: cr_current > 1.5
    description: "Elevated creatinine"

  aki_risk:
    when: cr_rising AND cr_high
    severity: high
    description: "Early AKI - rising and elevated creatinine"

Why PSDL?

| Challenge | Without PSDL | With PSDL |
|---|---|---|
| Portability | Logic tied to specific hospital systems | Same scenario runs anywhere with mapping |
| Auditability | Scattered across Python, SQL, configs | Single structured, version-controlled file |
| Reproducibility | Hidden state, implicit dependencies | Deterministic execution, explicit semantics |
| Regulatory Compliance | Manual documentation | Built-in audit primitives |
| Research Sharing | Cannot validate published scenarios | Portable, executable definitions |

Installation

# Install from PyPI
pip install psdl-lang

# With OMOP adapter support
pip install psdl-lang[omop]

# With FHIR adapter support
pip install psdl-lang[fhir]

# Full installation (all adapters)
pip install psdl-lang[full]
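
A quick way to confirm the install, as a sketch that assumes the package exposes a __version__ attribute (not confirmed by this README):

# Sanity check after installation (assumes psdl.__version__ exists)
import psdl
print(psdl.__version__)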

Usage

Parse a Scenario

from psdl.core import parse_scenario

# Use bundled scenarios (included with pip install)
from psdl.examples import get_scenario, list_scenarios

# List available scenarios
print(list_scenarios())  # ['aki_detection', 'sepsis_screening', ...]

# Load a bundled scenario
scenario = get_scenario("aki_detection")

print(f"Scenario: {scenario.name}")
print(f"Signals: {list(scenario.signals.keys())}")
print(f"Logic rules: {list(scenario.logic.keys())}")

Evaluate Against Patient Data

from psdl.examples import get_scenario
from psdl.runtimes.single import SinglePatientEvaluator, InMemoryBackend
from datetime import datetime, timedelta

# Load bundled scenario
scenario = get_scenario("aki_detection")

# Set up data backend
backend = InMemoryBackend()
now = datetime.now()

# Add patient data (using convenience method)
backend.add_observation(123, "Cr", 1.0, now - timedelta(hours=6))
backend.add_observation(123, "Cr", 1.3, now - timedelta(hours=3))
backend.add_observation(123, "Cr", 1.8, now)

# Evaluate
evaluator = SinglePatientEvaluator(scenario, backend)
result = evaluator.evaluate(patient_id=123, reference_time=now)

if result.is_triggered:
    print(f"Patient triggered: {result.triggered_logic}")
    print(f"Trend values: {result.trend_values}")

Compile for Production (v0.3)

For production deployments requiring audit trails, use compile_scenario to get a compiled IR with cryptographic hashes:

from psdl.core.compile import compile_scenario
from psdl.runtimes.single import SinglePatientEvaluator

# Compile scenario to IR with audit hashes
ir = compile_scenario("scenario.yaml")

print(f"Spec Hash: {ir.spec_hash}")       # Hash of input YAML
print(f"IR Hash: {ir.ir_hash}")           # Hash of compiled IR
print(f"Toolchain: {ir.toolchain_hash}")  # Hash of compiler version

# Create evaluator from compiled IR
evaluator = SinglePatientEvaluator.from_ir(ir, backend)
result = evaluator.evaluate(patient_id=123, reference_time=now)

# Results include compilation hashes for audit
print(f"Compilation Hashes: {result.compilation_hashes}")

The compiled IR includes:

  • DAG-ordered evaluation - Dependencies computed once, evaluated in correct order
  • Canonical hashing - Reproducible SHA-256 hashes per spec/hashing.yaml
  • Compilation diagnostics - Warnings for unused signals/trends
  • Audit trail - Full traceability from YAML to evaluation
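
One practical consequence of canonical hashing is that recompiling an unchanged scenario should produce identical hashes, which makes spec drift easy to catch in CI. A minimal sketch using only compile_scenario and the hash fields shown above:

# Recompile and confirm the hashes are reproducible
ir_again = compile_scenario("scenario.yaml")
assert ir_again.spec_hash == ir.spec_hash, "spec hash changed unexpectedly"
assert ir_again.ir_hash == ir.ir_hash, "IR hash changed unexpectedly"
print("Compilation hashes are reproducible")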

Dataset Specifications (RFC-0004)

Dataset Specs enable portable scenarios by mapping semantic signal references to physical data locations:

from psdl import load_dataset_spec

# Load institution-specific binding
spec = load_dataset_spec("dataset_specs/my_hospital_omop.yaml")

# Resolve a signal reference to physical binding
binding = spec.resolve("creatinine")
print(binding.table)        # "measurement"
print(binding.filter_expr)  # "concept_id = 3016723"

This separates clinical logic (portable scenarios) from local terminology (institution-specific mappings).
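
Because scenarios refer only to semantic names such as creatinine, moving between institutions is a matter of loading a different Dataset Spec. A sketch using the load_dataset_spec and resolve calls shown above (the second file name is hypothetical):

# Same clinical logic, two institutional bindings
for spec_path in ("dataset_specs/my_hospital_omop.yaml",
                  "dataset_specs/partner_site_omop.yaml"):  # second file is hypothetical
    spec = load_dataset_spec(spec_path)
    binding = spec.resolve("creatinine")
    print(f"{spec_path}: {binding.table} WHERE {binding.filter_expr}")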

Temporal Operators

| Operator | Syntax | Description |
|---|---|---|
| delta | delta(signal, window) | Absolute change over window |
| slope | slope(signal, window) | Linear regression slope |
| ema | ema(signal, window) | Exponential moving average |
| sma | sma(signal, window) | Simple moving average |
| min | min(signal, window) | Minimum value in window |
| max | max(signal, window) | Maximum value in window |
| count | count(signal, window) | Observation count |
| last | last(signal) | Most recent value |

Window Formats

  • 30s - 30 seconds
  • 5m - 5 minutes
  • 6h - 6 hours
  • 1d - 1 day
  • 7d - 7 days
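
As a rough illustration of what the windowed operators compute, the plain-Python sketch below mimics delta, sma, and last over a 6h trailing window. This is not the PSDL implementation, and details such as delta being computed against the earliest observation in the window are assumptions:

from datetime import datetime, timedelta

# (timestamp, value) observations for one signal, oldest first
obs = [(datetime(2024, 1, 1, 0), 1.0),
       (datetime(2024, 1, 1, 3), 1.3),
       (datetime(2024, 1, 1, 6), 1.8)]

now = datetime(2024, 1, 1, 6)
window = timedelta(hours=6)

# Keep only observations inside the trailing window
w = [(t, v) for t, v in obs if now - t <= window]

delta = w[-1][1] - w[0][1]           # assumed: last value minus earliest value in window
sma = sum(v for _, v in w) / len(w)  # simple moving average over the window
last = obs[-1][1]                    # most recent value, no window needed
print(delta, sma, last)              # ≈0.8, ≈1.37, 1.8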

Project Structure

PSDL follows industry-standard patterns (like GraphQL, CQL, ONNX):

  • The Specification defines WHAT.
  • The Reference Implementation shows HOW.

psdl/
├── README.md              # This file
├── PRINCIPLES.md          # Core laws defining PSDL scope
├── spec/                  # SPECIFICATION (Source of Truth)
│   ├── schema.json        # JSON Schema for scenarios
│   ├── operators.yaml     # Operator definitions
│   └── grammar/           # Lark/EBNF grammars
├── src/psdl/              # REFERENCE IMPLEMENTATION (Python)
│   ├── __init__.py        # Package entry point
│   ├── operators.py       # Temporal operators (shared)
│   ├── core/              # Core module (parsing, IR) - v0.3 strict mode
│   │   ├── parser.py      # YAML parser
│   │   └── ir.py          # Intermediate representation
│   ├── runtimes/          # Execution runtimes
│   │   ├── single/        # Single patient evaluation
│   │   └── cohort/        # Cohort SQL compilation
│   ├── adapters/          # Data adapters (OMOP, FHIR, PhysioNet)
│   └── examples/          # Bundled scenarios (7 scenarios)
├── examples/              # Demo content (not in package)
│   ├── notebooks/         # Jupyter demos (5 notebooks, Colab-ready)
│   └── data/              # Sample data (compressed archives)
├── docs/                  # Documentation + Whitepapers
├── rfcs/                  # Design proposals (5 RFCs)
└── tests/                 # 369 tests (all passing)

| Component | Description |
|---|---|
| Specification | PSDL language definition (see WHITEPAPER.md) |
| Reference Implementation | Python implementation demonstrating the spec |
| Core | Parser, IR types, expression parsing |
| Runtimes | Single patient, cohort SQL, streaming execution |
| Adapters | Data sources (OMOP, FHIR, PhysioNet) |

Spec-Driven Code Generation

PSDL follows a spec-driven architecture where specification files are the single source of truth. Code is auto-generated from specs to eliminate redundancy and ensure consistency.

┌─────────────────────────────────────────────────────────────────┐
│                   SPEC FILES (Source of Truth)                  │
├─────────────────────────────────────────────────────────────────┤
│ spec/schema.json        → Scenario YAML structure               │
│ spec/ast-nodes.yaml     → Expression AST types + grammar mappings│
│ spec/operators.yaml     → Operator metadata + SQL templates     │
│ spec/grammar/*.lark     → Expression grammar (Lark)             │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼ python tools/codegen.py --all
┌─────────────────────────────────────────────────────────────────┐
│                    AUTO-GENERATED CODE                          │
├─────────────────────────────────────────────────────────────────┤
│ _generated/schema_types.py   ← schema.json (datamodel-codegen)  │
│ _generated/ast_types.py      ← ast-nodes.yaml (Jinja2)          │
│ _generated/transformer.py    ← ast-nodes.yaml grammar_mappings  │
│ _generated/operators_meta.py ← operators.yaml (Jinja2)          │
│ _generated/sql_templates.py  ← operators.yaml (Jinja2)          │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    MANUAL CODE (Algorithms Only)                │
├─────────────────────────────────────────────────────────────────┤
│ operators.py          → Temporal operator implementations       │
│ runtimes/             → Execution engines (single, cohort)      │
│ adapters/             → Data source adapters (OMOP, FHIR)       │
└─────────────────────────────────────────────────────────────────┘

Regenerating Code

# Regenerate all code from specs
python tools/codegen.py --all

# Regenerate specific components
python tools/codegen.py --types        # Python types from schema.json
python tools/codegen.py --ast          # AST types from ast-nodes.yaml
python tools/codegen.py --transformer  # Lark transformer from grammar_mappings
python tools/codegen.py --operators    # Operator metadata
python tools/codegen.py --sql          # SQL templates

# Validate implementations against spec
python tools/codegen.py --validate

Why Spec-Driven?

| Benefit | Description |
|---|---|
| Single Source of Truth | Specs define types once; code is generated |
| Consistency | No manual type mismatches or drift |
| Maintainability | Change spec → regenerate → done |
| Auditability | Clear traceability from spec to implementation |

Running Tests

# Run all tests
pytest tests/ -v

# Run with print output shown (disable output capture)
pytest tests/ -v -s

Test Coverage: 424 Tests (All Passing)

  • Unit Tests: Parser, evaluator, operators, scenarios
  • Integration Tests: FHIR adapter, OMOP backend, PhysioNet adapter
  • Validation: SQL equivalence (100% match), KDIGO clinical guidelines
  • Streaming Tests: Window functions, logic evaluation, Flink compiler
  • Cohort Tests: SQL compilation, batch evaluation
  • Compiler Tests: ScenarioIR, DAG ordering, canonical hashing

See tests/TEST_VALIDATION.md for detailed methodology.

Example Scenarios

| Scenario | Description | Clinical Use |
|---|---|---|
| ICU Deterioration | Monitors for early signs of clinical deterioration | Kidney function, lactate trends, hemodynamics |
| AKI Detection | KDIGO criteria for Acute Kidney Injury staging | Creatinine-based staging |
| Sepsis Screening | qSOFA + lactate-based sepsis screening | Early sepsis identification |
| PhysioNet Sepsis | Sepsis-3 criteria for PhysioNet Challenge 2019 | SIRS + organ dysfunction |
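
Each of these ships as a bundled scenario, so it can be inspected directly with the helpers from the Usage section (scenario identifiers come from list_scenarios() and may differ slightly from the display names above):

from psdl.examples import get_scenario, list_scenarios

# Print every bundled scenario together with its logic rules
for scenario_id in list_scenarios():
    s = get_scenario(scenario_id)
    print(f"{scenario_id}: {list(s.logic.keys())}")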

Design Principles

The Core Law: PSDL defines WHAT to detect, not HOW to collect or execute.

For the full set of laws governing PSDL's scope, see PRINCIPLES.md.

| Principle | Description |
|---|---|
| Declarative | Define what to detect, not how to compute it |
| Portable | Same scenario runs on any OMOP/FHIR backend with mapping |
| Auditable | Structured format enables static analysis and version control |
| Deterministic | Predictable execution with no hidden state |
| Open | Vendor-neutral, community-governed |

Roadmap

| Phase | Status | Focus |
|---|---|---|
| Phase 1: Semantic Foundation | ✅ Complete | Spec, parser, operators, OMOP/FHIR adapters |
| Phase 2: v0.3 Architecture | ✅ Complete | Signal/Trend/Logic/Output separation, PyPI publication |
| Phase 3: Production Readiness | 🚧 Current | Output profiles, streaming, performance |
| Phase 4: Adoption & Scale | 🔮 Future | Hospital pilots, standards engagement |

📍 View Full Roadmap →

Related Standards

| Standard | Relationship |
|---|---|
| OMOP CDM | Data model for signals (concept_id references) |
| FHIR R4 | EHR integration (implemented adapter) |
| CQL | Similar domain, different scope (quality measures) |
| ONNX | Inspiration for portable format approach |

Documentation

| Document | Description |
|---|---|
| API Reference | Developer API documentation |
| Principles | The laws defining PSDL's scope and boundaries |
| Whitepaper | Full project vision and specification (5 languages) |
| Getting Started | Quick start guide |
| Roadmap | Development phases and timeline |
| Changelog | Version history |

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Ways to Contribute

  • Specification: Propose language features, operators, semantics
  • Implementation: Build runtimes, backends, tooling
  • Documentation: Improve guides, tutorials, examples
  • Testing: Add conformance tests, find edge cases
  • Adoption: Share use cases, pilot experiences

License

Apache 2.0 - See LICENSE for details.


Clinical AI doesn't fail because models are weak.
It fails because there's no semantic layer to express clinical logic portably.

PSDL is the semantic layer for clinical AI — like SQL for databases.

An open standard built by the community, for the community.
