
PSDL

Patient Scenario Definition Language

An Open Standard for Clinical Logic, Real-Time Monitoring & AI Integration

Supports Python 3.8-3.12.

What SQL became for data queries, ONNX for ML models, and GraphQL for APIs —
PSDL is becoming the semantic layer for clinical AI.

📄 Read the Whitepaper: English · 简体中文 · Español · Français · 日本語


Accountable Clinical AI — Traceable by Design

Clinical AI doesn't fail because models are weak.
It fails because decisions cannot be traced.

PSDL makes every clinical decision accountable:

  • WHO wrote this logic?
  • WHY does it matter?
  • WHAT evidence supports it?

These questions are answered by the required audit fields: audit.intent, audit.rationale, audit.provenance.

This is not optional. This is what makes PSDL regulatory-ready (FDA, EU MDR).


Try It Now (No Setup Required)

Run PSDL in your browser with Google Colab - zero installation, real clinical data:

| Notebook | Data | Description |
|---|---|---|
| Open In Colab | Synthetic | Quick demo with generated patient data (2 min) |
| Open In Colab | MIMIC-IV Demo | 100 real ICU patients, ICD diagnoses |
| Open In Colab | PhysioNet Sepsis | 40,000+ patients with labeled sepsis |

The Problem

Despite significant advances in clinical AI and machine learning, real-time decision support in healthcare remains fragmented, non-portable, non-reproducible, and exceptionally difficult to audit or regulate.

[Figure: PSDL Value Proposition, Before and After. PSDL bridges the determinism, audit, and portability gaps in clinical research.]

What is PSDL?

PSDL (Patient Scenario Definition Language) is a declarative, vendor-neutral language for expressing clinical scenarios. It provides a structured way to define:

| Component | Description |
|---|---|
| Signals | Time-series clinical data bindings (labs, vitals, etc.) |
| Trends | Temporal computations over signals (deltas, slopes, averages) |
| Logic | Boolean algebra combining trends into clinical states |
| Audit | Required traceability (intent, rationale, provenance) |
| Population | Criteria for which patients a scenario applies to |
| State | Optional state machine for tracking clinical progression |

How PSDL Works
[Figure: How PSDL Works - Syntax vs Semantics vs Runtime]

Quick Example

# Detect early kidney injury (v0.3 syntax)
scenario: AKI_Early_Detection
version: "0.3.0"

audit:
  intent: "Detect early acute kidney injury using creatinine trends"
  rationale: "Early AKI detection enables timely intervention"
  provenance: "KDIGO Clinical Practice Guideline for AKI (2012)"

signals:
  Cr:
    ref: creatinine        # v0.3: 'ref' instead of 'source'
    concept_id: 3016723    # OMOP concept
    unit: mg/dL

trends:
  # v0.3: Trends produce numeric values only
  cr_delta_6h:
    expr: delta(Cr, 6h)
    description: "Creatinine change over 6 hours"

  cr_current:
    expr: last(Cr)
    description: "Current creatinine value"

logic:
  # v0.3: Comparisons belong in logic layer
  cr_rising:
    when: cr_delta_6h > 0.3
    description: "Rising creatinine (> 0.3 mg/dL in 6h)"

  cr_high:
    when: cr_current > 1.5
    description: "Elevated creatinine"

  aki_risk:
    when: cr_rising AND cr_high
    severity: high
    description: "Early AKI - rising and elevated creatinine"

Why PSDL?

| Challenge | Without PSDL | With PSDL |
|---|---|---|
| Portability | Logic tied to specific hospital systems | Same scenario runs anywhere with mapping |
| Auditability | Scattered across Python, SQL, configs | Single structured, version-controlled file |
| Reproducibility | Hidden state, implicit dependencies | Deterministic execution, explicit semantics |
| Regulatory Compliance | Manual documentation | Built-in audit primitives |
| Research Sharing | Cannot validate published scenarios | Portable, executable definitions |

Installation

# Install from PyPI
pip install psdl-lang

# With OMOP adapter support
pip install psdl-lang[omop]

# With FHIR adapter support
pip install psdl-lang[fhir]

# Full installation (all adapters)
pip install psdl-lang[full]
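
A quick way to confirm the install, as a sketch that assumes the package exposes a __version__ attribute (not confirmed by this README):

# Sanity check after installation (assumes psdl.__version__ exists)
import psdl
print(psdl.__version__)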

Usage

Parse a Scenario

from psdl.core import parse_scenario

# Use bundled scenarios (included with pip install)
from psdl.examples import get_scenario, list_scenarios

# List available scenarios
print(list_scenarios())  # ['aki_detection', 'sepsis_screening', ...]

# Load a bundled scenario
scenario = get_scenario("aki_detection")

print(f"Scenario: {scenario.name}")
print(f"Signals: {list(scenario.signals.keys())}")
print(f"Logic rules: {list(scenario.logic.keys())}")

Evaluate Against Patient Data

from psdl.examples import get_scenario
from psdl.runtimes.single import SinglePatientEvaluator, InMemoryBackend
from datetime import datetime, timedelta

# Load bundled scenario
scenario = get_scenario("aki_detection")

# Set up data backend
backend = InMemoryBackend()
now = datetime.now()

# Add patient data (using convenience method)
backend.add_observation(123, "Cr", 1.0, now - timedelta(hours=6))
backend.add_observation(123, "Cr", 1.3, now - timedelta(hours=3))
backend.add_observation(123, "Cr", 1.8, now)

# Evaluate
evaluator = SinglePatientEvaluator(scenario, backend)
result = evaluator.evaluate(patient_id=123, reference_time=now)

if result.is_triggered:
    print(f"Patient triggered: {result.triggered_logic}")
    print(f"Trend values: {result.trend_values}")

Compile for Production (v0.3)

For production deployments requiring audit trails, use compile_scenario to get a compiled IR with cryptographic hashes:

from psdl.core.compile import compile_scenario
from psdl.runtimes.single import SinglePatientEvaluator

# Compile scenario to IR with audit hashes
ir = compile_scenario("scenario.yaml")

print(f"Spec Hash: {ir.spec_hash}")       # Hash of input YAML
print(f"IR Hash: {ir.ir_hash}")           # Hash of compiled IR
print(f"Toolchain: {ir.toolchain_hash}")  # Hash of compiler version

# Create evaluator from compiled IR
evaluator = SinglePatientEvaluator.from_ir(ir, backend)
result = evaluator.evaluate(patient_id=123, reference_time=now)

# Results include compilation hashes for audit
print(f"Compilation Hashes: {result.compilation_hashes}")

The compiled IR includes:

  • DAG-ordered evaluation - Dependencies computed once, evaluated in correct order
  • Canonical hashing - Reproducible SHA-256 hashes per spec/hashing.yaml
  • Compilation diagnostics - Warnings for unused signals/trends
  • Audit trail - Full traceability from YAML to evaluation
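
One practical consequence of canonical hashing is that recompiling an unchanged scenario should produce identical hashes, which makes spec drift easy to catch in CI. A minimal sketch using only compile_scenario and the hash fields shown above:

# Recompile and confirm the hashes are reproducible
ir_again = compile_scenario("scenario.yaml")
assert ir_again.spec_hash == ir.spec_hash, "spec hash changed unexpectedly"
assert ir_again.ir_hash == ir.ir_hash, "IR hash changed unexpectedly"
print("Compilation hashes are reproducible")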

Dataset Specifications (RFC-0004)

Dataset Specs enable portable scenarios by mapping semantic signal references to physical data locations:

from psdl import load_dataset_spec

# Load institution-specific binding
spec = load_dataset_spec("dataset_specs/my_hospital_omop.yaml")

# Resolve a signal reference to physical binding
binding = spec.resolve("creatinine")
print(binding.table)        # "measurement"
print(binding.filter_expr)  # "concept_id = 3016723"

This separates clinical logic (portable scenarios) from local terminology (institution-specific mappings).
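
Because scenarios refer only to semantic names such as creatinine, moving between institutions is a matter of loading a different Dataset Spec. A sketch using the load_dataset_spec and resolve calls shown above (the second file name is hypothetical):

# Same clinical logic, two institutional bindings
for spec_path in ("dataset_specs/my_hospital_omop.yaml",
                  "dataset_specs/partner_site_omop.yaml"):  # second file is hypothetical
    spec = load_dataset_spec(spec_path)
    binding = spec.resolve("creatinine")
    print(f"{spec_path}: {binding.table} WHERE {binding.filter_expr}")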

Temporal Operators

| Operator | Syntax | Description |
|---|---|---|
| delta | delta(signal, window) | Absolute change over window |
| slope | slope(signal, window) | Linear regression slope |
| ema | ema(signal, window) | Exponential moving average |
| sma | sma(signal, window) | Simple moving average |
| min | min(signal, window) | Minimum value in window |
| max | max(signal, window) | Maximum value in window |
| count | count(signal, window) | Observation count |
| last | last(signal) | Most recent value |

Window Formats

  • 30s - 30 seconds
  • 5m - 5 minutes
  • 6h - 6 hours
  • 1d - 1 day
  • 7d - 7 days
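
As a rough illustration of what the windowed operators compute, the plain-Python sketch below mimics delta, sma, and last over a 6h trailing window. This is not the PSDL implementation, and details such as delta being computed against the earliest observation in the window are assumptions:

from datetime import datetime, timedelta

# (timestamp, value) observations for one signal, oldest first
obs = [(datetime(2024, 1, 1, 0), 1.0),
       (datetime(2024, 1, 1, 3), 1.3),
       (datetime(2024, 1, 1, 6), 1.8)]

now = datetime(2024, 1, 1, 6)
window = timedelta(hours=6)

# Keep only observations inside the trailing window
w = [(t, v) for t, v in obs if now - t <= window]

delta = w[-1][1] - w[0][1]           # assumed: last value minus earliest value in window
sma = sum(v for _, v in w) / len(w)  # simple moving average over the window
last = obs[-1][1]                    # most recent value, no window needed
print(delta, sma, last)              # ≈0.8, ≈1.37, 1.8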

Project Structure

PSDL follows industry-standard patterns (like GraphQL, CQL, ONNX):

  • The Specification defines WHAT.
  • The Reference Implementation shows HOW.

psdl/
├── README.md              # This file
├── PRINCIPLES.md          # Core laws defining PSDL scope
├── spec/                  # SPECIFICATION (Source of Truth)
│   ├── schema.json        # JSON Schema for scenarios
│   ├── operators.yaml     # Operator definitions
│   └── grammar/           # Lark/EBNF grammars
├── src/psdl/              # REFERENCE IMPLEMENTATION (Python)
│   ├── __init__.py        # Package entry point
│   ├── operators.py       # Temporal operators (shared)
│   ├── core/              # Core module (parsing, IR) - v0.3 strict mode
│   │   ├── parser.py      # YAML parser
│   │   └── ir.py          # Intermediate representation
│   ├── runtimes/          # Execution runtimes
│   │   ├── single/        # Single patient evaluation
│   │   └── cohort/        # Cohort SQL compilation
│   ├── adapters/          # Data adapters (OMOP, FHIR, PhysioNet)
│   └── examples/          # Bundled scenarios (7 scenarios)
├── examples/              # Demo content (not in package)
│   ├── notebooks/         # Jupyter demos (5 notebooks, Colab-ready)
│   └── data/              # Sample data (compressed archives)
├── docs/                  # Documentation + Whitepapers
├── rfcs/                  # Design proposals (5 RFCs)
└── tests/                 # 369 tests (all passing)

| Component | Description |
|---|---|
| Specification | PSDL language definition (see WHITEPAPER.md) |
| Reference Implementation | Python implementation demonstrating the spec |
| Core | Parser, IR types, expression parsing |
| Runtimes | Single patient, cohort SQL, streaming execution |
| Adapters | Data sources (OMOP, FHIR, PhysioNet) |

Spec-Driven Code Generation

PSDL follows a spec-driven architecture where specification files are the single source of truth. Code is auto-generated from specs to eliminate redundancy and ensure consistency.

┌─────────────────────────────────────────────────────────────────┐
│                   SPEC FILES (Source of Truth)                  │
├─────────────────────────────────────────────────────────────────┤
│ spec/schema.json        → Scenario YAML structure               │
│ spec/ast-nodes.yaml     → Expression AST types + grammar mappings│
│ spec/operators.yaml     → Operator metadata + SQL templates     │
│ spec/grammar/*.lark     → Expression grammar (Lark)             │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼ python tools/codegen.py --all
┌─────────────────────────────────────────────────────────────────┐
│                    AUTO-GENERATED CODE                          │
├─────────────────────────────────────────────────────────────────┤
│ _generated/schema_types.py   ← schema.json (datamodel-codegen)  │
│ _generated/ast_types.py      ← ast-nodes.yaml (Jinja2)          │
│ _generated/transformer.py    ← ast-nodes.yaml grammar_mappings  │
│ _generated/operators_meta.py ← operators.yaml (Jinja2)          │
│ _generated/sql_templates.py  ← operators.yaml (Jinja2)          │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    MANUAL CODE (Algorithms Only)                │
├─────────────────────────────────────────────────────────────────┤
│ operators.py          → Temporal operator implementations       │
│ runtimes/             → Execution engines (single, cohort)      │
│ adapters/             → Data source adapters (OMOP, FHIR)       │
└─────────────────────────────────────────────────────────────────┘

Regenerating Code

# Regenerate all code from specs
python tools/codegen.py --all

# Regenerate specific components
python tools/codegen.py --types        # Python types from schema.json
python tools/codegen.py --ast          # AST types from ast-nodes.yaml
python tools/codegen.py --transformer  # Lark transformer from grammar_mappings
python tools/codegen.py --operators    # Operator metadata
python tools/codegen.py --sql          # SQL templates

# Validate implementations against spec
python tools/codegen.py --validate

Why Spec-Driven?

| Benefit | Description |
|---|---|
| Single Source of Truth | Specs define types once; code is generated |
| Consistency | No manual type mismatches or drift |
| Maintainability | Change spec → regenerate → done |
| Auditability | Clear traceability from spec to implementation |

Running Tests

# Run all tests
pytest tests/ -v

# Run with print output shown (disable output capture)
pytest tests/ -v -s

Test Coverage: 424 Tests (All Passing)

  • Unit Tests: Parser, evaluator, operators, scenarios
  • Integration Tests: FHIR adapter, OMOP backend, PhysioNet adapter
  • Validation: SQL equivalence (100% match), KDIGO clinical guidelines
  • Streaming Tests: Window functions, logic evaluation, Flink compiler
  • Cohort Tests: SQL compilation, batch evaluation
  • Compiler Tests: ScenarioIR, DAG ordering, canonical hashing

See tests/TEST_VALIDATION.md for detailed methodology.

Example Scenarios

| Scenario | Description | Clinical Use |
|---|---|---|
| ICU Deterioration | Monitors for early signs of clinical deterioration | Kidney function, lactate trends, hemodynamics |
| AKI Detection | KDIGO criteria for Acute Kidney Injury staging | Creatinine-based staging |
| Sepsis Screening | qSOFA + lactate-based sepsis screening | Early sepsis identification |
| PhysioNet Sepsis | Sepsis-3 criteria for PhysioNet Challenge 2019 | SIRS + organ dysfunction |
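
Each of these ships as a bundled scenario, so it can be inspected directly with the helpers from the Usage section (scenario identifiers come from list_scenarios() and may differ slightly from the display names above):

from psdl.examples import get_scenario, list_scenarios

# Print every bundled scenario together with its logic rules
for scenario_id in list_scenarios():
    s = get_scenario(scenario_id)
    print(f"{scenario_id}: {list(s.logic.keys())}")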

Design Principles

The Core Law: PSDL defines WHAT to detect, not HOW to collect or execute.

For the full set of laws governing PSDL's scope, see PRINCIPLES.md.

| Principle | Description |
|---|---|
| Declarative | Define what to detect, not how to compute it |
| Portable | Same scenario runs on any OMOP/FHIR backend with mapping |
| Auditable | Structured format enables static analysis and version control |
| Deterministic | Predictable execution with no hidden state |
| Open | Vendor-neutral, community-governed |

Roadmap

| Phase | Status | Focus |
|---|---|---|
| Phase 1: Semantic Foundation | ✅ Complete | Spec, parser, operators, OMOP/FHIR adapters |
| Phase 2: v0.3 Architecture | ✅ Complete | Signal/Trend/Logic/Output separation, PyPI publication |
| Phase 3: Production Readiness | 🚧 Current | Output profiles, streaming, performance |
| Phase 4: Adoption & Scale | 🔮 Future | Hospital pilots, standards engagement |

📍 View Full Roadmap →

Related Standards

| Standard | Relationship |
|---|---|
| OMOP CDM | Data model for signals (concept_id references) |
| FHIR R4 | EHR integration (implemented adapter) |
| CQL | Similar domain, different scope (quality measures) |
| ONNX | Inspiration for portable format approach |

Documentation

| Document | Description |
|---|---|
| API Reference | Developer API documentation |
| Principles | The laws defining PSDL's scope and boundaries |
| Whitepaper | Full project vision and specification (5 languages) |
| Getting Started | Quick start guide |
| Roadmap | Development phases and timeline |
| Changelog | Version history |

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Ways to Contribute

  • Specification: Propose language features, operators, semantics
  • Implementation: Build runtimes, backends, tooling
  • Documentation: Improve guides, tutorials, examples
  • Testing: Add conformance tests, find edge cases
  • Adoption: Share use cases, pilot experiences

License

Apache 2.0 - See LICENSE for details.


Clinical AI doesn't fail because models are weak.
It fails because there's no semantic layer to express clinical logic portably.

PSDL is the semantic layer for clinical AI — like SQL for databases.

An open standard built by the community, for the community.
