Edwin van Stein edited this page Dec 11, 2025 · 26 revisions

Caution

This is a work in progress!

AC/DC Model

Principles

Principle Classification Framework

Principles are classified into 5 categories based on their primary concern:

Category | Focus | Priority Level
A: Architectural | Structure, layers, dependencies | CRITICAL
B: Data & Analysis Integrity | Reproducibility, traceability, quality | CRITICAL
C: Standards Compliance | CDISC, regulatory, ICH | CRITICAL
D: Design Philosophy | Guiding values and approaches | FOUNDATIONAL
E: Operational | Tools, implementation, usage | SUPPORTING

Category A: Architectural Principles

These principles define the fundamental structure and organization of the AC/DC model.


A1: Layered Architecture with Unidirectional Dependencies

Statement: The AC/DC model SHALL maintain strict separation between layers with unidirectional dependency flow.

Three-Layer Structure:

  • Concepts: Abstract definitions of biomedical, derivation, and analysis concepts
  • Structures: Concrete data organization
  • Implementation: Transformations and presentations

Dependency Flow:

Implementations → Structures → Concepts
(each arrow reads "depends_on")

Rules:

  1. Concepts SHALL NOT reference structures or implementations
  2. Structures SHALL ONLY reference concepts, NOT implementations
  3. Implementations MAY reference both structures and concepts
  4. Every element SHALL belong to exactly one layer
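
The dependency rules above lend themselves to mechanical checking. The following is a minimal Python sketch, not part of the model itself: the `Layer` enum, the `find_violations` helper, and all element names are hypothetical, and it assumes that references within the same layer (for example, concept to concept) are permitted.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Model layers, ordered so that a reference may only point to an equal or lower value."""
    CONCEPT = 0
    STRUCTURE = 1
    IMPLEMENTATION = 2


def reference_allowed(source: Layer, target: Layer) -> bool:
    # Rules 1-3: concepts reference only concepts, structures reference structures
    # or concepts, implementations may reference anything (same-layer refs assumed OK).
    return target <= source


def find_violations(layers: dict, references: list) -> list:
    """Return one message per reference that breaks the unidirectional flow.

    `layers` maps each element name to exactly one Layer, so rule 4 holds by construction.
    """
    problems = []
    for src, dst in references:
        if src not in layers or dst not in layers:
            problems.append(f"{src} -> {dst}: element not assigned to a layer")
        elif not reference_allowed(layers[src], layers[dst]):
            problems.append(
                f"{src} ({layers[src].name}) must not reference {dst} ({layers[dst].name})"
            )
    return problems


layers = {
    "ChangeFromBaseline": Layer.CONCEPT,
    "advs_cube": Layer.STRUCTURE,
    "chg_program": Layer.IMPLEMENTATION,
}
references = [
    ("advs_cube", "ChangeFromBaseline"),   # structure -> concept: allowed
    ("chg_program", "advs_cube"),          # implementation -> structure: allowed
    ("ChangeFromBaseline", "advs_cube"),   # concept -> structure: breaks rule 1
]
print(find_violations(layers, references))
# ['ChangeFromBaseline (CONCEPT) must not reference advs_cube (STRUCTURE)']
```

Ordering the layers as integers reduces rules 1 to 3 to a single comparison; a stricter variant could forbid same-layer references by changing `<=` to `<`.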

Rationale:

  • Clear separation of concerns
  • Independent verification of each layer
  • Reusability across analyses
  • No circular dependencies
  • Maintainability

Priority: CRITICAL - This is the foundational architectural principle

Applicability: Universal - applies to all model elements


A2: Concept Independence from Study Context

Statement: Concepts SHALL be defined independent of specific studies, data standards, and implementation details.

Requirements:

  • Concepts describe real-world entities abstractly
  • Not context-dependent
  • Human-readable
  • Independent of SDTM, ADaM, or other standards

Contrast with Structures and Implementation:

  • Structures and Implementation are context-dependent
  • Machine-readable and executable
  • Instantiated in specific data standards

Rationale:

  • Reusability across studies
  • Conceptual clarity
  • Standards-agnostic reasoning

Priority: CRITICAL - Core to the conceptual/implementation separation

Applicability: Applies to all concept definitions


A3: Implementation Linkage to Standards

Statement: Every implementation SHALL link to concrete data standards (SDTM, ADaM) through structures and be machine-executable.

Requirements:

  • Implementations are representations of concepts in code
  • Must be machine-readable
  • Require concrete mappings
  • Must be executable by computational systems

Rationale:

  • Automation capability
  • Reduced transcription errors
  • Validation automation
  • Reuse of code

Priority: CRITICAL - Enables automation

Applicability: Applies to all implementation elements


A4: Clear Separation Between Analysis and Derivation

Statement: The model SHALL distinguish between subject-level derivations and non-subject-level analyses.

Definitions:

  • Derivation: Subject-level data handling to generate new subject-level data
  • Analysis: Creation of non-subject-level aggregated data
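
To make the distinction concrete, here is a minimal Python sketch with hypothetical records and function names: the derivation maps subject-level rows to subject-level rows (one output row per subject), while the analysis collapses them into aggregated, non-subject-level results.

```python
from statistics import mean

# Hypothetical subject-level input: one record per subject
records = [
    {"USUBJID": "S1", "TRT": "A", "BASE": 120.0, "AVAL": 112.0},
    {"USUBJID": "S2", "TRT": "A", "BASE": 130.0, "AVAL": 126.0},
    {"USUBJID": "S3", "TRT": "B", "BASE": 125.0, "AVAL": 124.0},
]


def derive_chg(rows: list) -> list:
    """Derivation: subject-level in, subject-level out (adds CHG to a copy of each record)."""
    return [{**r, "CHG": r["AVAL"] - r["BASE"]} for r in rows]


def mean_chg_by_treatment(rows: list) -> dict:
    """Analysis: subject-level in, aggregated per-treatment results out."""
    groups = {}
    for r in rows:
        groups.setdefault(r["TRT"], []).append(r["CHG"])
    return {trt: mean(values) for trt, values in groups.items()}


derived = derive_chg(records)
print(len(derived) == len(records))    # True: still one record per subject
print(mean_chg_by_treatment(derived))  # {'A': -6.0, 'B': -1.0}
```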

Rationale:

  • Different purposes require different approaches
  • Subject-level vs aggregate operations
  • Traceability requirements differ

Priority: HIGH - Important for scope clarity

Applicability: Applies to all methods and operations


A5: Cube-Based Data Organization

Statement: Data SHALL be organized as multi-dimensional cubes with dimensions, measures, and attributes.

Structure:

  • Dimensions: Identify and organize observations (subject, treatment, visit)
  • Measures: Quantitative or qualitative values being analyzed
  • Attributes: Qualifying metadata
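
A minimal sketch of this structure as a frozen Python dataclass. The `Cube` class and variable names are hypothetical; the sketch assumes that every observation supplies a value for each declared component and that the dimension values identify an observation uniquely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cube:
    """Observations organized by dimensions, carrying measures and attributes."""
    name: str
    dimensions: tuple    # identify an observation, e.g. ("USUBJID", "AVISIT")
    measures: tuple      # values being analyzed, e.g. ("AVAL",)
    attributes: tuple    # qualifying metadata, e.g. ("AVALU",)
    observations: tuple  # one dict per observation

    def __post_init__(self):
        components = set(self.dimensions) | set(self.measures) | set(self.attributes)
        seen = set()
        for obs in self.observations:
            missing = components - set(obs)
            if missing:
                raise ValueError(f"observation lacks {sorted(missing)}")
            key = self.key(obs)
            if key in seen:
                raise ValueError(f"duplicate observation for {key}")
            seen.add(key)

    def key(self, obs: dict) -> tuple:
        """The dimension values that identify an observation."""
        return tuple(obs[d] for d in self.dimensions)


vitals = Cube(
    name="vital_signs",
    dimensions=("USUBJID", "AVISIT"),
    measures=("AVAL",),
    attributes=("AVALU",),
    observations=(
        {"USUBJID": "S1", "AVISIT": "Baseline", "AVAL": 120.0, "AVALU": "mmHg"},
        {"USUBJID": "S1", "AVISIT": "Week 4", "AVAL": 112.0, "AVALU": "mmHg"},
    ),
)
print([vitals.key(o) for o in vitals.observations])
# [('S1', 'Baseline'), ('S1', 'Week 4')]
```

`frozen=True` prevents reassigning the cube's fields, in the spirit of the immutability rules in B1, although the observation dicts themselves remain mutable in this simplified sketch.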

Rationale:

  • Natural representation of clinical trial data
  • Supports OLAP-style operations
  • Aligns with ADaM BDS structure
  • Enables flexible slicing and aggregation

Priority: HIGH - Core structural pattern

Applicability: Applies to all data structures


A6: Slice-Based Subsetting

Statement: Data subsets SHALL be defined declaratively as slices by fixing dimension values or applying filters.

Requirements:

  • Slices are immutable views
  • Created by fixing one or more dimension values
  • Can represent subsets of records, variables, or both
  • May exist without methods (for analysis input)
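
A slice can be sketched as a small declarative object that records which dimension values to fix and which variables to keep. This is a hypothetical Python illustration: the `Slice` class and the data are assumptions, it only handles equality on fixed dimension values (not arbitrary filters), and it has no method attached, so it could feed an analysis directly.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Slice:
    """Declares *what* to select; applying it never modifies the source rows."""
    name: str
    fixed: tuple                        # (dimension, value) pairs to hold constant
    variables: Optional[tuple] = None   # variables to keep; None keeps all

    def apply(self, rows: Iterable[dict]) -> list:
        # Record subset: keep rows that match every fixed dimension value
        kept = [r for r in rows if all(r.get(dim) == val for dim, val in self.fixed)]
        # Variable subset: copy each row so callers cannot alter the source
        cols = self.variables
        return [{k: r[k] for k in (cols if cols is not None else r)} for r in kept]


rows = [
    {"USUBJID": "S1", "PARAMCD": "SYSBP", "AVISIT": "Week 4", "AVAL": 112.0},
    {"USUBJID": "S1", "PARAMCD": "DIABP", "AVISIT": "Week 4", "AVAL": 78.0},
    {"USUBJID": "S2", "PARAMCD": "SYSBP", "AVISIT": "Week 4", "AVAL": 126.0},
]
sysbp_week4 = Slice(
    name="sysbp_week4",
    fixed=(("PARAMCD", "SYSBP"), ("AVISIT", "Week 4")),
    variables=("USUBJID", "AVAL"),
)
print(sysbp_week4.apply(rows))
# [{'USUBJID': 'S1', 'AVAL': 112.0}, {'USUBJID': 'S2', 'AVAL': 126.0}]
```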

Rationale:

  • Declarative specification
  • Reusability
  • Clear provenance
  • Separation of data selection from computation

Priority: HIGH - Fundamental operation

Applicability: Applies to all data subsetting


A7: Method Input/Output/Argument Structure

Statement: All methods SHALL explicitly declare inputs, outputs, and arguments.

Requirements:

  • Methods transform inputs to outputs
  • Arguments parameterize behavior
  • Inputs must be declared
  • Outputs must be declared
  • Arguments must be specified
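
As an illustration of an explicit method interface, the following hypothetical Python sketch declares inputs, outputs, and typed arguments, then compares a concrete invocation against that declaration. `MethodSpec`, `check_call`, and the names used are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodSpec:
    """Declared interface of a method: what it reads, what it writes, how it is parameterized."""
    name: str
    inputs: frozenset     # names of input cubes or slices
    outputs: frozenset    # names of output cubes
    arguments: dict       # argument name -> expected Python type

    def check_call(self, inputs: set, outputs: set, arguments: dict) -> list:
        """Compare a concrete invocation against the declaration."""
        errors = []
        if set(inputs) != self.inputs:
            errors.append(f"inputs {sorted(inputs)} differ from declared {sorted(self.inputs)}")
        if set(outputs) != self.outputs:
            errors.append(f"outputs {sorted(outputs)} differ from declared {sorted(self.outputs)}")
        for arg, expected in self.arguments.items():
            if arg not in arguments:
                errors.append(f"missing argument {arg!r}")
            elif not isinstance(arguments[arg], expected):
                errors.append(f"argument {arg!r} should be {expected.__name__}")
        for arg in sorted(arguments.keys() - self.arguments.keys()):
            errors.append(f"undeclared argument {arg!r}")
        return errors


ancova = MethodSpec(
    name="ancova_chg",
    inputs=frozenset({"efficacy_slice"}),
    outputs=frozenset({"ancova_results"}),
    arguments={"alpha": float, "covariates": list},
)
print(ancova.check_call({"efficacy_slice"}, {"ancova_results"},
                        {"alpha": 0.05, "covariates": ["BASE"]}))
# []
print(ancova.check_call({"efficacy_slice"}, {"ancova_results"}, {"alpha": "0.05"}))
# ["argument 'alpha' should be float", "missing argument 'covariates'"]
```

Because the interface is data rather than code, the same declaration could drive type checking and dependency analysis without executing the method.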

Rationale:

  • Explicit interface definition
  • Type checking
  • Dependency analysis
  • Reproducibility

Priority: HIGH - Enables automated analysis

Applicability: Applies to all methods (derivation and analysis)


A8: Universal Connector Architecture

Note

PROPOSED by KWL - Pending review and approval. See: model/semantic-define-json-approach/WIKI_PROPOSAL_UNIVERSAL_CONNECTOR.md

Statement: DataConcepts SHALL serve as the universal abstraction layer connecting analytical structures to external domain models.

Requirements:

  • Cube elements (Dimensions, Measures, Attributes) SHALL reference DataConcepts via is_a relationships, NOT domain-specific variables directly
  • DataConcepts SHALL support simultaneous mappings to multiple external models (ADaM, SDTM, USDM, ARS, or proprietary)
  • Adding support for a new domain model SHALL NOT require changes to existing cube structures or templates
  • Templates SHALL be expressed in terms of DataConcepts, enabling portability across organizations
  • Execution engines SHALL resolve DataConcept references to domain-specific variables through explicit mapping declarations
  • The same DataConcept MAY map to different representations in different domain models
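
The resolution step can be sketched in a few lines of Python. This is a hypothetical illustration, not the proposal's API: `DataConcept`, `resolve`, `bind_template`, and the mapping keys (`adam`, `sdtm`, `usdm`) are assumptions. A template refers only to concept names, and an execution engine binds it to one domain model through the explicit mappings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataConcept:
    name: str
    description: str
    # Domain model -> variable/element name; more models can be added later
    mappings: dict = field(default_factory=dict)


def resolve(concept: DataConcept, model: str) -> str:
    """Translate a concept into one domain model's representation via its explicit mapping."""
    try:
        return concept.mappings[model]
    except KeyError:
        raise LookupError(f"DataConcept {concept.name!r} has no mapping for {model!r}") from None


def bind_template(template: dict, concepts: dict, model: str) -> dict:
    """Bind a template written against DataConcepts (role -> concept name) to one model."""
    return {role: resolve(concepts[name], model) for role, name in template.items()}


subject = DataConcept("subject", "Study participant",
                      {"adam": "USUBJID", "sdtm": "SUBJID", "usdm": "StudySubject"})
change = DataConcept("change_from_baseline", "Change from baseline", {"adam": "CHG"})
concepts = {c.name: c for c in (subject, change)}

# The template refers only to concepts, never to ADaM or SDTM variables
template = {"dimension:subject": "subject", "measure:change": "change_from_baseline"}
print(bind_template(template, concepts, "adam"))
# {'dimension:subject': 'USUBJID', 'measure:change': 'CHG'}
```

Binding the same template to `"sdtm"` raises `LookupError` here because `change_from_baseline` has only an ADaM mapping; adding that mapping later changes the concept, not the template.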

Architecture Diagram:

┌───────────┐   ┌───────────┐   ┌───────────┐   ┌───────────┐
│   ADaM    │   │   SDTM    │   │   USDM    │   │    ARS    │
│  USUBJID  │   │  SUBJID   │   │  Subject  │   │  Result   │
│   CHG     │   │           │   │           │   │           │
└───────────┘   └───────────┘   └───────────┘   └───────────┘
     EXTERNAL DOMAIN MODELS (Pluggable)
        ▲               ▲                ▲                ▲
        │               │                │                │
        │          maps_to          maps_to          maps_to
        │               │                │                │
┌─────────────────────────────────────────────────────────────────┐
│                      DATA CONCEPTS                              │
│              (Universal Abstraction Layer)                      │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  DataConcept: "subject"                                  │  │
│  │    description: "Study participant"                      │  │
│  │    mappings:                                             │  │
│  │      adam_variable: USUBJID                              │  │
│  │      sdtm_variable: SUBJID                               │  │
│  │      usdm_element: StudySubject                          │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
        ▲               ▲                ▲
        │               │                │
        │          ┌────┴────┐      ┌────┴────┐
        │          │  is_a   │      │  is_a   │
        │          └─────────┘      └─────────┘
        │               │                │
┌───────┼───────────────┼────────────────┼────────────────────────┐
│       │               │                │                        │
│  ┌────┴────┐    ┌─────┴─────┐    ┌─────┴─────┐                 │
│  │  Cube   │    │ Dimension │    │  Measure  │                 │
│  └─────────┘    └───────────┘    └───────────┘                 │
│                    ANALYTICAL STRUCTURES                        │
└─────────────────────────────────────────────────────────────────┘

Rationale:

  1. Provides the mechanism (HOW) for achieving C1 and C4
  2. Enables portable templates
  3. Supports multi-standard compliance
  4. Future-proofs against new standards
  5. Decouples analytical logic from data standard specifics

Relationship to Other Principles:

Principle | Relationship
A1: Layered Architecture | DataConcepts ARE the Concepts layer; this principle specifies their connector role
A2: Concept Independence | This principle explains HOW concepts remain independent (via abstraction layer)
C1: CDISC Alignment | Universal Connector is the MECHANISM for achieving CDISC alignment
C4: Interoperability | Universal Connector ENABLES standards interoperability
D3: Progressive Refinement | Mappings can be added progressively (ADaM first, then SDTM, etc.)

Priority: CRITICAL - Foundational to achieving C1 (CDISC Alignment) and C4 (Interoperability)

Applicability: Universal - applies to all model elements


Category B: Data & Analysis Integrity Principles

These principles ensure reproducibility, quality, and trustworthiness of analyses.


B1: Analysis Reproducibility and Provenance

Source: Model_PRINCIPLES.md (GP-2)

Statement: The AC/DC model SHALL ensure reproducible analyses with clear metadata lineage through immutable entities and directed acyclic dependencies.

Immutability Rules:

  1. Cubes are immutable - operations produce new cubes
  2. Slices are immutable - views don't modify source
  3. Methods produce new outputs - never modify inputs
  4. No cube SHALL appear as both input and output of the same method

DAG (Directed Acyclic Graph, a graph structure with no cycles) Structure Rules:

  1. Source cubes are roots (no dependencies)
  2. Derived cubes depend on upstream cubes
  3. Result cubes are downstream of derived cubes
  4. No cube may depend (directly or transitively) on itself
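
Both rule sets can be checked from method declarations alone. The sketch below is hypothetical (`check_lineage` and the cube names are assumptions); it uses Python's standard-library `graphlib` module (Python 3.9+), whose `TopologicalSorter` raises `CycleError` when the graph contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter


def check_lineage(methods: dict) -> list:
    """Check immutability rule 4 and the DAG rules for a set of methods.

    `methods` maps a method name to (input cube names, output cube names).
    """
    errors = []
    upstream = {}  # cube -> cubes it is derived from
    for name, (inputs, outputs) in methods.items():
        overlap = set(inputs) & set(outputs)
        if overlap:
            errors.append(f"{name}: {sorted(overlap)} used as both input and output")
        for out in outputs:
            upstream.setdefault(out, set()).update(inputs)
    try:
        # Exhausting static_order() raises CycleError if any cube depends on itself
        tuple(TopologicalSorter(upstream).static_order())
    except CycleError as exc:
        errors.append(f"cycle detected: {exc.args[1]}")
    return errors


pipeline = {
    "derive_aval": ({"sdtm_vs"}, {"adam_aval"}),
    "derive_chg": ({"adam_aval"}, {"adam_chg"}),
    "ancova": ({"adam_chg"}, {"ancova_results"}),
}
print(check_lineage(pipeline))  # []

# A method that writes back into an upstream cube creates a cycle
broken = {**pipeline, "overwrite_aval": ({"adam_chg"}, {"adam_aval"})}
print(check_lineage(broken))  # one "cycle detected: ..." message
```

Source cubes such as `sdtm_vs` never appear as outputs, so they become the roots of the graph without being declared separately.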

Rationale:

  • Clear data lineage
  • Reproducibility (same inputs → same outputs)
  • Inspectable intermediate results
  • Audit trail integrity
  • Testability

Priority: CRITICAL - Core to scientific integrity

Applicability: Universal - applies to all derivations and analyses


B2: Complete End-to-End Traceability

Statement: Every result SHALL be fully traceable backward through implementations and structures to source concepts.

Traceability Chain:

Display → Analysis Results → Method → Measure → Slice → Cube → Concept

Requirements:

  • All elements explicitly declare their relationships
  • Orphaned elements (elements that cannot be traced back to a concept) are validation errors
  • Traceability extends to protocol objectives
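
Backward traceability and orphan detection reduce to graph walks over declared upstream links. A minimal Python sketch, assuming each element lists its upstream elements; `trace`, `untraceable`, and the element names are hypothetical and follow the chain Display → Analysis Results → Method → Measure → Slice → Cube → Concept.

```python
def trace(element: str, upstream: dict) -> list:
    """Depth-first walk from an element back through its declared upstream links."""
    path, stack, seen = [], [element], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        path.append(node)
        # Reverse so the first declared upstream link is visited first
        stack.extend(reversed(upstream.get(node, [])))
    return path


def untraceable(upstream: dict, concepts: set) -> set:
    """Elements whose backward trace never reaches a concept (treated as orphans)."""
    return {e for e in upstream if not concepts & set(trace(e, upstream))}


upstream = {
    "display_t14_2_1": ["ancova_results"],
    "ancova_results": ["ancova_method"],
    "ancova_method": ["chg_measure"],
    "chg_measure": ["efficacy_slice"],
    "efficacy_slice": ["adqs_cube"],
    "adqs_cube": ["ChangeFromBaseline"],
    "ChangeFromBaseline": [],
    "draft_listing": [],
}
concepts = {"ChangeFromBaseline"}
print(" -> ".join(trace("display_t14_2_1", upstream)))
# display_t14_2_1 -> ancova_results -> ancova_method -> chg_measure -> efficacy_slice -> adqs_cube -> ChangeFromBaseline
print(untraceable(upstream, concepts))  # {'draft_listing'}
```

Running the same walk in the opposite direction (following downstream links) would support the impact analysis listed in the rationale.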

Rationale:

  • Regulatory compliance (21 CFR Part 11, ICH E9)
  • Scientific reproducibility
  • Audit capability
  • Impact analysis
  • Quality assurance

Priority: CRITICAL - Regulatory requirement

Applicability: Universal - applies to all elements


B3: Explicit and Automated Quality Rules

Statement: Analysis checks, validation rules, and constraints SHALL be explicitly specified and automatically verified.

Key Requirements:

  • Rules documented declaratively in model
  • Rules automatically checked during validation
  • Rules versioned with analysis model

Examples:

  • Every analysis must reference at least one estimand
  • Analysis populations must be defined before use
  • Dependencies must be acyclic
  • Estimands must include all ICH E9(R1) attributes
  • P-values must lie between 0 and 1
  • Baseline values must exist for change-from-baseline analyses
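
A few of these example rules can be sketched as declarative rule objects that a validator evaluates in bulk. This is a hypothetical Python illustration: `QualityRule`, the `QR-00x` identifiers, and the analysis dict keys are assumptions, and the predicates are written as lambdas only for brevity. In the model they would be stored declaratively and versioned with it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualityRule:
    rule_id: str                   # hypothetical identifier scheme
    description: str
    check: Callable[[dict], bool]  # True when the analysis satisfies the rule


RULES = (
    QualityRule("QR-001", "Every analysis must reference at least one estimand",
                lambda a: len(a.get("estimands", [])) > 0),
    QualityRule("QR-002", "P-values must lie between 0 and 1",
                lambda a: all(0.0 <= p <= 1.0 for p in a.get("p_values", []))),
    QualityRule("QR-003", "Change-from-baseline analyses must have baseline values",
                lambda a: a.get("endpoint") != "CHG" or a.get("has_baseline", False)),
)


def validate(analysis: dict, rules=RULES) -> list:
    """Evaluate every rule and report the ones that fail."""
    return [f"{r.rule_id}: {r.description}" for r in rules if not r.check(analysis)]


ok = {"estimands": ["E1"], "p_values": [0.03], "endpoint": "CHG", "has_baseline": True}
bad = {"estimands": [], "p_values": [1.2], "endpoint": "CHG"}
print(validate(ok))   # []
print(validate(bad))  # reports QR-001, QR-002 and QR-003
```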

Rationale:

  • Quality by design
  • Automation reduces QC burden
  • Transparency and auditability
  • Consistency across analyses
  • Traceability of violations

Priority: HIGH - Quality assurance

Applicability: Universal - applies to all model specifications


B4: Precision in Specification

Statement: Standardized structure SHALL enforce precision in specifying analysis and derivation settings and assumptions.

Goal: Reduced ambiguity through:

  • Formal structure
  • Explicit declarations
  • Type constraints
  • Validation rules

Rationale:

  • Reduces misinterpretation
  • Improves communication
  • Enables automation
  • Supports regulatory review

Priority: HIGH - Quality of specification

Applicability: Universal - applies to all specifications


B5: Derivation Concept Atomicity

Source: Principles.md (Derivation Concepts section)

Statement: Derivation concepts SHALL "do one thing and do it well" - complex derivations requiring sequences should be broken into multiple concepts.

Requirements:

  • Single responsibility per derivation concept
  • Sequences decomposed into atomic steps
  • Each step traceable independently

Derivation Concept Operations:

  • Update values for existing columns
  • Produce new columns with values
  • Produce new records
  • Combinations of the above

Rationale:

  • Clarity of purpose
  • Reusability
  • Testability
  • Easier debugging

Priority: MEDIUM - Design quality

Applicability: Applies to derivation concept definitions


B6: Modular Derivation Composition

Note

PROPOSED by KWL - Pending review and approval. This principle extends B5 (Atomicity) by addressing how atomic derivations compose into dependency chains.

Statement: Derivation concepts SHALL be composable through explicit dependency declarations, forming directed acyclic graphs (DAGs) that enable reuse across analyses.

Requirements:

  • Derivation Concepts (DCs) SHALL declare their upstream dependencies (input DCs)
  • DCs MAY be reused as inputs to multiple downstream DCs or Analysis Concepts (ACs)
  • Execution engines SHALL resolve dependencies and execute in topological order
  • The same DC template SHALL produce consistent results regardless of which downstream consumer uses it
  • Dependency declarations SHALL be explicit and machine-readable

Derivation Dependency Chain Example:

ANCOVA Analysis requires:
├── CHG (ChangeFromBaseline) ← depends on AVAL, BASE
│   ├── AVAL (AnalysisValue) ← depends on SDTM source
│   └── BASE (BaselineValue) ← depends on AVAL, ABLFL
│       └── ABLFL (BaselineFlag) ← depends on visit timing rules
├── PopulationFlags (EFFFL/ITTFL) ← depends on inclusion criteria
└── LOCFImputation ← depends on AVAL, time ordering
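
The dependency tree above can be turned into an execution plan with a topological sort. The sketch below encodes the tree as a Python dict (node to its upstream inputs) and uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group DCs into stages; nodes in the same stage have no dependencies on each other and could run concurrently. `execution_plan` is a hypothetical helper, not part of the model.

```python
from graphlib import TopologicalSorter

# Upstream dependencies taken from the example tree (node -> its inputs)
DEPENDENCIES = {
    "AVAL": {"SDTM source"},
    "ABLFL": {"visit timing rules"},
    "BASE": {"AVAL", "ABLFL"},
    "CHG": {"AVAL", "BASE"},
    "LOCFImputation": {"AVAL", "time ordering"},
    "PopulationFlags": {"inclusion criteria"},
    "ANCOVA": {"CHG", "PopulationFlags", "LOCFImputation"},
}


def execution_plan(deps: dict) -> list:
    """Group nodes into stages; every node runs after all of its inputs."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for stage in execution_plan(DEPENDENCIES):
    print(stage)
# ['SDTM source', 'inclusion criteria', 'time ordering', 'visit timing rules']
# ['ABLFL', 'AVAL', 'PopulationFlags']
# ['BASE', 'LOCFImputation']
# ['CHG']
# ['ANCOVA']
```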

Composition Pattern:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    DERIVATION CONCEPT GRAPH (DAG)                           │
│                                                                             │
│   ┌──────────┐     ┌──────────┐                                            │
│   │  SDTM    │     │  Visit   │                                            │
│   │  Source  │     │  Timing  │                                            │
│   └────┬─────┘     └────┬─────┘                                            │
│        │                │                                                   │
│        ▼                ▼                                                   │
│   ┌──────────┐     ┌──────────┐                                            │
│   │   AVAL   │     │  ABLFL   │                                            │
│   │ (DC)     │     │  (DC)    │                                            │
│   └────┬─────┘     └────┬─────┘                                            │
│        │                │                                                   │
│        ├────────────────┤                                                   │
│        │                │                                                   │
│        ▼                ▼                                                   │
│   ┌──────────┐     ┌──────────┐                                            │
│   │  LOCF    │     │   BASE   │                                            │
│   │  (DC)    │     │   (DC)   │                                            │
│   └────┬─────┘     └────┬─────┘                                            │
│        │                │                                                   │
│        │                ├───────────────┐                                   │
│        │                │               │                                   │
│        │                ▼               ▼                                   │
│        │           ┌──────────┐    ┌──────────┐                            │
│        │           │   CHG    │    │  ANCOVA  │                            │
│        │           │   (DC)   │    │   (AC)   │                            │
│        │           └────┬─────┘    └──────────┘                            │
│        │                │               ▲                                   │
│        └────────────────┴───────────────┘                                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Benefits:

  • Reusability: Define AVAL once, use in BASE, CHG, LOCF, and multiple analyses
  • Consistency: Same derivation logic applied everywhere
  • Parallelization: Independent branches can execute concurrently
  • Provenance: Clear lineage from final analysis to source data
  • Testability: Each DC can be validated independently
  • Maintainability: Change in one DC propagates correctly to all dependents

Relationship to Other Principles:

Principle | Relationship
B1: Reproducibility | Composition through DAG ensures reproducible execution order
B2: Traceability | Dependency declarations provide explicit lineage
B5: Atomicity | B6 explains how atomic DCs (B5) combine into complete derivations
A4: Analysis/Derivation Separation | DCs compose to feed ACs at the boundary

Rationale:

  • Complex clinical derivations require multiple steps (SDTM → AVAL → BASE → CHG)
  • Without explicit composition, derivation logic is duplicated or inconsistent
  • DAG structure enables automated dependency resolution and execution planning
  • Reuse across analyses reduces errors and maintenance burden

Priority: HIGH - Enables practical implementation of atomic derivations (B5)

Applicability: Applies to all derivation concept definitions and their relationships


Category C: Standards Compliance Principles

These principles ensure alignment with industry and regulatory standards.


C1: CDISC Standards Alignment

Statement: The AC/DC model SHALL align with CDISC standards (USDM, SDTM, ADaM, ARS) while providing higher-level abstraction.

CDISC Standards Integration:

  1. USDM (Unified Study Definitions Model)

    • Protocol entities: study design, objectives, endpoints, estimands, populations
    • Links through concepts
    • Protocol specifications inform analysis planning
    • Example: USDM StudyEndpoint → AC/DC AnalysisConcept
  2. SDTM (Study Data Tabulation Model)

    • Source data collection structure
    • Links through structure entities (source cubes) via concepts
    • Example: SDTM LB domain → AC/DC source cube
  3. ADaM (Analysis Data Model)

    • Analysis-ready dataset structure
    • Links through structure entities via concepts
    • Example: AC/DC Cube → ADaM BDS dataset, AC/DC Measure → ADaM AVAL
  4. ARS (Analysis Results Standard)

    • Analysis results metadata
    • Links through derivation entities
    • Example: AC/DC Method → ARS Analysis/AnalysisMethod
    • Example: AC/DC Display → ARS OutputDisplay/Output

Requirements:

  • Source structures traceable to SDTM domains
  • Analysis structures mappable to ADaM
  • Concepts align with USDM protocol definitions
  • Displays mappable to ARS OutputDisplay

Rationale:

  • Industry interoperability
  • Regulatory acceptance
  • Ecosystem integration
  • Standards-based tooling

Priority: CRITICAL - Industry requirement

Applicability: Universal - applies to all model elements


C2: Regulatory Framework Compliance

Statement: The model SHALL support compliance with GxP, ICH guidelines, and regulatory requirements.

ICH E9(R1) Estimand Framework (estimand attributes):

  • Treatment
  • Population
  • Variable (endpoint)
  • Intercurrent events and the strategies for handling them
  • Population-level summary

Additional Frameworks:

  • GxP Principles (Good Clinical, Laboratory, Manufacturing Practices)
  • ICH E3 (Clinical Study Reports)
  • ICH E9 (Statistical Principles)
  • 21 CFR Part 11 (Electronic Records/Signatures)
  • ALCOA+ Data Integrity principles

Requirements:

  • Concepts support Estimand entities
  • Analysis specifications reference estimands explicitly
  • Validation verifies estimand completeness
  • Display metadata traces to protocol objectives
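
An estimand completeness check might look like the following Python sketch. The five attributes and the five intercurrent-event strategies are those described in ICH E9(R1); the dict layout, key names, and `estimand_problems` function are hypothetical, and the sketch treats an attribute as missing only when it is absent or `None`.

```python
ESTIMAND_ATTRIBUTES = (
    "treatment",
    "population",
    "variable",
    "intercurrent_events",
    "population_level_summary",
)
# The five intercurrent-event strategies named in ICH E9(R1)
STRATEGIES = {
    "treatment policy",
    "hypothetical",
    "composite variable",
    "while on treatment",
    "principal stratum",
}


def estimand_problems(estimand: dict) -> list:
    """Report missing attributes and intercurrent events without a recognized strategy."""
    problems = [f"missing attribute: {a}" for a in ESTIMAND_ATTRIBUTES if estimand.get(a) is None]
    for event in estimand.get("intercurrent_events") or []:
        if event.get("strategy") not in STRATEGIES:
            problems.append(f"intercurrent event {event.get('event')!r} has no recognized strategy")
    return problems


primary = {
    "treatment": "Drug X 10 mg vs placebo",
    "population": "Adults with moderate hypertension",
    "variable": "Change from baseline in systolic BP at Week 12",
    "intercurrent_events": [
        {"event": "treatment discontinuation", "strategy": "treatment policy"},
        {"event": "rescue medication", "strategy": None},
    ],
    "population_level_summary": "Difference in mean change",
}
print(estimand_problems(primary))
# ["intercurrent event 'rescue medication' has no recognized strategy"]
```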

Rationale:

  • Health authority acceptance (FDA, EMA, PMDA)
  • Scientific rigor
  • Auditability
  • Protection from regulatory risk

Priority: CRITICAL - Regulatory requirement

Applicability: Applies to confirmatory trial analyses


C3: Scope Boundary Management

Statement: The model SHALL explicitly define what is in scope vs out of scope.

IN SCOPE:

  • Analysis and derivation concepts
  • Subject-level derivations
  • Analysis-level computations
  • Statistical methods
  • Data structures (cubes, dimensions, measures)
  • Traceability to concepts
  • Quality rules

OUT OF SCOPE:

  • Raw data collection (SDTM covers this)
  • Study design specification (USDM covers this)
  • Implementation details (programming syntax)
  • Infrastructure/platform specifics
  • Execution scheduling
  • User interface design

Rationale:

  • Clear boundaries prevent scope creep
  • Leverage existing standards where appropriate
  • Focus on model's unique value

Priority: HIGH - Prevents confusion

Applicability: Universal - defines model boundaries


C4: Standards Interoperability

Statement: Standardized structure SHALL allow interchange of specifications between systems and organizations.

Requirements:

  • Machine-readable format
  • Standard vocabulary
  • Common structure
  • Version control

Benefits:

  • Cross-organizational sharing
  • Tool interoperability
  • Reduced vendor lock-in
  • Community knowledge sharing

Rationale:

  • Industry efficiency
  • Best practice dissemination
  • Regulatory submissions

Priority: HIGH - Industry collaboration

Applicability: Applies to specification format and exchange


Category D: Design Philosophy Principles

These principles reflect the guiding values and approaches for the model.


D1: Declarative Analysis Specification

Statement: Analyses and derivations SHALL be specified declaratively ("what" not "how"), with implementations providing the "how."

Key Characteristics:

  • Specify intent, not procedure
  • Separate specification from implementation
  • Enable multiple implementations of same specification
  • Support validation independent of execution

Benefits:

  • Platform independence
  • Implementation flexibility
  • Easier verification
  • Clear intent

Rationale:

  • Conceptual clarity
  • Implementation alternatives
  • Testing and validation
  • Documentation quality

Priority: HIGH - Core design philosophy

Applicability: Applies to all specifications


D2: Keep It Simple, Stupid (KISS)

Statement: Whenever possible, simplicity is a design goal, though some complexity is unavoidable.

Approach:

  • Minimize unnecessary complexity
  • Use clear, simple structures when possible
  • Hide unavoidable complexity from end-users via tools
  • Prefer straightforward solutions

Rationale:

  • Easier adoption
  • Reduced errors
  • Better maintainability
  • Lower training burden

Priority: FOUNDATIONAL - Guiding principle

Applicability: Universal - applies to all design decisions


D3: Extensible Core with Progressive Refinement

Statement: The model SHALL provide minimal core elements with extension mechanisms for specialized needs.

Core Philosophy:

  • Minimal viable core
  • Extension points for specialization
  • Progressive disclosure of complexity
  • Backward compatibility

Example Extensions:

  • Study-specific concepts
  • Organization-specific methods
  • Therapeutic-area extensions
  • Custom quality rules

Rationale:

  • Broad applicability
  • Specialization support
  • Evolution without breaking changes
  • Community contribution

Priority: MEDIUM - Enables growth

Applicability: Applies to model evolution


D4: Common Language for Collaboration

Statement: The model SHALL provide a common language for statisticians, clinicians, data managers, programmers, and other stakeholders.

Stakeholder Benefits:

  • Statisticians: Clear analysis specification
  • Clinicians: Understanding of endpoints
  • Data Managers: Data structure clarity
  • Programmers: Unambiguous requirements
  • Regulators: Transparent documentation

Rationale:

  • Reduced miscommunication
  • Improved collaboration
  • Better quality
  • Faster development

Priority: HIGH - Team effectiveness

Applicability: Universal - affects all usage


Category E: Operational Principles

These principles guide implementation, tooling, and usage.


E1: LinkML as Modeling Language

Statement: LinkML SHALL be used as the modeling language instead of UML.

Rationale for LinkML:

  • Less complex than UML
  • Free, open tooling
  • Available for all operating systems
  • Easier visualization
  • No dependency on commercial software

Comparison with USDM:

  • USDM uses UML
  • Working with the USDM UML model requires Enterprise Architect (Windows-only, commercial)
  • UML is complex, and its visualization is inconsistent

Priority: MEDIUM - Tool choice

Applicability: Applies to model representation


E2: Machine Readability and Automation

Statement: Specifications SHALL be machine-readable to enable automated validation and code generation.

Capabilities Enabled:

  • Direct link to programming code
  • Automated validation
  • Reduced transcription errors
  • Code reuse
  • Quality checks

Rationale:

  • Error reduction
  • Efficiency gains
  • Consistency
  • Faster development

Priority: HIGH - Automation benefit

Applicability: Applies to all specifications


E3: Clear Linkage Through Model Layers

Statement: The model SHALL provide clear linkage from results to objectives/endpoints in USDM and vice versa.

Traceability Path:

USDM Objective → Endpoint → Estimand →
Analysis Concept → Method → Display → Results

Benefits:

  • Regulatory clarity
  • Protocol alignment verification
  • Change impact analysis
  • Audit trail

Priority: HIGH - Regulatory necessity

Applicability: Universal - affects all elements


E4: Tool-Hidden Complexity

Statement: Unavoidable complexity SHALL be hidden from end-users by tools.

Approach:

  • User-friendly interfaces
  • Reasonable defaults
  • Progressive disclosure
  • Expert mode for advanced users

Rationale:

  • Accessibility
  • Reduced learning curve
  • Error prevention
  • User satisfaction

Priority: MEDIUM - User experience

Applicability: Applies to tooling and interfaces


Caution

Anything below this message is not yet reviewed.

Open Questions

1. Analysis Concept Definition

Important

Question: Can we define the term Analysis Concept?
Suggested answer: A specification of a single analysis computation producing aggregated results from input data

2. Analysis-Only Slices

Important

Question: Should the model allow slices with no method that are used only by Analysis Concepts?
Suggested answer: Yes, allow them. Not all slices need derivations; some are simply selection criteria for analysis inputs.

3. Main Goals

Important

Question: What are the main goals of the AC/DC model?
Suggested answer:

  • Enable reproducible clinical trial analyses
  • Provide standard interchange format
  • Support regulatory compliance
  • Facilitate cross-organizational collaboration