Home
Caution
This is a work in progress!
Principles are classified into 5 categories based on their primary concern:
| Category | Focus | Priority Level |
|---|---|---|
| A: Architectural | Structure, layers, dependencies | CRITICAL |
| B: Data & Analysis Integrity | Reproducibility, traceability, quality | CRITICAL |
| C: Standards Compliance | CDISC, regulatory, ICH | CRITICAL |
| D: Design Philosophy | Guiding values and approaches | FOUNDATIONAL |
| E: Operational | Tools, implementation, usage | SUPPORTING |
These principles define the fundamental structure and organization of the AC/DC model.
Statement: The AC/DC model SHALL maintain strict separation between layers with unidirectional dependency flow.
Three-Layer Structure:
- Concepts: Abstract definitions of biomedical, derivation, and analysis concepts
- Structures: Concrete data organization
- Implementation: Transformations and presentations
Dependency Flow:
Implementations →(depend_on)→ Structures →(depend_on)→ Concepts
Rules:
- Concepts SHALL NOT reference structures or implementations
- Structures SHALL ONLY reference concepts, NOT implementations
- Implementations MAY reference both structures and concepts
- Elements SHALL belong to exactly one layer
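The rules above lend themselves to an automated check. The following is a minimal sketch, not part of the model itself: the `Element` class, the layer names as strings, and the sample element names are all hypothetical. It also assumes that references within the same layer are outside the scope of these rules and ignores them.

```python
from dataclasses import dataclass

# Cross-layer references permitted by the rules above.
ALLOWED_REFERENCES = {
    "concept": set(),                             # concepts reference no lower layer
    "structure": {"concept"},                     # structures reference concepts only
    "implementation": {"structure", "concept"},   # implementations may reference both
}

@dataclass(frozen=True)
class Element:
    name: str
    layer: str               # each element belongs to exactly one layer
    references: tuple = ()   # names of the elements it references

def layer_violations(elements):
    """Return (source, target) pairs that break the dependency rules."""
    by_name = {element.name: element for element in elements}
    violations = []
    for element in elements:
        for target_name in element.references:
            target = by_name[target_name]
            if target.layer == element.layer:
                continue  # assumption: same-layer references are not checked here
            if target.layer not in ALLOWED_REFERENCES[element.layer]:
                violations.append((element.name, target_name))
    return violations

elements = [
    Element("subject", "concept"),
    Element("adsl_cube", "structure", ("subject",)),
    Element("derive_age", "implementation", ("adsl_cube", "subject")),
    Element("bad_concept", "concept", ("adsl_cube",)),  # concept -> structure
]
violations = layer_violations(elements)
print(violations)
```

Only `bad_concept` is reported, because it is the one element that points against the dependency flow.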
Rationale:
- Clear separation of concerns
- Independent verification of each layer
- Reusability across analyses
- No circular dependencies
- Maintainability
Priority: CRITICAL - This is the foundational architectural principle
Applicability: Universal - applies to all model elements
Statement: Concepts SHALL be defined independent of specific studies, data standards, and implementation details.
Requirements:
- Concepts describe real-world entities abstractly
- Not context-dependent
- Human-readable
- Independent of SDTM, ADaM, or other standards
Contrast with Structures and Implementation:
- Structures and Implementation are context-dependent
- Machine-readable and executable
- Instantiated in specific data standards
Rationale:
- Reusability across studies
- Conceptual clarity
- Standards-agnostic reasoning
Priority: CRITICAL - Core to the conceptual/implementation separation
Applicability: Applies to all concept definitions
Statement: Every implementation SHALL link to concrete data standards (SDTM, ADaM) through structures and be machine-executable.
Requirements:
- Implementations are representations of concepts in code
- Must be machine-readable
- Require concrete mappings
- Must be executable by computational systems
Rationale:
- Automation capability
- Reduced transcription errors
- Validation automation
- Reuse of code
Priority: CRITICAL - Enables automation
Applicability: Applies to all implementation elements
Statement: The model SHALL distinguish between subject-level derivations and non-subject-level analyses.
Definitions:
- Derivation: Subject-level data handling to generate new subject-level data
- Analysis: Creation of non-subject-level aggregated data
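The distinction can be illustrated with a small sketch. The function names and sample records are hypothetical; the point is only that a derivation returns one subject-level record per input record, while an analysis collapses subject-level records into an aggregate.

```python
from statistics import mean

def derive_change(records):
    """Derivation: subject-level in, subject-level out (one record per input record)."""
    return [{**record, "CHG": record["AVAL"] - record["BASE"]} for record in records]

def analyse_mean_change(records):
    """Analysis: subject-level in, a single non-subject-level value out."""
    return mean(record["CHG"] for record in records)

subjects = [
    {"USUBJID": "S1", "AVAL": 12.0, "BASE": 10.0},
    {"USUBJID": "S2", "AVAL": 9.0, "BASE": 13.0},
]
derived = derive_change(subjects)          # still two subject-level records
mean_change = analyse_mean_change(derived) # one aggregated value
print(derived, mean_change)
```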
Rationale:
- Different purposes require different approaches
- Subject-level vs aggregate operations
- Traceability requirements differ
Priority: HIGH - Important for scope clarity
Applicability: Applies to all methods and operations
Statement: Data SHALL be organized as multi-dimensional cubes with dimensions, measures, and attributes.
Structure:
- Dimensions: Identify and organize observations (subject, treatment, visit)
- Measures: Quantitative or qualitative values being analyzed
- Attributes: Qualifying metadata
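As a rough illustration of the three roles, the sketch below models a cube as a frozen dataclass. The `Cube` class, its `aggregate` method, and the sample rows are assumptions made for this example and are not defined by the model.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class Cube:
    dimensions: tuple   # columns that identify and organize observations
    measures: tuple     # columns holding the values being analyzed
    attributes: tuple   # columns holding qualifying metadata
    rows: tuple         # each row is a dict keyed by column name

    def aggregate(self, by, measure, func):
        """Group rows by one dimension and summarize one measure."""
        if by not in self.dimensions or measure not in self.measures:
            raise ValueError("unknown dimension or measure")
        groups = {}
        for row in self.rows:
            groups.setdefault(row[by], []).append(row[measure])
        return {key: func(values) for key, values in groups.items()}

cube = Cube(
    dimensions=("subject", "treatment", "visit"),
    measures=("value",),
    attributes=("unit",),
    rows=(
        {"subject": "S1", "treatment": "A", "visit": "WEEK 4", "value": 10.0, "unit": "mmHg"},
        {"subject": "S2", "treatment": "A", "visit": "WEEK 4", "value": 14.0, "unit": "mmHg"},
        {"subject": "S3", "treatment": "B", "visit": "WEEK 4", "value": 20.0, "unit": "mmHg"},
    ),
)
mean_by_treatment = cube.aggregate(by="treatment", measure="value", func=mean)
print(mean_by_treatment)
```

Declaring which columns are dimensions and which are measures is what lets the aggregation reject a request that groups by a measure or summarizes a dimension.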
Rationale:
- Natural representation of clinical trial data
- Supports OLAP-style operations
- Aligns with ADaM BDS structure
- Enables flexible slicing and aggregation
Priority: HIGH - Core structural pattern
Applicability: Applies to all data structures
Statement: Data subsets SHALL be defined declaratively as slices by fixing dimension values or applying filters.
Requirements:
- Slices are immutable views
- Created by fixing one or more dimension values
- Can represent subsets of records, variables, or both
- May exist without methods (for analysis input)
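A slice that satisfies these requirements might look like the following sketch. The `Slice` class and its `fixed` and `keep` fields are hypothetical names; the example assumes rows are plain dictionaries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slice:
    fixed: tuple        # (dimension, value) pairs every selected row must match
    keep: tuple = ()    # variables to retain; empty means keep all of them

    def apply(self, rows):
        """Return a new tuple of rows; the source rows are left untouched."""
        selected = []
        for row in rows:
            if all(row[dim] == value for dim, value in self.fixed):
                columns = self.keep or tuple(row)
                selected.append({col: row[col] for col in columns})
        return tuple(selected)

rows = (
    {"subject": "S1", "visit": "BASELINE", "value": 10.0},
    {"subject": "S1", "visit": "WEEK 4", "value": 12.0},
    {"subject": "S2", "visit": "WEEK 4", "value": 9.0},
)
# Subset of records (visit fixed) and of variables (visit column dropped).
week4 = Slice(fixed=(("visit", "WEEK 4"),), keep=("subject", "value"))
week4_rows = week4.apply(rows)
print(week4_rows)
```

The slice carries no computation of its own, so it can serve directly as a selection criterion for an analysis input.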
Rationale:
- Declarative specification
- Reusability
- Clear provenance
- Separation of data selection from computation
Priority: HIGH - Fundamental operation
Applicability: Applies to all data subsetting
Statement: All methods SHALL explicitly declare inputs, outputs, and arguments.
Requirements:
- Methods transform inputs to outputs
- Arguments parameterize behavior
- Inputs must be declared
- Outputs must be declared
- Arguments must be specified
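One way to picture an explicit method interface is sketched below. `MethodSpec`, `check_call`, and the `summary_mean` example with its `conf_level` argument are illustrative assumptions, not names from the model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MethodSpec:
    name: str
    inputs: tuple      # declared input names
    outputs: tuple     # declared output names
    arguments: tuple   # declared argument names that parameterize behavior

    def check_call(self, inputs, arguments):
        """Return a list of interface problems; an empty list means the call is valid."""
        problems = []
        problems += [f"missing input: {n}" for n in self.inputs if n not in inputs]
        problems += [f"undeclared input: {n}" for n in sorted(inputs) if n not in self.inputs]
        problems += [f"missing argument: {n}" for n in self.arguments if n not in arguments]
        problems += [f"undeclared argument: {n}" for n in sorted(arguments) if n not in self.arguments]
        return problems

summary_mean = MethodSpec(
    name="summary_mean",
    inputs=("CHG",),
    outputs=("MEAN", "CI"),
    arguments=("conf_level",),
)
ok = summary_mean.check_call(inputs={"CHG"}, arguments={"conf_level": 0.95})
bad = summary_mean.check_call(inputs={"AVAL"}, arguments={})
print(ok, bad)
```

Because the interface is data rather than code, the same declaration can feed dependency analysis as well as call checking.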
Rationale:
- Explicit interface definition
- Type checking
- Dependency analysis
- Reproducibility
Priority: HIGH - Enables automated analysis
Applicability: Applies to all methods (derivation and analysis)
Note
PROPOSED by KWL - Pending review and approval.
See: model/semantic-define-json-approach/WIKI_PROPOSAL_UNIVERSAL_CONNECTOR.md
Statement: DataConcepts SHALL serve as the universal abstraction layer connecting analytical structures to external domain models.
Requirements:
- Cube elements (Dimensions, Measures, Attributes) SHALL reference DataConcepts via `is_a` relationships, NOT domain-specific variables directly
- DataConcepts SHALL support simultaneous mappings to multiple external models (ADaM, SDTM, USDM, ARS, or proprietary)
- Adding support for a new domain model SHALL NOT require changes to existing cube structures or templates
- Templates SHALL be expressed in terms of DataConcepts, enabling portability across organizations
- Execution engines SHALL resolve DataConcept references to domain-specific variables through explicit mapping declarations
- The same DataConcept MAY map to different representations in different domain models
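The requirements above can be sketched in a few lines. The classes, the `resolve` function, the mapping keys (`"adam"`, `"sdtm"`, `"usdm"`, `"proprietary"`), and the `PATIENT_ID` variable are hypothetical; only the `subject` concept and its ADaM, SDTM, and USDM targets come from the proposal's diagram.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DataConcept:
    name: str
    description: str
    mappings: dict   # domain model name -> variable or element in that model

@dataclass(frozen=True)
class Dimension:
    name: str
    is_a: str        # name of a DataConcept, never a domain-specific variable

def resolve(element, concepts, model):
    """Resolve a cube element to a domain-specific name via its DataConcept."""
    concept = concepts[element.is_a]
    if model not in concept.mappings:
        raise LookupError(f"{concept.name!r} has no mapping for {model!r}")
    return concept.mappings[model]

subject = DataConcept(
    name="subject",
    description="Study participant",
    mappings={"adam": "USUBJID", "sdtm": "SUBJID", "usdm": "StudySubject"},
)
concepts = {"subject": subject}
subject_dim = Dimension(name="subject_dim", is_a="subject")

adam_name = resolve(subject_dim, concepts, "adam")
sdtm_name = resolve(subject_dim, concepts, "sdtm")

# Supporting another model only extends the mapping declaration;
# the Dimension definition is not touched.
concepts["subject"] = replace(
    subject, mappings={**subject.mappings, "proprietary": "PATIENT_ID"}
)
proprietary_name = resolve(subject_dim, concepts, "proprietary")
print(adam_name, sdtm_name, proprietary_name)
```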
Architecture Diagram:
┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
│ ADaM │ │ SDTM │ │ USDM │ │ ARS │
│ USUBJID │ │ SUBJID │ │ Subject │ │ Result │
│ CHG │ │ │ │ │ │ │
└───────────┘ └───────────┘ └───────────┘ └───────────┘
EXTERNAL DOMAIN MODELS (Pluggable)
▲ ▲ ▲ ▲
│ │ │ │
│ maps_to maps_to maps_to
│ │ │ │
┌─────────────────────────────────────────────────────────────────┐
│ DATA CONCEPTS │
│ (Universal Abstraction Layer) │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ DataConcept: "subject" │ │
│ │ description: "Study participant" │ │
│ │ mappings: │ │
│ │ adam_variable: USUBJID │ │
│ │ sdtm_variable: SUBJID │ │
│ │ usdm_element: StudySubject │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
▲ ▲ ▲
│ │ │
│ ┌────┴────┐ ┌────┴────┐
│ │ is_a │ │ is_a │
│ └─────────┘ └─────────┘
│ │ │
┌───────┼───────────────┼────────────────┼────────────────────────┐
│ │ │ │ │
│ ┌────┴────┐ ┌─────┴─────┐ ┌─────┴─────┐ │
│ │ Cube │ │ Dimension │ │ Measure │ │
│ └─────────┘ └───────────┘ └───────────┘ │
│ ANALYTICAL STRUCTURES │
└─────────────────────────────────────────────────────────────────┘
Rationale:
- Provides the mechanism (HOW) for achieving C1 and C4
- Enables portable templates
- Supports multi-standard compliance
- Future-proofs against new standards
- Decouples analytical logic from data standard specifics
Relationship to Other Principles:
| Principle | Relationship |
|---|---|
| A1: Layered Architecture | DataConcepts ARE the Concepts layer; this principle specifies their connector role |
| A2: Concept Independence | This principle explains HOW concepts remain independent (via abstraction layer) |
| C1: CDISC Alignment | Universal Connector is the MECHANISM for achieving CDISC alignment |
| C4: Interoperability | Universal Connector ENABLES standards interoperability |
| D3: Progressive Refinement | Mappings can be added progressively (ADaM first, then SDTM, etc.) |
Priority: CRITICAL - Foundational to achieving C1 (CDISC Alignment) and C4 (Interoperability)
Applicability: Universal - applies to all model elements
These principles ensure reproducibility, quality, and trustworthiness of analyses.
Source: Model_PRINCIPLES.md (GP-2)
Statement: The AC/DC model SHALL ensure reproducible analyses with clear metadata lineage through immutable entities and directed acyclic dependencies.
Immutability Rules:
- Cubes are immutable - operations produce new cubes
- Slices are immutable - views don't modify source
- Methods produce new outputs - never modify inputs
- No cube SHALL appear as both input and output of the same method
DAG Structure Rules (DAG: directed acyclic graph, a graph structure with no cycles):
- Source cubes are roots (no dependencies)
- Derived cubes depend on upstream cubes
- Result cubes are downstream of derived cubes
- No cube may depend (directly or transitively) on itself
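Both rule sets can be checked mechanically. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the `lineage_problems` function, the shape of the `methods` mapping, and the cube names are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter

def lineage_problems(methods):
    """Check immutability and acyclicity.

    `methods` maps a method name to a pair (input cubes, output cubes).
    """
    problems = []
    depends_on = {}  # output cube -> set of cubes it is derived from
    for name, (inputs, outputs) in methods.items():
        overlap = set(inputs) & set(outputs)
        if overlap:
            problems.append(f"{name}: cube is both input and output: {sorted(overlap)}")
        for out in outputs:
            # Self-references are already reported above, so leave them out of the graph.
            depends_on.setdefault(out, set()).update(i for i in inputs if i != out)
    try:
        TopologicalSorter(depends_on).prepare()  # raises CycleError on a cycle
    except CycleError:
        problems.append("cube dependencies contain a cycle")
    return problems

valid = {
    "derive_chg": (("ADLB",), ("ADLB_CHG",)),
    "summarise": (("ADLB_CHG",), ("RESULTS",)),
}
in_place = {"update": (("ADLB",), ("ADLB",))}
cyclic = {"m1": (("A",), ("B",)), "m2": (("B",), ("A",))}

print(lineage_problems(valid))
print(lineage_problems(in_place))
print(lineage_problems(cyclic))
```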
Rationale:
- Clear data lineage
- Reproducibility (same inputs → same outputs)
- Inspectable intermediate results
- Audit trail integrity
- Testability
Priority: CRITICAL - Core to scientific integrity
Applicability: Universal - applies to all derivations and analyses
Statement: Every result SHALL be fully traceable backward through implementation and structure to its source concepts.
Traceability Chain:
Display → Analysis Results → Method → Measure → Slice → Cube → Concept
Requirements:
- All elements declare relationships
- Orphaned elements are validation errors
- Traceability extends to protocol objectives
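A backward trace and an orphan check could be sketched as follows. The element names, the `(kind, upstream)` encoding, and the choice to treat "does not trace back to a concept" as the definition of an orphan are assumptions for this example.

```python
# name -> (kind, name of the element it traces back to, or None)
ELEMENTS = {
    "display_t1": ("display", "result_mean_chg"),
    "result_mean_chg": ("analysis_result", "method_mean"),
    "method_mean": ("method", "measure_chg"),
    "measure_chg": ("measure", "slice_week4"),
    "slice_week4": ("slice", "cube_adlb"),
    "cube_adlb": ("cube", "concept_lab_result"),
    "concept_lab_result": ("concept", None),
    "slice_unused": ("slice", None),   # declares no relationship
}

def trace_back(name, elements):
    """Follow declared relationships from an element to the end of its chain."""
    chain = [name]
    while True:
        upstream = elements[chain[-1]][1]
        if upstream is None:
            return chain
        if upstream in chain:
            raise ValueError(f"cycle detected at {upstream!r}")
        chain.append(upstream)

def orphans(elements):
    """Elements whose chain does not end at a concept are validation errors."""
    found = []
    for name in elements:
        end = trace_back(name, elements)[-1]
        if elements[end][0] != "concept":
            found.append(name)
    return sorted(found)

chain = trace_back("display_t1", ELEMENTS)
print(" -> ".join(chain))
print(orphans(ELEMENTS))
```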
Rationale:
- Regulatory compliance (21 CFR Part 11, ICH E9)
- Scientific reproducibility
- Audit capability
- Impact analysis
- Quality assurance
Priority: CRITICAL - Regulatory requirement
Applicability: Universal - applies to all elements
Statement: Analysis checks, validation rules, and constraints SHALL be explicitly specified and automatically verified.
Key Requirements:
- Rules documented declaratively in model
- Rules automatically checked during validation
- Rules versioned with analysis model
Examples:
- Every analysis must reference at least one estimand
- Analysis populations defined before use
- Dependencies must be acyclic
- Estimands must include all ICH E9(R1) components
- P-values between 0 and 1
- Baseline values exist for change-from-baseline analyses
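Two of the example rules are expressed declaratively in the sketch below. The `Rule` class, the rule identifiers, and the dictionary layout of an analysis (`estimands`, `p_values`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str
    check: Callable[[dict], bool]   # returns True when the analysis passes

RULES = (
    Rule(
        "R1",
        "Every analysis must reference at least one estimand",
        lambda analysis: len(analysis.get("estimands", ())) >= 1,
    ),
    Rule(
        "R2",
        "P-values between 0 and 1",
        lambda analysis: all(0.0 <= p <= 1.0 for p in analysis.get("p_values", ())),
    ),
)

def validate(analysis, rules=RULES):
    """Return the ids of all rules the analysis violates."""
    return [rule.rule_id for rule in rules if not rule.check(analysis)]

passing = validate({"estimands": ["E1"], "p_values": [0.03]})
failing = validate({"estimands": [], "p_values": [1.2]})
print(passing, failing)
```

Keeping rules as data means they can be versioned alongside the analysis model and each violation can be traced to a rule id.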
Rationale:
- Quality by design
- Automation reduces QC burden
- Transparency and auditability
- Consistency across analyses
- Traceability of violations
Priority: HIGH - Quality assurance
Applicability: Universal - applies to all model specifications
Statement: Standardized structure SHALL enforce precision in specifying analysis and derivation settings and assumptions.
Goal: Reduced ambiguity through:
- Formal structure
- Explicit declarations
- Type constraints
- Validation rules
Rationale:
- Reduces misinterpretation
- Improves communication
- Enables automation
- Supports regulatory review
Priority: HIGH - Quality of specification
Applicability: Universal - applies to all specifications
Source: Principles.md (Derivation Concepts section)
Statement: Derivation concepts SHALL "do one thing and do it well"; complex derivations that require a sequence of steps should be broken into multiple concepts.
Requirements:
- Single responsibility per derivation concept
- Sequences decomposed into atomic steps
- Each step traceable independently
Derivation Concept Operations:
- Update values for existing columns
- Produce new columns with values
- Produce new records
- Combinations of the above
Rationale:
- Clarity of purpose
- Reusability
- Testability
- Easier debugging
Priority: MEDIUM - Design quality
Applicability: Applies to derivation concept definitions
Note
PROPOSED by KWL - Pending review and approval. This principle extends B5 (Atomicity) by addressing how atomic derivations compose into dependency chains.
Statement: Derivation concepts SHALL be composable through explicit dependency declarations, forming directed acyclic graphs (DAGs) that enable reuse across analyses.
Requirements:
- Derivation Concepts SHALL declare their upstream dependencies (input DCs)
- Derivation Concepts MAY be reused as inputs to multiple downstream DCs or ACs
- Execution engines SHALL resolve dependencies and execute in topological order
- The same DC template SHALL produce consistent results regardless of which downstream consumer uses it
- Dependency declarations SHALL be explicit and machine-readable
Derivation Dependency Chain Example:
ANCOVA Analysis requires:
├── CHG (ChangeFromBaseline) ← depends on AVAL, BASE
│ ├── AVAL (AnalysisValue) ← depends on SDTM source
│ └── BASE (BaselineValue) ← depends on AVAL, ABLFL
│ └── ABLFL (BaselineFlag) ← depends on visit timing rules
├── PopulationFlags (EFFFL/ITTFL) ← depends on inclusion criteria
└── LOCFImputation ← depends on AVAL, time ordering
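The dependency chain above can be resolved into an execution order with a topological sort. This sketch uses Python's standard `graphlib` module (Python 3.9+); encoding the chain as a dictionary of predecessor sets is an assumption of the example, not a format defined by the model.

```python
from graphlib import TopologicalSorter

# node -> set of nodes it depends on, transcribed from the chain above
depends_on = {
    "AVAL": {"SDTM source"},
    "ABLFL": {"visit timing rules"},
    "BASE": {"AVAL", "ABLFL"},
    "CHG": {"AVAL", "BASE"},
    "PopulationFlags": {"inclusion criteria"},
    "LOCFImputation": {"AVAL", "time ordering"},
    "ANCOVA": {"CHG", "PopulationFlags", "LOCFImputation"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
position = {node: index for index, node in enumerate(order)}
print(order)
```

Several valid orders exist, since independent branches such as `PopulationFlags` and `LOCFImputation` have no required ordering between them; this is also what makes them candidates for concurrent execution.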
Composition Pattern:
┌─────────────────────────────────────────────────────────────────────────────┐
│ DERIVATION CONCEPT GRAPH (DAG) │
│ │
│ ┌──────────┐ ┌──────────┐ │
│ │ SDTM │ │ Visit │ │
│ │ Source │ │ Timing │ │
│ └────┬─────┘ └────┬─────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ AVAL │ │ ABLFL │ │
│ │ (DC) │ │ (DC) │ │
│ └────┬─────┘ └────┬─────┘ │
│ │ │ │
│ ├────────────────┤ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ LOCF │ │ BASE │ │
│ │ (DC) │ │ (DC) │ │
│ └────┬─────┘ └────┬─────┘ │
│ │ │ │
│ │ ├───────────────┐ │
│ │ │ │ │
│ │ ▼ ▼ │
│ │ ┌──────────┐ ┌──────────┐ │
│ │ │ CHG │ │ ANCOVA │ │
│ │ │ (DC) │ │ (AC) │ │
│ │ └────┬─────┘ └──────────┘ │
│ │ │ ▲ │
│ └────────────────┴───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Benefits:
- Reusability: Define AVAL once, use in BASE, CHG, LOCF, and multiple analyses
- Consistency: Same derivation logic applied everywhere
- Parallelization: Independent branches can execute concurrently
- Provenance: Clear lineage from final analysis to source data
- Testability: Each DC can be validated independently
- Maintainability: Change in one DC propagates correctly to all dependents
Relationship to Other Principles:
| Principle | Relationship |
|---|---|
| B1: Reproducibility | Composition through DAG ensures reproducible execution order |
| B2: Traceability | Dependency declarations provide explicit lineage |
| B5: Atomicity | B6 explains how atomic DCs (B5) combine into complete derivations |
| A4: Analysis/Derivation Separation | DCs compose to feed ACs at the boundary |
Rationale:
- Complex clinical derivations require multiple steps (SDTM → AVAL → BASE → CHG)
- Without explicit composition, derivation logic is duplicated or inconsistent
- DAG structure enables automated dependency resolution and execution planning
- Reuse across analyses reduces errors and maintenance burden
Priority: HIGH - Enables practical implementation of atomic derivations (B5)
Applicability: Applies to all derivation concept definitions and their relationships
These principles ensure alignment with industry and regulatory standards.
Statement: The AC/DC model SHALL align with CDISC standards (USDM, SDTM, ADaM, ARS) while providing higher-level abstraction.
CDISC Standards Integration:
- USDM (Unified Study Definitions Model)
  - Protocol entities: study design, objectives, endpoints, estimands, populations
  - Links through concepts
  - Protocol specifications inform analysis planning
  - Example: USDM StudyEndpoint → AC/DC AnalysisConcept
- SDTM (Study Data Tabulation Model)
  - Source data collection structure
  - Links through structure entities (source cubes) via concepts
  - Example: SDTM LB domain → AC/DC source cube
- ADaM (Analysis Data Model)
  - Analysis-ready dataset structure
  - Links through structure entities via concepts
  - Example: AC/DC Cube → ADaM BDS dataset, AC/DC Measure → ADaM AVAL
- ARS (Analysis Results Standard)
  - Analysis results metadata
  - Links through derivation entities
  - Example: AC/DC Method → ARS Analysis/AnalysisMethod
  - Example: AC/DC Display → ARS OutputDisplay/Output
Requirements:
- Source structures traceable to SDTM domains
- Analysis structures mappable to ADaM
- Concepts align with USDM protocol definitions
- Displays mappable to ARS OutputDisplay
Rationale:
- Industry interoperability
- Regulatory acceptance
- Ecosystem integration
- Standards-based tooling
Priority: CRITICAL - Industry requirement
Applicability: Universal - applies to all model elements
Statement: The model SHALL support compliance with GxP, ICH guidelines, and regulatory requirements.
ICH E9(R1) Estimand Framework:
- Treatment (condition of interest)
- Population
- Variable (endpoint)
- Intercurrent events and the strategies for handling them
- Population-level summary
- Rationale/interpretation
Additional Frameworks:
- GxP Principles (Good Clinical, Laboratory, Manufacturing Practices)
- ICH E3 (Clinical Study Reports)
- ICH E9 (Statistical Principles)
- 21 CFR Part 11 (Electronic Records/Signatures)
- ALCOA+ Data Integrity principles
Requirements:
- Concepts support Estimand entities
- Analysis specifications reference estimands explicitly
- Validation verifies estimand completeness
- Display metadata traces to protocol objectives
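Estimand completeness is one requirement that is straightforward to verify automatically. The sketch below checks for the five estimand attributes described in ICH E9(R1); the field names and the dictionary representation of an estimand are hypothetical.

```python
# Attribute keys are illustrative names for the five ICH E9(R1) estimand attributes.
REQUIRED_ATTRIBUTES = (
    "treatment",
    "population",
    "variable",
    "intercurrent_events",
    "population_level_summary",
)

def missing_estimand_attributes(estimand):
    """Return the required attributes that are absent or empty."""
    return [name for name in REQUIRED_ATTRIBUTES if not estimand.get(name)]

complete = {
    "treatment": "Drug A vs placebo",
    "population": "Adults meeting the inclusion criteria",
    "variable": "Change from baseline at week 12",
    "intercurrent_events": [{"event": "treatment discontinuation", "strategy": "treatment policy"}],
    "population_level_summary": "Difference in means",
}
incomplete = {"treatment": "Drug A vs placebo", "variable": "Change from baseline at week 12"}

print(missing_estimand_attributes(complete))
print(missing_estimand_attributes(incomplete))
```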
Rationale:
- Health authority acceptance (FDA, EMA, PMDA)
- Scientific rigor
- Auditability
- Protection from regulatory risk
Priority: CRITICAL - Regulatory requirement
Applicability: Applies to confirmatory trial analyses
Statement: The model SHALL explicitly define what is in scope vs out of scope.
IN SCOPE:
- Analysis and derivation concepts
- Subject-level derivations
- Analysis-level computations
- Statistical methods
- Data structures (cubes, dimensions, measures)
- Traceability to concepts
- Quality rules
OUT OF SCOPE:
- Raw data collection (SDTM covers this)
- Study design specification (USDM covers this)
- Implementation details (programming syntax)
- Infrastructure/platform specifics
- Execution scheduling
- User interface design
Rationale:
- Clear boundaries prevent scope creep
- Leverage existing standards where appropriate
- Focus on model's unique value
Priority: HIGH - Prevents confusion
Applicability: Universal - defines model boundaries
Statement: Standardized structure SHALL allow interchange of specifications between systems and organizations.
Requirements:
- Machine-readable format
- Standard vocabulary
- Common structure
- Version control
Benefits:
- Cross-organizational sharing
- Tool interoperability
- Reduced vendor lock-in
- Community knowledge sharing
Rationale:
- Industry efficiency
- Best practice dissemination
- Regulatory submissions
Priority: HIGH - Industry collaboration
Applicability: Applies to specification format and exchange
These principles reflect the guiding values and approaches for the model.
Statement: Analyses and derivations SHALL be specified declaratively ("what" not "how"), with implementations providing the "how."
Key Characteristics:
- Specify intent, not procedure
- Separate specification from implementation
- Enable multiple implementations of the same specification
- Support validation independent of execution
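To make the "what" versus "how" split concrete, the sketch below pairs one declarative specification with two interchangeable implementations. The `spec` keys and both function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# The "what": a declarative specification with no procedure in it.
spec = {"operation": "mean", "measure": "CHG", "by": "treatment"}

def run_with_dict(spec, rows):
    """One "how": accumulate groups in a dictionary."""
    groups = {}
    for row in rows:
        groups.setdefault(row[spec["by"]], []).append(row[spec["measure"]])
    return {key: mean(values) for key, values in groups.items()}

def run_with_groupby(spec, rows):
    """Another "how": sort, then group consecutive rows."""
    key = itemgetter(spec["by"])
    ordered = sorted(rows, key=key)
    return {
        group_key: mean(row[spec["measure"]] for row in group)
        for group_key, group in groupby(ordered, key=key)
    }

rows = [
    {"treatment": "A", "CHG": -2.0},
    {"treatment": "B", "CHG": -5.0},
    {"treatment": "A", "CHG": -4.0},
]
result_a = run_with_dict(spec, rows)
result_b = run_with_groupby(spec, rows)
print(result_a, result_b)
```

Because both implementations consume the same specification, agreement between them can be tested without reference to either one's internals.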
Benefits:
- Platform independence
- Implementation flexibility
- Easier verification
- Clear intent
Rationale:
- Conceptual clarity
- Implementation alternatives
- Testing and validation
- Documentation quality
Priority: HIGH - Core design philosophy
Applicability: Applies to all specifications
Statement: Simplicity is a design goal wherever possible, although some complexity is unavoidable.
Approach:
- Minimize unnecessary complexity
- Use clear, simple structures when possible
- Hide unavoidable complexity from end-users via tools
- Prefer straightforward solutions
Rationale:
- Easier adoption
- Reduced errors
- Better maintainability
- Lower training burden
Priority: FOUNDATIONAL - Guiding principle
Applicability: Universal - applies to all design decisions
Statement: The model SHALL provide minimal core elements with extension mechanisms for specialized needs.
Core Philosophy:
- Minimal viable core
- Extension points for specialization
- Progressive disclosure of complexity
- Backward compatibility
Example Extensions:
- Study-specific concepts
- Organization-specific methods
- Therapeutic-area extensions
- Custom quality rules
Rationale:
- Broad applicability
- Specialization support
- Evolution without breaking changes
- Community contribution
Priority: MEDIUM - Enables growth
Applicability: Applies to model evolution
Statement: The model SHALL provide a common language between statisticians, clinicians, data managers, programmers, and stakeholders.
Stakeholder Benefits:
- Statisticians: Clear analysis specification
- Clinicians: Understanding of endpoints
- Data Managers: Data structure clarity
- Programmers: Unambiguous requirements
- Regulators: Transparent documentation
Rationale:
- Reduced miscommunication
- Improved collaboration
- Better quality
- Faster development
Priority: HIGH - Team effectiveness
Applicability: Universal - affects all usage
These principles guide implementation, tooling, and usage.
Statement: LinkML SHALL be used as the modeling language instead of UML.
Rationale for LinkML:
- Less complex than UML
- Free, open tooling
- Available for all operating systems
- Easier visualization
- No dependency on commercial software
Comparison with USDM:
- USDM uses UML
- The USDM UML model relies on Enterprise Architect (Windows-only, commercial)
- UML is complex and its visualization is inconsistent
Priority: MEDIUM - Tool choice
Applicability: Applies to model representation
Statement: Specifications SHALL be machine-readable to enable automated validation and code generation.
Capabilities Enabled:
- Direct link to programming code
- Automated validation
- Reduced transcription errors
- Code reuse
- Quality checks
Rationale:
- Error reduction
- Efficiency gains
- Consistency
- Faster development
Priority: HIGH - Automation benefit
Applicability: Applies to all specifications
Statement: The model SHALL provide clear linkage from results to objectives/endpoints in USDM and vice versa.
Traceability Path:
USDM Objective → Endpoint → Estimand → Analysis Concept → Method → Display → Results
Benefits:
- Regulatory clarity
- Protocol alignment verification
- Change impact analysis
- Audit trail
Priority: HIGH - Regulatory necessity
Applicability: Universal - affects all elements
Statement: Unavoidable complexity SHALL be hidden from end-users by tools.
Approach:
- User-friendly interfaces
- Reasonable defaults
- Progressive disclosure
- Expert mode for advanced users
Rationale:
- Accessibility
- Reduced learning curve
- Error prevention
- User satisfaction
Priority: MEDIUM - User experience
Applicability: Applies to tooling and interfaces
Caution
Anything below this message is not yet reviewed.
Important
Question: Can we define Analysis Concept?
Suggested answer: A specification of a single analysis computation producing aggregated results from input data.
Important
Question: Should the model allow slices with no method that are only used by Analysis Concepts?
Suggested answer: Yes. Not all slices need derivations; some are just selection criteria for analysis inputs.
Important
Question: What are the main goals of the AC/DC model?
Suggested answer:
- Enable reproducible clinical trial analyses
- Provide standard interchange format
- Support regulatory compliance
- Facilitate cross-organizational collaboration