OHDSI tool execution for OMOP CDM data on BigQuery. Deployed as Cloud Run service + jobs.
Executes OHDSI data quality and characterization tools on BigQuery OMOP datasets:
- DQD (DataQualityDashboard) - Data quality validation with 4,000+ checks
- Achilles - Database characterization and descriptive statistics for ATLAS
- PASS (Profile of Analytic Suitability Score) - Evaluates data fitness-for-purpose across six dimensions (accessibility, provenance, standards, concept diversity, source diversity, temporal)
- Cloud Run Service - REST API for health checks, Atlas table creation, and report generation
- Cloud Run Jobs - Long-running tool executions (up to 24h):
  - `ccc-omop-analyzer-dqd-job`
  - `ccc-omop-analyzer-achilles-job`
  - `ccc-omop-analyzer-pass-job`
- Airflow Integration - Jobs are triggered via `CloudRunExecuteJobOperator`
Project IDs with hyphens require special handling (see the sketch after this list):
- Achilles: Pass the dataset name only (e.g., `"ehr_synthea"`); the JDBC connection's `DefaultDataset` parameter handles the project qualification.
- DQD: Use double-quoted fully qualified names (e.g., `'"project-id".dataset'`); SqlRender translates the double quotes to backticks.
- PASS: Uses the `project.dataset` format directly (e.g., `"project-id.ehr_synthea"`).
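As a minimal sketch, the three formats side by side; the variable names below are hypothetical, and the comments follow this section's conventions rather than the repo's exact code:

```r
# Illustrative only: variable names are hypothetical; values follow the
# conventions described above for a hyphenated project ID.

# Achilles: dataset name only. The JDBC connection string's DefaultDataset
# property (e.g. ";ProjectId=project-id;DefaultDataset=ehr_synthea") supplies
# the project qualification.
achilles_schema <- "ehr_synthea"

# DQD: double-quote the hyphenated project ID; SqlRender translates the
# double quotes into BigQuery backticks (`project-id`.ehr_synthea).
dqd_schema <- '"project-id".ehr_synthea'

# PASS: plain project.dataset format.
pass_schema <- "project-id.ehr_synthea"
```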
The PASS job evaluates data quality across six evidence-based dimensions:
- Accessibility - Whether clinical data is present and accessible for analysis
- Provenance - Information preservation and traceability through mapping
- Standards - Use of standardized vocabularies for interoperable research
- Concept Diversity - Variety of distinct clinical concepts in the data
- Source Diversity - Variety of data source types (EHR, claims, registries)
- Temporal - Data distribution over time (span, density, consistency)
Default Settings (configured in `constants.R`):
- `METRICS = "all"` - All six metrics are calculated
- `CALCULATE_COMPOSITE = TRUE` - A weighted composite score is generated
- `VERBOSE_MODE = TRUE` - Detailed logging is enabled
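A sketch of how these defaults might look in `constants.R` (the values come from this README; the file layout is assumed):

```r
# PASS defaults (values as documented above; exact layout of constants.R assumed)
METRICS <- "all"             # calculate all six metrics
CALCULATE_COMPOSITE <- TRUE  # generate the weighted composite score
VERBOSE_MODE <- TRUE         # enable detailed logging
```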
Outputs: Five CSV files uploaded to GCS containing field-level, table-level, and overall scores with 95% confidence intervals.
| File | Purpose |
|---|---|
| `Dockerfile` | Container image with R, DQD, Achilles, PASS, and dependencies |
| `cloudbuild.yaml` | Cloud Build config for the service + jobs (DQD, Achilles, PASS) |
| `plumber_api.R` | REST API with health check, Atlas table creation, and report generation |
| `entrypoint.sh` | Container entrypoint for service account authentication |
| File | Purpose |
|---|---|
| `constants.R` | Configuration constants (DQD, Achilles, PASS) |
| `utils.R` | Helper functions (GCS, JDBC, authentication, SQL) |
| `run_dqd.R` | DQD execution logic |
| `run_dqd_job.R` | DQD job entrypoint |
| `run_achilles.R` | Achilles execution logic |
| `run_achilles_job.R` | Achilles job entrypoint |
| `run_pass.R` | PASS execution logic |
| `run_pass_job.R` | PASS job entrypoint |
| `run_create_atlas_results_tables.R` | Atlas results table creation |
| `run_generate_delivery_report.R` | OMOP delivery report generation |
Run `gcloud builds submit --config cloudbuild.yaml`. This deploys:
- Service: `ccc-omop-analyzer` (4 CPU, 8GB RAM, 1h timeout)
- Jobs: `ccc-omop-analyzer-{dqd,achilles,pass}-job` (4 CPU, 8GB RAM, 2h timeout, configurable to 24h)
All jobs require `PROJECT_ID`, `CDM_DATASET_ID`, and `GCS_ARTIFACT_PATH`. Additional variables:
DQD/Achilles jobs:
- `ANALYTICS_DATASET_ID` - Target dataset for results
- `CDM_VERSION` - OMOP CDM version (e.g., "5.4")
- `CDM_SOURCE_NAME` - Source identifier
- `SERVICE_ACCOUNT_EMAIL` - BigQuery JDBC auth
PASS job: No additional variables required.
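As a sketch, a job entrypoint might read this configuration from the environment as follows (the variable names are the ones listed above; the `cfg` structure and validation are assumptions):

```r
# Read job configuration from environment variables (structure assumed)
cfg <- list(
  project_id            = Sys.getenv("PROJECT_ID"),
  cdm_dataset_id        = Sys.getenv("CDM_DATASET_ID"),
  gcs_artifact_path     = Sys.getenv("GCS_ARTIFACT_PATH"),
  # DQD/Achilles jobs only:
  analytics_dataset_id  = Sys.getenv("ANALYTICS_DATASET_ID"),
  cdm_version           = Sys.getenv("CDM_VERSION"),      # e.g. "5.4"
  cdm_source_name       = Sys.getenv("CDM_SOURCE_NAME"),
  service_account_email = Sys.getenv("SERVICE_ACCOUNT_EMAIL")
)

# Fail fast if a variable required by every job is missing
stopifnot(nzchar(cfg$project_id),
          nzchar(cfg$cdm_dataset_id),
          nzchar(cfg$gcs_artifact_path))
```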
Jobs are triggered from Airflow via `CloudRunExecuteJobOperator`. Each job:
- Authenticates with BigQuery/GCS
- Executes the tool logic (`run_dqd()`, `run_achilles()`, or `run_pass()`)
- Uploads results to GCS and/or BigQuery
- Exits with a status code (0 = success, 1 = failure), as sketched below
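A minimal sketch of that entrypoint pattern (the `run_*()` functions and file names come from this repo; the error-handling scaffolding around them is an assumption):

```r
source("constants.R")
source("utils.R")
source("run_dqd.R")

# Translate success/failure into the exit codes that Cloud Run (and
# Airflow's CloudRunExecuteJobOperator) observe.
status <- tryCatch({
  run_dqd()  # or run_achilles() / run_pass()
  0L         # success
}, error = function(e) {
  message("Job failed: ", conditionMessage(e))
  1L         # failure
})
quit(save = "no", status = status)
```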
| Tool | Artifacts |
|---|---|
| DQD | GCS: `dqdashboard_results.{json,csv}`<br>BigQuery: `{analytics_dataset_id}.dqd_results` |
| Achilles | GCS: `achilles_results.csv`, `results/*.json` (ARES)<br>BigQuery: `{dataset_id}.achilles_*`, `{dataset_id}.*_concept_counts` |
| PASS | GCS: `pass_overall.csv` (scores by metric with CIs), `pass_table_level.csv` (scores by table), `pass_field_level.csv` (field-level detail), `pass_composite_overall.csv` (composite score), `pass_composite_components.csv` (metric contributions) |
| Endpoint | Method | Parameters | Purpose |
|---|---|---|---|
| `/heartbeat` | GET | - | Health check |
| `/create_atlas_results_tables` | POST | `project_id`, `cdm_dataset_id`, `analytics_dataset_id` | Create Atlas results tables in BigQuery |
| `/generate_delivery_report` | POST | `delivery_report_path`, `dqd_results_path`, `output_gcs_path` | Generate HTML report from delivery metrics and DQD results |
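For example, the report endpoint could be invoked with `httr`, assuming the parameters are passed as a JSON body (plumber's default parsing); the service URL and GCS paths below are placeholders, not values from this repo:

```r
library(httr)

# Hypothetical call to /generate_delivery_report; URL and paths are placeholders.
resp <- POST(
  "https://ccc-omop-analyzer-example-uc.a.run.app/generate_delivery_report",
  body = list(
    delivery_report_path = "gs://my-bucket/delivery_metrics.csv",
    dqd_results_path     = "gs://my-bucket/dqdashboard_results.json",
    output_gcs_path      = "gs://my-bucket/omop_delivery_report.html"
  ),
  encode = "json"
)
stop_for_status(resp)  # raises an error on non-2xx responses
```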
Service account credentials are read from Secret Manager (`ccc-omop-analyzer-secret` → `SERVICE_ACCOUNT_JSON`).
Required permissions:
- BigQuery read/write
- GCS write
- Secret Manager read