Merged

26 commits
79109d3
chore(gitignore): add wrangler
cameronraysmith Feb 18, 2026
3957269
docs(omicsio): migrate schema and downstream architecture reference
cameronraysmith Feb 18, 2026
08fdcb1
docs(omicsio): migrate upstream processing pipeline reference
cameronraysmith Feb 18, 2026
b9b7c0b
docs(data): migrate GRN modeling dataset catalog
cameronraysmith Feb 18, 2026
d8c3ecd
docs(data): migrate GRN modeling dataset download URLs
cameronraysmith Feb 18, 2026
4b0b578
docs(scilake): migrate ingestion ADR from omicslake
cameronraysmith Feb 18, 2026
771c407
docs(scilake): migrate metadata store ADR from omicslake
cameronraysmith Feb 18, 2026
edfce93
docs(scilake): migrate orchestration ADR from omicslake
cameronraysmith Feb 18, 2026
26b57b3
docs(scilake): migrate and merge architecture overview from omicslake
cameronraysmith Feb 18, 2026
a00ba60
feat: create scilake package skeleton
cameronraysmith Feb 18, 2026
2966f3b
chore(scilake): generate uv.lock
cameronraysmith Feb 18, 2026
4ea408d
chore(beads): create bootstrap, omicsio specification, and scilake to…
cameronraysmith Feb 18, 2026
7a92bff
chore(beads): sync
cameronraysmith Feb 18, 2026
62380f1
chore(beads): sync
cameronraysmith Feb 18, 2026
6ff7d2c
docs(omicsio): draft columnar encoding research framing document
cameronraysmith Feb 18, 2026
9783b8c
chore(scilake): v0.0.1 -> v0.0.0
cameronraysmith Feb 18, 2026
145b245
fix(scilake): add py.typed marker for PEP 561 compliance
cameronraysmith Feb 18, 2026
b9108e4
fix(scilake): add package.json for semantic-release configuration
cameronraysmith Feb 19, 2026
6d6a201
fix(scilake): add CI configuration
cameronraysmith Feb 19, 2026
1d72311
chore(scilake): sync pixi
cameronraysmith Feb 19, 2026
9c7eebd
chore(scilake): v0.0.0 -> v0.0.1
cameronraysmith Feb 18, 2026
fc351da
fix(docs): exclude notes directory from quarto site rendering
cameronraysmith Feb 19, 2026
feed1b8
fix(dvc): update remote URL from omicsio to data project
cameronraysmith Feb 19, 2026
3864550
fix(dvc): reset freeze directory to empty state
cameronraysmith Feb 19, 2026
c51a0ca
fix(nix): rename devshell from omicsio to data
cameronraysmith Feb 19, 2026
375e00b
fix(docs): add positive render pattern for quarto site build
cameronraysmith Feb 19, 2026
11 changes: 11 additions & 0 deletions .beads/issues.jsonl
@@ -0,0 +1,11 @@
{"id":"data-a0y","title":"omicsio specification","status":"open","priority":2,"issue_type":"epic","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:22:11.753993-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:22:11.753993-05:00"}
{"id":"data-a0y.1","title":"Research lakehouse-native columnar encoding for multimodal single-cell data","description":"Context: omicsio converts AnnData/MuData to columnar formats (vortex, parquet). The primary design question is how to optimally represent multimodal count/expression matrices in lakehouse table formats.\n\nTwo known approaches to evaluate:\n- X-Atlas-Orion list-column format: ~/projects/omicslake-workspace/X-Atlas-Orion\n- pqdata COO format: ~/projects/omicslake-workspace/pqdata\n\nTwo coupled questions:\n1. Lakehouse-native representation compatible with both parquet and vortex, building on X-Atlas-Orion and extending to multimodal data\n2. Rehydration to AnnData/MuData with Arrow as in-memory intermediate for out-of-core DuckDB processing\n\nUse cases to evaluate against:\n- Transcriptomic data with total RNA\n- RNA separated by spliced/unspliced\n- RNA separated by spliced/unspliced/ambiguous\n- Multimodal with ATAC-seq data (10x multiome format)\n- MuData structures with modalities of different dimensions\n\nDownstream consumers whose access patterns constrain the format:\n- pyrovelocity: ~/projects/pyrovelocity-workspace/pyrovelocity\n- stormi: ~/projects/pyrovelocity-workspace/stormi-review/stormi\n- hodosome: ~/projects/hodosome-workspace/hodosome\n\nReference documents in this repo:\n- docs/notes/omicsio/single-cell-data-architecture-for-queries-and-training.md\n- docs/notes/omicsio/single-cell-processing-pipeline-binseq-to-count-matrices.md\n- docs/notes/omicsio/pqdata-vs-xatlas-orion-schema-comparison.md\n\nContext files:\n- packages/omicsio/CLAUDE.md -\u003e ~/projects/sciexp/planning/contexts/omicsio.md\n\nStarting artifact: docs/notes/omicsio/research/columnar-encoding-for-multimodal-single-cell-data.md\n\nFirst round is reference-based research: survey local repos and docs to map the decision space. Output is a framing document capturing what is known, what remains open, and what experiments would resolve the open questions.","status":"open","priority":2,"issue_type":"task","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:22:19.109095-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:22:40.340002-05:00","dependencies":[{"issue_id":"data-a0y.1","depends_on_id":"data-a0y","type":"parent-child","created_at":"2026-02-18T17:22:19.10975-05:00","created_by":"Cameron Smith"}]}
{"id":"data-pvg","title":"Bootstrap sciexp data platform","status":"inreview","priority":2,"issue_type":"epic","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:21:12.110237-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:31:25.247825-05:00"}
{"id":"data-pvg.1","title":"Migrate predecessor documentation","status":"closed","priority":2,"issue_type":"task","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:21:17.166594-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:21:31.333473-05:00","closed_at":"2026-02-18T17:21:31.333473-05:00","close_reason":"Completed in 8 commits (3957269..26b57b3) on dat-bootstrap branch","dependencies":[{"issue_id":"data-pvg.1","depends_on_id":"data-pvg","type":"parent-child","created_at":"2026-02-18T17:21:17.167257-05:00","created_by":"Cameron Smith"}]}
{"id":"data-pvg.2","title":"Create scilake package skeleton","status":"closed","priority":2,"issue_type":"task","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:21:18.432294-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:21:35.03494-05:00","closed_at":"2026-02-18T17:21:35.03494-05:00","close_reason":"Completed in commits a00ba60..2966f3b on dat-bootstrap branch","dependencies":[{"issue_id":"data-pvg.2","depends_on_id":"data-pvg","type":"parent-child","created_at":"2026-02-18T17:21:18.432861-05:00","created_by":"Cameron Smith"}]}
{"id":"data-rg1","title":"scilake tool evaluation","status":"open","priority":2,"issue_type":"epic","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:22:43.983405-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:22:43.983405-05:00"}
{"id":"data-rg1.1","title":"Survey tool landscape from local repos and documentation","description":"Reference-based first pass across candidate tools for scilake's orchestration and catalog layers.\n\nTools to evaluate: DuckDB, DuckLake, dlt, SQLMesh, Dagster, Flyte.\n\nKey context:\n- Migrated ADRs: docs/notes/scilake/architecture/decisions/ (ingestion, metadata-store, orchestration)\n- Architecture overview: docs/notes/scilake/architecture/overview.md\n- DuckLake source: ~/projects/lakescope-workspace/ducklake/\n- scilake context: packages/scilake/CLAUDE.md -\u003e ~/projects/sciexp/planning/contexts/scilake.md\n\nNear-term constraint: DuckLake does not yet support vortex. Plan is managed directory trees of vortex/parquet files first, DuckLake adoption when vortex support lands.\n\nThe migrated ADRs favor Dagster (orchestration) and DuckLake (metadata). Treat these as hypotheses to validate, not commitments.\n\nConsider Dagster local dev overhead and production scaling. Flyte is an alternative. SQLMesh may be sufficient alone for some workflows. dlt may complement or overlap with other tools.\n\nOutput: assessment of each tool's capabilities, integration points, and constraints. Identification of which combinations to test experimentally and in what order.","status":"open","priority":2,"issue_type":"task","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:22:50.000747-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:23:24.409914-05:00","dependencies":[{"issue_id":"data-rg1.1","depends_on_id":"data-rg1","type":"parent-child","created_at":"2026-02-18T17:22:50.001544-05:00","created_by":"Cameron Smith"}]}
{"id":"data-rg1.2","title":"Implement DuckDB directory-tree catalog management in scilake","description":"Foundation layer for scilake. Implement managed directory trees of vortex/parquet files with DuckDB as query engine.\n\nThis is the base that all other tool integrations build on. DuckLake is deferred until it supports vortex; monitor its roadmap.\n\nImplement directly in packages/scilake/ as production code.\n\nHuggingFace Hub integration: upload files and access via DuckDB httpfs extension using hf:// URIs. HF Hub is git-lfs based and supports any file type.\n\nVortex is increasingly preferred over parquet for performance. Support both formats.","status":"open","priority":2,"issue_type":"task","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:22:53.78552-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:23:32.845226-05:00","dependencies":[{"issue_id":"data-rg1.2","depends_on_id":"data-rg1","type":"parent-child","created_at":"2026-02-18T17:22:53.786422-05:00","created_by":"Cameron Smith"},{"issue_id":"data-rg1.2","depends_on_id":"data-rg1.1","type":"blocks","created_at":"2026-02-18T17:23:57.178252-05:00","created_by":"Cameron Smith"}]}
{"id":"data-rg1.3","title":"Evaluate dlt integration for data ingestion","description":"Evaluate whether dlt adds value for extract-load into managed directory trees.\n\nConsider dlt standalone and composed with an orchestrator (Dagster/Flyte).\n\nImplement directly in packages/scilake/ as production code. If dlt doesn't prove valuable, remove it.","status":"open","priority":2,"issue_type":"task","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:23:00.451788-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:23:38.85833-05:00","dependencies":[{"issue_id":"data-rg1.3","depends_on_id":"data-rg1","type":"parent-child","created_at":"2026-02-18T17:23:00.452758-05:00","created_by":"Cameron Smith"},{"issue_id":"data-rg1.3","depends_on_id":"data-rg1.1","type":"blocks","created_at":"2026-02-18T17:23:57.29699-05:00","created_by":"Cameron Smith"},{"issue_id":"data-rg1.3","depends_on_id":"data-rg1.2","type":"blocks","created_at":"2026-02-18T17:24:02.231573-05:00","created_by":"Cameron Smith"}]}
{"id":"data-rg1.4","title":"Evaluate SQLMesh for transformation layer","description":"Evaluate SQLMesh for incremental transformations over directory-tree data.\n\nKey questions:\n- How does SQLMesh integrate with DuckDB?\n- Is SQLMesh sufficient as a standalone orchestration solution?\n- Or should it be integrated with Dagster/Flyte?\n\nImplement directly in packages/scilake/ as production code.","status":"open","priority":2,"issue_type":"task","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:23:04.88131-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:23:44.51313-05:00","dependencies":[{"issue_id":"data-rg1.4","depends_on_id":"data-rg1","type":"parent-child","created_at":"2026-02-18T17:23:04.88191-05:00","created_by":"Cameron Smith"},{"issue_id":"data-rg1.4","depends_on_id":"data-rg1.1","type":"blocks","created_at":"2026-02-18T17:23:57.425759-05:00","created_by":"Cameron Smith"},{"issue_id":"data-rg1.4","depends_on_id":"data-rg1.2","type":"blocks","created_at":"2026-02-18T17:24:02.370434-05:00","created_by":"Cameron Smith"}]}
{"id":"data-rg1.5","title":"Evaluate orchestration layer","description":"Evaluate whether a dedicated orchestrator is needed and which one.\n\nOptions: Dagster, Flyte, SQLMesh-only (if SQLMesh proves sufficient), or no orchestrator.\n\nThis depends on findings from the dlt and SQLMesh evaluations — which tools survived and how they compose determines what orchestration needs remain.\n\nConsider: local dev environment overhead, production scaling, integration with tools from prior evaluations.\n\nImplement directly in packages/scilake/ as production code.","status":"open","priority":2,"issue_type":"task","owner":"cameron.ray.smith@gmail.com","created_at":"2026-02-18T17:23:08.513883-05:00","created_by":"Cameron Smith","updated_at":"2026-02-18T17:23:51.092749-05:00","dependencies":[{"issue_id":"data-rg1.5","depends_on_id":"data-rg1","type":"parent-child","created_at":"2026-02-18T17:23:08.5145-05:00","created_by":"Cameron Smith"},{"issue_id":"data-rg1.5","depends_on_id":"data-rg1.1","type":"blocks","created_at":"2026-02-18T17:23:57.553935-05:00","created_by":"Cameron Smith"},{"issue_id":"data-rg1.5","depends_on_id":"data-rg1.2","type":"blocks","created_at":"2026-02-18T17:24:02.521603-05:00","created_by":"Cameron Smith"},{"issue_id":"data-rg1.5","depends_on_id":"data-rg1.3","type":"blocks","created_at":"2026-02-18T17:24:06.620935-05:00","created_by":"Cameron Smith"},{"issue_id":"data-rg1.5","depends_on_id":"data-rg1.4","type":"blocks","created_at":"2026-02-18T17:24:06.756134-05:00","created_by":"Cameron Smith"}]}
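The records above are newline-delimited JSON, one issue per line, with typed dependency edges. A minimal stdlib sketch for listing open issues and the ids that block them; the inline records are trimmed stand-ins mirroring the `.beads/issues.jsonl` schema, not the full file:

```python
import json

# Trimmed stand-in records mirroring the .beads/issues.jsonl schema above.
lines = [
    '{"id":"data-rg1","title":"scilake tool evaluation",'
    '"status":"open","issue_type":"epic"}',
    '{"id":"data-rg1.2","title":"Implement DuckDB directory-tree catalog",'
    '"status":"open","issue_type":"task","dependencies":['
    '{"issue_id":"data-rg1.2","depends_on_id":"data-rg1","type":"parent-child"},'
    '{"issue_id":"data-rg1.2","depends_on_id":"data-rg1.1","type":"blocks"}]}',
]

issues = [json.loads(line) for line in lines]

# Map each open issue id to the ids that block it ("blocks" edges only;
# "parent-child" edges express epic membership, not ordering).
open_blockers = {
    issue["id"]: [
        dep["depends_on_id"]
        for dep in issue.get("dependencies", [])
        if dep["type"] == "blocks"
    ]
    for issue in issues
    if issue["status"] == "open"
}
print(open_blockers)  # {'data-rg1': [], 'data-rg1.2': ['data-rg1.1']}
```

For the real file, replace `lines` with the file's lines read from `.beads/issues.jsonl`.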
2 changes: 1 addition & 1 deletion .dvc/config
@@ -2,7 +2,7 @@
 remote = gcs
 analytics = false
 ['remote "gcs"']
-url = gs://sciexp/projects/omicsio/cas
+url = gs://sciexp/projects/data/cas
 credentialpath = ../.dvc-sa.json
 ['remote "drive"']
 url = gdrive://1yS1zpTqR4w2WFjXQuV8HypMrQqJvcM-x/dvcstore
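The updated remote can be confirmed programmatically. A minimal stdlib sketch using `configparser`; the `[core]` section header is an assumption, since the diff context starts at line 2 of the file and the header itself is not shown:

```python
import configparser

# Post-change .dvc/config as shown in the diff; the [core] header is
# an assumption (the visible diff context begins below it).
config_text = """\
[core]
remote = gcs
analytics = false
['remote "gcs"']
url = gs://sciexp/projects/data/cas
credentialpath = ../.dvc-sa.json
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

# DVC quotes remote section names, so the parsed name keeps the quotes.
section = "'remote \"gcs\"'"
print(parser[section]["url"])  # gs://sciexp/projects/data/cas
```

In a working checkout, `dvc remote list` reports the same mapping without hand-parsing the file.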
1 change: 1 addition & 0 deletions .gitignore
@@ -103,3 +103,4 @@ node_modules/
 
 # bv (beads viewer) local config and caches
 .bv/
+.wrangler
6 changes: 3 additions & 3 deletions docs/_freeze.dvc
@@ -1,6 +1,6 @@
 outs:
-- md5: 481f00b14626c9c558a990f58cc704f7.dir
-  size: 9160
-  nfiles: 2
+- md5: d751713988987e9331980363e24189ce.dir
+  size: 0
+  nfiles: 0
   hash: md5
   path: _freeze
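The new hash `d751713988987e9331980363e24189ce` is DVC's empty-directory value: a `.dir` object stores the directory listing as JSON, and an empty directory serializes to `[]`. A quick check, assuming that serialization detail:

```python
import hashlib

# A DVC .dir object holds the directory listing as JSON; an empty
# directory serializes to b"[]" (assumption stated in the lead-in).
digest = hashlib.md5(b"[]").hexdigest()
print(digest)  # d751713988987e9331980363e24189ce
```

This is consistent with `size: 0` and `nfiles: 0` in the new stanza.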
3 changes: 3 additions & 0 deletions docs/_quarto.yml
@@ -1,6 +1,9 @@
 project:
   type: website
   output-dir: _site
+  render:
+    - "**/*.qmd"
+    - "!notes/"
 preview:
   port: 7779
 
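Quarto's `project.render` list mixes include globs with `!`-prefixed exclusions: here, render every `.qmd` file except those under `notes/`. A rough stdlib sketch of the resulting selection; this approximates the matcher's behavior for these two patterns and is not Quarto's actual implementation:

```python
from fnmatch import fnmatch

patterns = ["**/*.qmd", "!notes/"]

def should_render(path: str) -> bool:
    # Include if any positive glob matches (also try without the "**/"
    # prefix so top-level files like index.qmd match), then drop paths
    # under any "!"-prefixed directory prefix.
    included = any(
        fnmatch(path, p) or fnmatch(path, p.removeprefix("**/"))
        for p in patterns
        if not p.startswith("!")
    )
    excluded = any(
        path.startswith(p[1:]) for p in patterns if p.startswith("!")
    )
    return included and not excluded

candidates = ["index.qmd", "notes/scilake/architecture/overview.qmd"]
print([p for p in candidates if should_render(p)])  # ['index.qmd']
```

This matches the commit pair above: the earlier exclusion of `notes/` plus the follow-up positive pattern that keeps the rest of the site rendering.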