Reformat sample_calc #60

dltamayo · 2025-10-14T16:22:06Z

Moving towards de-hardcoding of metadata fields

Merge PR

- using metadata in JSON format instead of hard-coded list access
- still hard-coding metadata access though
- switched to pandas for outputting instead of csv
- removed old code

Eventual goal is to have any combination of clinical metadata on a patient/sample level be used for clustering/grouping.

The header will now be included in the aggregated .csv files instead of being hard-coded within the notebook.
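As a rough illustration of the stated goal (grouping by any combination of metadata fields), here is a minimal Python sketch. The `sample` and `subject_id` field names come from this PR; the `timepoint` field, the sample values, and the `group_samples` helper are hypothetical and are not taken from the pipeline code.

```python
import json
from collections import defaultdict

# Hypothetical per-sample metadata, serialized as JSON strings.
# "timepoint" is an invented field used only for illustration.
records = [
    '{"sample": "S1", "subject_id": "P1", "timepoint": "pre"}',
    '{"sample": "S2", "subject_id": "P1", "timepoint": "post"}',
    '{"sample": "S3", "subject_id": "P2", "timepoint": "pre"}',
]


def group_samples(json_records, keys):
    """Group sample names by the values of the requested metadata keys."""
    groups = defaultdict(list)
    for raw in json_records:
        meta = json.loads(raw)
        # Fields are looked up by name, not by position in a list.
        group_key = tuple(meta[k] for k in keys)
        groups[group_key].append(meta["sample"])
    return dict(groups)


by_subject = group_samples(records, ["subject_id"])
by_subject_and_timepoint = group_samples(records, ["subject_id", "timepoint"])
```

Because the grouping keys are passed in as a list, adding a new clinical field would not require changing the grouping code, only the metadata itself.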

Copilot

Pull Request Overview

Refactors sample calculation workflow to produce per-sample CSVs with richer metadata, then aggregates them for plotting, moving away from hard‑coded column definitions. Introduces a new SAMPLE_AGGREGATE process and updates downstream notebook to align with renamed metadata fields; also adds dynamic CPU allocation to TCRDIST3.

- Introduce per-sample output file naming (stats/, vdj/) and new aggregation step replacing prior collectFile usage.
- Update analysis notebook to rely on CSV headers instead of hard-coded column name lists and rename metadata fields (patient_id -> subject_id, sample_id -> sample).
- Add dynamic CPU allocation logic to TCRDIST3 process.

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| subworkflows/local/sample.nf | Replaces collectFile outputs with collection + aggregation processes; updates inputs to plotting step. |
| notebooks/sample_stats_template.qmd | Changes CSV reading strategy and metadata field names; adjusts plotting hover fields and grouping. |
| modules/local/sample/tcrdist3.nf | Adds dynamic cpus directive based on task.memory. |
| modules/local/sample/sample_calc.nf | Changes output file paths/name patterns and serializes sample metadata as JSON. |
| modules/local/sample/sample_aggregate.nf | New process to concatenate multiple per-sample CSVs into aggregated files. |
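
The aggregation step described for sample_aggregate.nf can be pictured roughly as follows. This is a minimal Python sketch, not the actual process script: the `aggregate_csvs` helper, the file names, and the `num_clones` column are assumptions made for illustration. It concatenates per-sample CSVs, writes the header once, and refuses files whose header differs.

```python
import csv
import tempfile
from pathlib import Path


def aggregate_csvs(paths, out_path):
    """Concatenate CSV files that share a header; return the number of data rows written."""
    header = None
    n_rows = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, lineterminator="\n")
        for path in paths:
            with open(path, newline="") as fh:
                reader = csv.reader(fh)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)  # header is written only once
                elif file_header != header:
                    raise ValueError(f"{path}: header differs from the first file")
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows


# Hypothetical per-sample files, created in a temporary directory.
tmp = Path(tempfile.mkdtemp())
(tmp / "S1.csv").write_text("sample,subject_id,num_clones\nS1,P1,120\n")
(tmp / "S2.csv").write_text("sample,subject_id,num_clones\nS2,P1,95\n")

out = tmp / "aggregated.csv"
rows_written = aggregate_csvs(sorted(tmp.glob("S*.csv")), out)
aggregated_lines = out.read_text().splitlines()
```

Keeping the header in the aggregated file is what lets the notebook read column names from the CSV rather than from a hard-coded list.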

modules/local/sample/sample_calc.nf

Copilot · 2025-10-14T19:17:11Z

modules/local/sample/tcrdist3.nf

+        if (task.memory > 256.GB)
+            return 16 * task.attempt
+        else if (task.memory > 64.GB)
+            return 8 * task.attempt
+        else if (task.memory > 4.GB)
+            return 4 * task.attempt
+        else
+            return 2 * task.attempt
+        }


The cpus directive depends on task.memory, which itself is derived later by the memory block; this circular dependency can result in task.memory being undefined or default when cpus is evaluated. Derive both cpu and memory from the same underlying metric (e.g., count_table.size()) or compute memory first in a variable and base cpus on that variable instead of task.memory.

Suggested change

-        if (task.memory > 256.GB)
-            return 16 * task.attempt
-        else if (task.memory > 64.GB)
-            return 8 * task.attempt
-        else if (task.memory > 4.GB)
-            return 4 * task.attempt
-        else
-            return 2 * task.attempt
-        }
+        def sz = count_table.size()
+        def mb = 1024 * 1024
+        if (sz > 26 * mb)
+            return 16 * task.attempt
+        else if (sz > 20 * mb)
+            return 8 * task.attempt
+        else if (sz > 10 * mb)
+            return 4 * task.attempt
+        else
+            return 2 * task.attempt
+        }
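
To make the "derive both from the same metric" idea concrete outside of Nextflow, here is a small Python sketch of the tiered logic. The CPU thresholds mirror the suggestion above; the `memory_mb_for_size` rule is a hypothetical placeholder, since the pipeline's real memory formula is not shown in this thread.

```python
MB = 1024 * 1024

# (size threshold in bytes, base CPU count), checked from largest to smallest.
CPU_TIERS = [(26 * MB, 16), (20 * MB, 8), (10 * MB, 4)]


def cpus_for_size(size_bytes, attempt=1):
    """Pick a CPU count from the input file size, scaled by the retry attempt."""
    for threshold, base in CPU_TIERS:
        if size_bytes > threshold:
            return base * attempt
    return 2 * attempt


def memory_mb_for_size(size_bytes, attempt=1):
    """Hypothetical memory rule (an assumption): 100x the file size, at least 512 MB."""
    return max(512, 100 * size_bytes // MB) * attempt


# Both values depend only on the file size, so neither depends on the other.
size = 30 * MB
request = {"cpus": cpus_for_size(size), "memory_mb": memory_mb_for_size(size)}
```

Since neither function reads the other's result, the order in which the two directives are evaluated no longer matters.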

subworkflows/local/sample.nf

modules/local/sample/sample_aggregate.nf

notebooks/sample_stats_template.qmd

dimalvovs

- Setting this to 'Request changes' so it does not go unnoticed, but feel free to create additional issues to address these points if we need it merged now.
- nf-test test on the minimal example runs without errors, nice!
- nextflow lint main.nf returns "❌ 4 files had 21 errors". Most of the errors are really simple to fix; besides better code formatting and standardization, the benefit is beautiful flow charts such as below. This can be done now or, if preferred, in a dedicated refactoring feature.
- In the same refactoring category, I could not find the versions.yml output in the minimal example test results.
- Skimmed through the Copilot comments and left mine where applicable.
- Regarding the resource allocation, it does seem to work. Running this gist https://gist.github.com/dimalvovs/3b1853effe9105c96ca40ac3e8886332 produces these outputs:

 N E X T F L O W   ~  version 25.04.8

Launching `main.nf` [loquacious_kare] DSL2 - revision: 39f7d81a10

executor >  local (12)
[d4/613c6c] MAKE_TEST_FILES (3) [100%] 6 of 6 ✔
[47/10b6db] TCRDIST3_MATRIX (6) [100%] 6 of 6 ✔
cpus:8 memory:1.3 GB size:20480
cpus:2 memory:163.8 MB size:4096
cpus:4 memory:655.4 MB size:10240
cpus:8 memory:5.1 GB size:30720
cpus:2 memory:409.6 MB size:2048
cpus:4 memory:655.4 MB size:6144

The allocation also respects resource restrictions set in nextflow.config:

 N E X T F L O W   ~  version 25.04.8
Launching `main.nf` [friendly_angela] DSL2 - revision: 39f7d81a10

[17/8f8b22] MAKE_TEST_FILES (6) [100%] 6 of 6 ✔
[77/703366] TCRDIST3_MATRIX (6) [100%] 6 of 6 ✔
cpus:1 memory:409.6 MB size:2048
cpus:1 memory:1 GB size:20480
cpus:1 memory:1 GB size:30720
cpus:1 memory:163.8 MB size:4096
cpus:1 memory:655.4 MB size:10240
cpus:1 memory:655.4 MB size:6144
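
The capping behaviour visible in the second run can be thought of as taking the minimum of each requested value and its configured limit. The Python sketch below only illustrates that idea: the `apply_limits` helper and the numbers are assumptions loosely modelled on the log above, and the thread does not show how the gist or nextflow.config actually sets the limits.

```python
def apply_limits(requested, limits):
    """Cap each requested resource at its limit; resources without a limit pass through."""
    return {
        name: min(value, limits.get(name, value))
        for name, value in requested.items()
    }


# Illustrative values only (assumed): a 1-CPU, 1 GB ceiling.
limits = {"cpus": 1, "memory_mb": 1024}

large_task = apply_limits({"cpus": 8, "memory_mb": 5222}, limits)
small_task = apply_limits({"cpus": 2, "memory_mb": 164}, limits)
```

Requests below the ceiling are left unchanged, which matches the smaller memory values in the log staying as they were.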

dltamayo and others added 5 commits October 2, 2025 11:19

Merge pull request #59 from KarchinLab/main

fec6545

Merge PR

Test dynamic cpu allocation

6bf098e

Debug gene_fam csv

a2ce8cb

Reformat sample_calc, _stats, notebook

87daed7

Header will now be included in aggregated .csvs, instead of hard coded within notebook.

dltamayo requested a review from dimalvovs October 14, 2025 19:06

dimalvovs requested a review from Copilot October 14, 2025 19:14

Copilot AI reviewed Oct 14, 2025

dimalvovs requested changes Oct 20, 2025

hweej mentioned this pull request Oct 23, 2025

Feature/add unit test scaffolding #61

Merged

dimalvovs self-requested a review November 7, 2025 22:17

dimalvovs approved these changes Nov 7, 2025

dltamayo merged commit 5692e2b into main Dec 4, 2025
1 check passed
