Reformat sample_calc #60

Conversation
- Using metadata in JSON format instead of hard-coded list access (metadata access itself is still hard-coded, though).
- Switched to pandas for output instead of csv.
- Removed old code.

The eventual goal is for any combination of clinical metadata at the patient/sample level to be usable for clustering/grouping.
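The JSON-metadata approach described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code; the field names (`subject_id`, `sample`, `timepoint`) and file names are hypothetical:

```python
import json

import pandas as pd

# Hypothetical per-sample metadata; the real schema comes from the pipeline.
metadata = {"subject_id": "P01", "sample": "S01", "timepoint": "baseline"}

# Serialize metadata as JSON so downstream steps can look fields up by
# name instead of by hard-coded list position.
meta_json = json.dumps(metadata)

# Attach each metadata field as a column on the per-sample stats table and
# write it with a header row (pandas includes headers by default).
stats = pd.DataFrame({"clone_id": ["c1", "c2"], "count": [10, 3]})
for key, value in json.loads(meta_json).items():
    stats[key] = value
stats.to_csv("sample_stats.csv", index=False)
```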
Headers will now be included in the aggregated .csvs, instead of being hard-coded within the notebook.
Pull Request Overview
Refactors the sample calculation workflow to produce per-sample CSVs with richer metadata, then aggregates them for plotting, moving away from hard-coded column definitions. Introduces a new SAMPLE_AGGREGATE process and updates the downstream notebook to align with renamed metadata fields; also adds dynamic CPU allocation to TCRDIST3.
- Introduce per-sample output file naming (stats/, vdj/) and new aggregation step replacing prior collectFile usage.
- Update analysis notebook to rely on CSV headers instead of hard-coded column name lists and rename metadata fields (patient_id -> subject_id, sample_id -> sample).
- Add dynamic CPU allocation logic to TCRDIST3 process.
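The header-driven reading and field renames listed above might look like this in pandas. The CSV contents and the rename map are illustrative, reconstructed from the field names mentioned in the PR:

```python
import io

import pandas as pd

# Stand-in for one aggregated CSV; the real files come from the pipeline.
csv_text = "patient_id,sample_id,count\nP01,S01,10\nP01,S02,3\n"

# Column names come from the CSV header itself, so the notebook no longer
# needs a hard-coded list of names.
df = pd.read_csv(io.StringIO(csv_text))

# The renames described in the PR: patient_id -> subject_id, sample_id -> sample.
df = df.rename(columns={"patient_id": "subject_id", "sample_id": "sample"})
```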
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| subworkflows/local/sample.nf | Replaces collectFile outputs with collection + aggregation processes; updates inputs to plotting step. |
| notebooks/sample_stats_template.qmd | Changes CSV reading strategy and metadata field names; adjusts plotting hover fields and grouping. |
| modules/local/sample/tcrdist3.nf | Adds dynamic cpus directive based on task.memory. |
| modules/local/sample/sample_calc.nf | Changes output file paths/name patterns and serializes sample metadata as JSON. |
| modules/local/sample/sample_aggregate.nf | New process to concatenate multiple per-sample CSVs into aggregated files. |
```nextflow
    if (task.memory > 256.GB)
        return 16 * task.attempt
    else if (task.memory > 64.GB)
        return 8 * task.attempt
    else if (task.memory > 4.GB)
        return 4 * task.attempt
    else
        return 2 * task.attempt
}
```
Copilot AI, Oct 14, 2025
The cpus directive depends on task.memory, which itself is derived later by the memory block; this circular dependency can result in task.memory being undefined or default when cpus is evaluated. Derive both cpu and memory from the same underlying metric (e.g., count_table.size()) or compute memory first in a variable and base cpus on that variable instead of task.memory.
Suggested change:

```diff
-    if (task.memory > 256.GB)
-        return 16 * task.attempt
-    else if (task.memory > 64.GB)
-        return 8 * task.attempt
-    else if (task.memory > 4.GB)
-        return 4 * task.attempt
-    else
-        return 2 * task.attempt
-}
+    def sz = count_table.size()
+    def mb = 1024 * 1024
+    if (sz > 26 * mb)
+        return 16 * task.attempt
+    else if (sz > 20 * mb)
+        return 8 * task.attempt
+    else if (sz > 10 * mb)
+        return 4 * task.attempt
+    else
+        return 2 * task.attempt
+}
```
- Setting this to 'Request changes' so it doesn't go unnoticed, but feel free to create additional issues to address later if we need it merged now.
- `nf-test test` on the minimal example runs without errors, nice! `nextflow lint main.nf` returns "❌ 4 files had 21 errors"; most of the errors are really simple to fix. The benefit, besides better code formatting and standardization, is beautiful flow charts such as the one below. Can be done now or, if preferred, in a dedicated refactoring feature.
- In the same refactoring category, I could not find the `versions.yml` output in the minimal example test results.
- Skimmed through the Copilot comments and left mine where applicable.
- Regarding the resource allocation, it does seem to work; running this gist https://gist.github.com/dimalvovs/3b1853effe9105c96ca40ac3e8886332 produces these outputs:
```
N E X T F L O W  ~  version 25.04.8
Launching `main.nf` [loquacious_kare] DSL2 - revision: 39f7d81a10
executor >  local (12)
[d4/613c6c] MAKE_TEST_FILES (3)  [100%] 6 of 6 ✔
[47/10b6db] TCRDIST3_MATRIX (6)  [100%] 6 of 6 ✔
cpus:8 memory:1.3 GB size:20480
cpus:2 memory:163.8 MB size:4096
cpus:4 memory:655.4 MB size:10240
cpus:8 memory:5.1 GB size:30720
cpus:2 memory:409.6 MB size:2048
cpus:4 memory:655.4 MB size:6144
```
and the allocation also respects resource restrictions set in nextflow.config:
```
N E X T F L O W  ~  version 25.04.8
Launching `main.nf` [friendly_angela] DSL2 - revision: 39f7d81a10
[17/8f8b22] MAKE_TEST_FILES (6)  [100%] 6 of 6 ✔
[77/703366] TCRDIST3_MATRIX (6)  [100%] 6 of 6 ✔
cpus:1 memory:409.6 MB size:2048
cpus:1 memory:1 GB size:20480
cpus:1 memory:1 GB size:30720
cpus:1 memory:163.8 MB size:4096
cpus:1 memory:655.4 MB size:10240
cpus:1 memory:655.4 MB size:6144
```
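For reference, one way such a cap could be expressed (an illustrative fragment, not necessarily the config used in the gist) is Nextflow's `resourceLimits` setting, available since Nextflow 24.04:

```nextflow
// nextflow.config -- illustrative fragment only.
// resourceLimits clamps whatever a process requests, so a dynamic
// `cpus` of 8 would be reduced to 1 here.
process {
    resourceLimits = [ cpus: 1, memory: 8.GB ]
}
```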
Moving towards de-hardcoding of metadata fields