Conversation
augur subsample is built to replace the simple augur filter call with a command that supports more configuration options. Note that this is a breaking change and the old configuration will no longer work.
|
Locally, I did a test run with the full input and confirmed the output messages are the same as before with an extra validation line: |
|
Also confirmed that |
j23414
left a comment
There was a problem hiding this comment.
Approving for:
- concordance with WNV sampling (also being used by WA-DOH)
- these changes do not break anything in the existing pipeline
- the mumps pipeline is simple enough (only two) that the sampling config is not unreasonably long or tedious to edit
Local testing of the following does indeed work:
mkdir phylo-out
nextstrain run mumps@victorlin/use-augur-subsample phylogenetic phylo-out| north-america: | ||
| samples: | ||
| all: | ||
| min_length: 8000 |
There was a problem hiding this comment.
[non-blocking commentary]
Okay, I can see how this approach makes every filter parameter explicit in the config.yaml. I'm still not a fan of the duplication (e.g., identical settings for min_length, group_by, max_sequences, etc.) under each build scope (north-america and global). But since there are only two builds here (compared to dengue with 10, norovirus with 14, and flu potentially with 3×8), I can live with it for mumps.
I could see an improvement by using something like YAML anchors at some point in the future:
There was a problem hiding this comment.
I could imagine someone adding "SH" gene builds (would double the builds to 4) but as of now, we only do full genome.
There was a problem hiding this comment.
After using YAML anchors in nextstrain/rsv#103, I don't think it's great and am seeking alternatives in nextstrain/public#27.
| metadata = "data/metadata.tsv", | ||
| exclude = resolve_config_path(config["filter"]["exclude"]), | ||
| include = resolve_config_path(config["filter"]["include"]), | ||
| config = "results/run_config.yaml", |
There was a problem hiding this comment.
[non-blocking commentary]
Will this no longer detect changes to the exclude.txt or include.txt files?
Perhaps this is a moot point — I’m not sure how often people simply update the exclude list and expect to rerun from cache, versus just forcing a rerun each time anyway.
There was a problem hiding this comment.
Yes it will no longer detect changes, this is a good point. I mentioned this in nextstrain/pathogen-repo-guide#98, and I think the functionality could potentially be brought back with a workflow-level helper function that dynamically adds inputs.
|
[non-blocking optional task] Could also update subsampling in the nextclade workflow for concordance to phylogenetic mumps/nextclade/defaults/config.yaml Lines 20 to 24 in b06a43d |
|
I'll wait for a decision in nextstrain/public#27 before continuing here. |
Description of proposed changes
augur subsample is built to replace the simple augur filter call with a command that supports more configuration options.
Note that this is a breaking change and the old configuration will no longer work.
Related issue(s)
Closes #44
Checklist
Update changelogno changelog, might be good to start one