Skip to content

Conversation

@zlskidmore
Copy link
Collaborator

This should allow the rnaseq.cwl workflow be input agnostic in terms of input being sam/cram/bam. I've also incorporated the suggestion from @jasonwalker80 on using RevertSam which was not in here before.

The best solution I could come up with was to force conversion to bam format right at the start. If you were instead to try and pass SAM/CRAM/BAM to RevertSam the output file and more importantly the extension would be incorrect (though if someone can figure this out maybe that's a better solution?).

At any rate as it is now samtools is run and just converts to bam so while there are 3 input types only 1 output type averting this headache alluded to above.

Side note mgibio has 2 samtools docker images, updated at the same time. I don't know which one we use so if this is accepted those lines will have to be modified (right now it uses zlskidmore/samtools)

@zlskidmore zlskidmore changed the base branch from cram_support to master March 19, 2019 20:56
cram_reference: cram_reference
out: [bam_file]
revert_bam:
run: ../tools/revert_input.cwl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could revert_input.cwl be run on CRAMs directly? If so seems like input_to_bam could be removed. If not, seems like it should be named revert_bam.cwl instead of revert_input.cwl to denote its specificity.

(Is this operation time and space-efficient enough that we're willing to do it on all inputs instead of only those that need it?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is the specificity, the tool in revert_input.cwl can be run on anything but you would have to use the input variable, currently bam, to grab the appropriate out file , basically would require some javascript i expect which i try to avoid in cwl. I think renaming to revert_bam.cwl is preferable from my perspective.

The flexibility in the PR does come at a computational cost for sure. I think we should run revert_input.cwl either way though so the cost is just converting whatever input to bam

@chrisamiller
Copy link
Collaborator

Does @tmooney's recent code - that allows for Fastq or Bam input to the alignment workflow - provide a path forward here? #741

Seems like could be possible to expand the sequence type to have 4 input types instead of 2:

unaligned_bam
fastq
aligned_bam
aligned_cram

Then we would add some downstream logic in the alignment steps to run RevertSam if needed. That would apply in (I think) just three different places: dna alignment, rnaseq alignment, bisulfite alignment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants