Simple pipeline for predicting bacterial base modification from PacBio HiFi sequencing data with kinetics tags.
Currently the output design is to store the output BAMs alongside the input files, with the prefix `fibertools_predict.{input_bam}`. We implement a custom check for whether output already exists, and filter out any inputs which already have output files in the expected location. Skipped inputs are logged by the pipeline. This behaviour can be disabled by setting `--clobber true`.
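For example, for a hypothetical input BAM the expected output location would look like this (paths are illustrative):

```
# input : /data/run1/movie.hifi.bam
# output: /data/run1/fibertools_predict.movie.hifi.bam
ls /data/run1/
# fibertools_predict.movie.hifi.bam  movie.hifi.bam
```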
Currently, the pipeline will:

- Filter out BAMs which seem irrelevant by name (containing `fail`, `unassigned`, `subread`, `scrap`, or `fibertools_predict`), or which have existing output files.
- Filter out any BAMs which do not contain the required kinetics tags (`CHECK_KINETICS`). A way to check a BAM for these tags by hand is sketched below this list.
- Predict 6mA base modification using fibertools (`PREDICT_FIBERTOOLS`).
- Extract modifications to a table using a custom Perl script (`EXTRACT_CALLS`). This currently only extracts modifications with a probability > 240 on the 0-255 ML scale (240/255 ≈ 0.94); this threshold is fixed and cannot be changed. The extraction is done by default, but is optional; disable it by setting `--extract_calls false`.
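If you want to check a BAM by hand, something like the following should work, assuming the HiFi kinetics are stored in the standard `fi`/`fp`/`ri`/`rp` tags (the pipeline's internal check may differ in detail):

```
# Print any kinetics tags present on the first read; no output means none found
samtools view movie.hifi.bam | head -n 1 | tr '\t' '\n' | grep -E '^(fi|fp|ri|rp):'
```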
The default install of fibertools is from conda, and will not support use of the GPU. If you want to use a GPU to improve prediction speed, refer to the installation instructions in the fibertools documentation. Installing fibertools with GPU features requires cmake, FindBin.pm, and git.
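A quick way to confirm those build prerequisites are present (a sanity check only, not part of the pipeline):

```
command -v cmake git
perl -MFindBin -e 'print "FindBin.pm OK\n"'
```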
samtools should be available in the environment you launch the pipeline from.
For our local use, it has so far been fast enough to run fibertools without the GPU, primarily because of time spent pending on the GPU partition. You may find it worthwhile to install with GPU features if your GPUs are less in demand.
This section shows how to run the pipeline using micromamba environments. We will create a micromamba environment with nextflow installed; nextflow will then automatically create the environment required to run the processes. If the machines you run the pipeline on do not have internet access, see the later section on running without internet access.
Run

```
micromamba create -n nextflow nextflow conda samtools
```
We are installing conda within the environment, as nextflow needs the `conda` binary to activate and deactivate environments. samtools is installed because each file is checked for kinetics tags locally, rather than being submitted as a job, and so that check runs in the nextflow environment.
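You can verify the environment has everything nextflow will need (exact version output will vary):

```
micromamba activate nextflow
nextflow -version
conda --version
samtools --version | head -n 1
```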
(Optional). Nextflow can take a local copy of the pipeline to run. If your compute nodes have internet access, this step isn't strictly necessary.

```
nextflow pull apduncan/bm-tk -r main
```

This will pull the most recent commit on the main branch. You could also specify a commit hash, as sketched below.
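For example, to pin the pipeline to an exact revision (the hash below is a placeholder; substitute a real commit from the repository):

```
nextflow pull apduncan/bm-tk -r <commit-hash>
```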
Move to whichever directory you want pipeline logs and configuration to be kept in. Unlike many nextflow pipelines, output files will not be in this directory; output will be in the same location as the input BAMs.

This step isn't necessary if you are in our group; the default should work.
nextflow.config specifies profiles which give details for the submission system. It has defaults which work for our group; if you are using this elsewhere you will need to customise it.
Take a copy of the default config:

```
curl https://raw.githubusercontent.com/apduncan/bm-tk/refs/heads/main/nextflow.config > nextflow.config
```
You can either customise the `nbi_slurm` profile or copy it. If you are also using slurm, it should be enough to specify your partition names in the `queue` fields, as sketched below.
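A minimal sketch of a copied profile (the profile and partition names here are hypothetical; substitute your own):

```
profiles {
    my_slurm {
        conda.useMicromamba = true
        process {
            executor = 'slurm'
            queue = 'your-partition'
            memory = '2GB'
            cpus = 2
        }
    }
}
```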
Activate your nextflow environment

```
micromamba activate nextflow
```

Then run the pipeline

```
nextflow run apduncan/bm-tk \
    -profile nbi_slurm \
    -work-dir /path/to/scratch \
    -with-report \
    -r main \
    --bams "/glob/to/**/find*.bam"
```
Do this on a node where it is okay to start long-running jobs interactively, or put the above in a batch submission script, along the lines of the sketch below.
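A sketch of such a submission script, assuming slurm and that micromamba can be initialised in batch shells (adjust the partition and the micromamba setup to your site):

```
#!/bin/bash
#SBATCH --job-name=bm-tk
#SBATCH --partition=ei-medium
#SBATCH --mem=4G
#SBATCH --cpus-per-task=2

# Make micromamba available in this non-interactive shell
eval "$(micromamba shell hook --shell bash)"
micromamba activate nextflow

nextflow run apduncan/bm-tk \
    -profile nbi_slurm \
    -work-dir /path/to/scratch \
    -with-report \
    -r main \
    --bams "/glob/to/**/find*.bam"
```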
The pipeline should then run and produce your BAMs with predicted methylation.
The main obstacle to running without internet access is that nextflow will not be able to create the micromamba environment. However, we can do that on a node with internet access, then provide the path to the environment.
To create the environment, run

```
curl https://raw.githubusercontent.com/apduncan/bm-tk/refs/heads/main/env.yaml > env.yaml && \
micromamba env create -n bmtk --file env.yaml
```
Find the environment path

```
> micromamba env list | grep bmtk
  bmtk    /home/user/micromamba/envs/bmtk
```
Copy that path into the `conda =` setting of the profile in nextflow.config, e.g. for the nbi_slurm profile:
```
profiles {
    conda {
        conda.enabled = true
        process.conda = "/home/kam24goz/miniforge3/envs/pbbm"
    }
    nbi_slurm {
        conda.useMicromamba = true
        process {
            conda = "/home/user/micromamba/envs/bmtk"
            executor = 'slurm'
            queue = 'ei-medium'
            memory = '2GB'
            cpus = 2
            ...
```
When you submit the nextflow pipeline it should use this environment. Be sure to also put `export NXF_OFFLINE='true'` in your submission scripts, otherwise nextflow will waste a lot of time trying to phone home for updates.
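For example, the start of an offline submission script might look like this (assuming the pipeline and environment were prepared on a node with internet access, as above):

```
export NXF_OFFLINE='true'
micromamba activate nextflow
nextflow run apduncan/bm-tk -profile nbi_slurm -r main --bams "/glob/to/**/find*.bam"
```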