Fast regex-based protein digestion and unique peptides finder tool

Labo-MAB/Kiwi

Kiwi: a fast regex-based protein digestion tool

Author

Francis Bourassa (Francis-B)

Description

Kiwi is a Python tool that performs proteolytic cleavage on the proteins of a given FASTA file and can determine which of the resulting peptides are unique.

The digestion can be tuned with the following parameters:

  • Enzyme used to perform the digestion;
  • Minimal and maximal length of the peptide sequences;
  • Maximal molecular mass of the peptide sequences;
  • Number of miscleavages allowed.
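To illustrate how these parameters interact, here is a minimal sketch of a regex-based tryptic digestion. It is not Kiwi's actual implementation: the `digest` function, the `TRYPSIN` pattern and the cleavage rule (cut after K or R unless followed by P) are assumptions made for this example.

```python
import re

# Assumed trypsin rule: cut after K or R unless the next residue is P.
TRYPSIN = re.compile(r"(?<=[KR])(?!P)")


def digest(sequence, site=TRYPSIN, min_length=7, max_length=None, max_miscleavages=1):
    """Return peptides from `sequence`, allowing up to `max_miscleavages` skipped sites."""
    # Zero-width matches mark the cut positions; drop a cut at either end.
    cuts = [m.start() for m in site.finditer(sequence)]
    bounds = [0] + [c for c in cuts if 0 < c < len(sequence)] + [len(sequence)]
    fragments = [sequence[a:b] for a, b in zip(bounds, bounds[1:])]

    peptides = []
    for start in range(len(fragments)):
        # Joining k + 1 consecutive fragments gives a peptide with k miscleavages.
        for k in range(max_miscleavages + 1):
            if start + k >= len(fragments):
                break
            peptide = "".join(fragments[start:start + k + 1])
            too_short = len(peptide) < min_length
            too_long = max_length is not None and len(peptide) > max_length
            if not (too_short or too_long):
                peptides.append(peptide)
    return peptides


# Toy sequence; min_length is lowered so the short fragments stay visible.
example = digest("MKAAARPGGKLL", min_length=1, max_miscleavages=1)
print(example)
```

In this toy sequence the R is followed by P, so no cut is made there, and each extra allowed miscleavage adds the peptides obtained by joining one more neighbouring fragment.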

Installation

To install Kiwi as a module, go to the root directory of the repository and run:

pip install ./

If you wish to modify the scripts, you can also install it as an editable module:

pip install -e ./

Instructions

Once installed, Kiwi can be used via its API:

from kiwi.digestion import Experiment

fasta_file = '/path/to/file.fasta'

# Load fasta and create experiment with default parameters
experiment = Experiment(fasta_file)

# Change parameters
experiment.set_min_length(<int>)
experiment.set_max_length(<int>)
experiment.set_max_mass(<float>)
experiment.set_max_miscleavages(<int>)
experiment.set_outdir(<filepath>)
experiment.set_enzyme(<str>)  # Implemented enzymes can be found in enzyme.py

# Run
experiment.cleave_proteins()
experiment.check_sequences_uniqueness()
experiment.write()  # write the result into a file

# Access the list of sequences
experiment.peptides

By default, the peptide sequences returned are at least 7 amino acids long, contain at most 1 miscleavage, have a molecular mass under 4600 daltons and result from a tryptic digestion.
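As a rough illustration of the mass criterion, the sketch below computes a peptide mass and compares it with the 4600 Da default. Whether Kiwi uses monoisotopic or average masses is not stated here, so monoisotopic residue masses are an assumption, and `peptide_mass` and `passes_mass_filter` are hypothetical helpers rather than part of Kiwi's API.

```python
# Monoisotopic residue masses in daltons (assumption: Kiwi may use other values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(peptide):
    """Sum the residue masses and add one water molecule."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def passes_mass_filter(peptide, max_mass=4600.0):
    """True if the peptide is lighter than `max_mass` daltons."""
    return peptide_mass(peptide) < max_mass


print(round(peptide_mass("EIQILLR"), 3), passes_mass_filter("EIQILLR"))
```

A sequence containing a letter outside the 20 standard amino acids raises a `KeyError` in this sketch; a real tool would need a policy for such residues.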

If no directory is passed to experiment.write(), the file is automatically saved as /path/to/file.fasta_digestedPeptides.csv.

Or run via terminal:

kiwi /path/to/file.fasta

To show the help message:

$ kiwi --help

  usage: kiwi [-h] [-l] [-M] [-m] [-a] [-e] [-o] </path/to/file.fasta>

  Digest proteins of a given fasta file

  positional arguments:
    </path/to/file.fasta>

  optional arguments:
    -h, --help            show this help message and exit
    -l , --length         minimal length of peptide sequences (default: 7)
    -M , --miscleavages   maximum of miscleavages allowed (default: 1)
    -m , --mass           maximal molecular mass of peptide sequences (default: 4600 dalton)
    -u , --unique         the list returned will contain a flag to inform if the peptides are unique or not
    -e , --enzyme         enzyme used to perform digestion (default: trypsin)
    -o , --output         output directory (default: /path/to/file.fasta_digestedPeptides.csv)

Note on unique peptides

This script was mainly developed as a tool for mass spectrometry database preparation. In this context, we consider a peptide sequence unique if no other protein yields the exact same sequence after enzymatic digestion.

So, for example, if two given proteins yield respectively the following peptides after a tryptic digestion:

  1. EIQILLR
  2. APELDFGEIQILLR

Even though the first peptide is contained in the second one, it is still considered unique.
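The definition above amounts to an exact-match comparison across proteins, which can be sketched as follows. The `flag_unique` function, the protein names and the shared peptide `LSSPATLNSR` are made up for this illustration and do not reflect Kiwi's internal code.

```python
from collections import defaultdict


def flag_unique(peptides_by_protein):
    """Map each peptide to True if exactly one protein yields that exact sequence."""
    owners = defaultdict(set)
    for protein, peptides in peptides_by_protein.items():
        for peptide in peptides:
            owners[peptide].add(protein)
    # Only exact string equality counts; substrings of other peptides are ignored.
    return {peptide: len(proteins) == 1 for peptide, proteins in owners.items()}


# Hypothetical digests of two proteins.
digests = {
    "protein_1": ["EIQILLR", "LSSPATLNSR"],
    "protein_2": ["APELDFGEIQILLR", "LSSPATLNSR"],
}
flags = flag_unique(digests)
print(flags)
```

Here `EIQILLR` and `APELDFGEIQILLR` are both flagged unique, while the hypothetical `LSSPATLNSR` is not, because both proteins yield it.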

Requirements

Python >= 3.6
