SANJAY SG edited this page Jul 2, 2025 · 1 revision

ProtPeptigram Documentation


Table of Contents

  1. Overview
  2. Features
  3. Installation
  4. Quick Start
  5. API Reference
  6. Data Format Requirements
  7. Examples
  8. Google Colab Integration
  9. Contributing
  10. Citation
  11. License

Overview

ProtPeptigram is a visualization tool for mapping immunopeptides to their source proteins across multiple biological samples. It helps immunopeptidomics researchers identify peptide coverage patterns, analyze peptide density along each protein, and compare peptide presentation between experimental conditions.

Scientific Applications

  • Immunopeptidomics Research: Visualize HLA-presented peptides
  • Antigen Processing Studies: Analyze peptide coverage patterns on source proteins
  • Comparative Analysis: Compare peptide presentations across experimental conditions
  • Biomarker Discovery: Identify regions with dense peptide coverage
  • Vaccine Development: Analyze epitope distributions on target proteins

Key Capabilities

ProtPeptigram fills a gap in immunopeptidomics tooling: it maps peptides identified in mass spectrometry experiments back to their source proteins, keeping each peptide's exact position and intensity.


Features

Core Visualization Features

  • Intuitive Peptide Mapping: Map peptides to their source proteins with detailed positional information
  • Multi-Sample Support: Compare peptide presentation across different experimental conditions
  • Intensity-Based Coloring: Visualize peptide abundance with customizable color schemes
  • Automatic Highlighting: Identify regions of interest with dense peptide coverage
  • Publication-Quality Outputs: Generate high-resolution figures suitable for scientific publications
  • Customizable Visualizations: Adjust color schemes, highlighting, and display options
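Mass spectrometry intensities often span several orders of magnitude, so intensity-based coloring is commonly applied on a logarithmic scale. The sketch below illustrates that idea with a two-color gradient over log10 intensity; `intensity_to_color`, `hex_to_rgb`, and the default colors are hypothetical names chosen for illustration, not part of ProtPeptigram's API.

```python
from __future__ import annotations

import math


def hex_to_rgb(hex_color: str) -> tuple[int, ...]:
    """Convert '#RRGGBB' into an (r, g, b) tuple of 0-255 ints."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def intensity_to_color(intensity: float, vmin: float, vmax: float,
                       low: str = "#DEEBF7", high: str = "#08519C") -> str:
    """Map an intensity onto a two-color gradient using a log10 scale.

    vmin and vmax must be positive; intensities outside [vmin, vmax] are
    clamped, and non-positive (missing) intensities get the `low` color.
    """
    if intensity <= 0:
        return low
    log_min, log_max = math.log10(vmin), math.log10(vmax)
    t = (math.log10(intensity) - log_min) / (log_max - log_min) if log_max > log_min else 1.0
    t = min(max(t, 0.0), 1.0)  # clamp to the ends of the gradient
    channels = zip(hex_to_rgb(low), hex_to_rgb(high))
    # Linear interpolation per RGB channel, formatted back to '#RRGGBB'
    return "#" + "".join(f"{round(a + (b - a) * t):02X}" for a, b in channels)
```

With `vmin=100` and `vmax=10000`, an intensity of 1000 lands halfway along the gradient, while 100 and 10000 map to the two end colors.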

Advanced Analysis Features

  • Coverage Pattern Analysis: Identify hotspots and cold spots in peptide presentation
  • Cross-Sample Comparison: Statistical comparison of peptide presentations
  • Protein Filtering: Focus on top proteins by peptide count or specific protein lists
  • Intensity Thresholding: Filter low-abundance peptides for cleaner visualizations
  • Batch Processing: Process multiple proteins simultaneously
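Hotspot detection can be framed as finding runs of residues covered by at least a threshold number of peptides. A minimal sketch, assuming 1-based inclusive Start/End positions and a simple depth threshold; the helper names and the threshold rule are illustrative assumptions, not ProtPeptigram's documented method.

```python
def coverage_depth(peptides, protein_length):
    """Count how many peptides cover each residue.

    peptides: iterable of (start, end) pairs, 1-based and inclusive.
    Returns a list of length protein_length; index 0 is residue 1.
    """
    diff = [0] * (protein_length + 1)  # difference array
    for start, end in peptides:
        s, e = max(start, 1), min(end, protein_length)
        if s > e:
            continue  # peptide lies entirely outside the protein
        diff[s - 1] += 1
        diff[e] -= 1
    depth, running = [], 0
    for delta in diff[:-1]:
        running += delta
        depth.append(running)
    return depth


def find_hotspots(depth, min_depth):
    """Return (start, end) residue ranges (1-based, inclusive) with depth >= min_depth."""
    regions, run_start = [], None
    for pos, d in enumerate(depth, start=1):
        if d >= min_depth and run_start is None:
            run_start = pos
        elif d < min_depth and run_start is not None:
            regions.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:  # a hotspot that runs to the C-terminus
        regions.append((run_start, len(depth)))
    return regions
```

The difference array keeps the depth computation linear in protein length plus peptide count, instead of touching every residue of every peptide. Cold spots are the complementary runs, for example residues with depth 0.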

Installation

System Requirements

  • Python: ≥ 3.8 (recommended: 3.10+)
  • Operating System: Cross-platform (Windows, macOS, Linux)
  • Memory: 4 GB RAM minimum (8 GB recommended for large datasets)

Dependencies

pandas >= 1.3.0
matplotlib >= 3.5.0
numpy >= 1.21.0
Biopython >= 1.78
rich >= 10.0.0
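To confirm that an environment meets these minimums, a small stdlib-only check can help. This is a sketch rather than a ProtPeptigram utility; it uses `importlib.metadata` (available from Python 3.8) and a deliberately simplified version parser, so pre-release tags such as `rc1` are ignored.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Minimum versions from the dependency list above (distribution names, lowercase).
MINIMUMS = {
    "pandas": "1.3.0",
    "matplotlib": "3.5.0",
    "numpy": "1.21.0",
    "biopython": "1.78",
    "rich": "10.0.0",
}


def parse_version(text):
    """Simplified parser: '2.0.0rc1' -> (2, 0, 0).

    Keeps the leading digits of each dot-separated part and stops at the
    first part that has none.
    """
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_dependencies(minimums=MINIMUMS):
    """Return {name: 'ok' | 'missing' | 'too old (x.y.z)'} for each dependency."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(f"{name:<12} {status}")
```

Run it as a script to print one status line per dependency. For exact PEP 440 semantics, compare `packaging.version.Version` objects instead.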

Installation Methods

Method 1: From PyPI (Recommended)

pip install protpeptigram

Method 2: From TestPyPI

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ protpeptigram

Method 3: From Source (Development)

git clone https://github.com/Sanpme66/ProtPeptigram.git
cd ProtPeptigram
pip install -e .

Method 4: Using Conda

conda create -n protpeptigram python=3.10
conda activate protpeptigram
pip install protpeptigram

Quick Start

Command Line Interface

Basic Usage

# Minimal command with default settings
protpeptigram -i data/peptides.csv -f data/proteome.fasta -o output_directory

Advanced Usage Examples

# Visualize top 10 proteins by peptide count
protpeptigram -i data/peptides.csv -f data/proteome.fasta -o output_dir -tp 10

# Visualize specific proteins from a list
protpeptigram -i data/peptides.csv -f data/proteome.fasta -o output_dir -pl protein_list.txt

# Apply intensity threshold to filter low-abundance peptides
protpeptigram -i data/peptides.csv -f data/proteome.fasta -o output_dir -th 1000

# Comprehensive analysis with multiple filters
protpeptigram -i data/peptides.csv -f data/proteome.fasta -o output_dir \
  -tp 20 -th 500 --filter-contaminants --min-samples 3
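The `-tp` option selects the top N proteins by peptide count. Whether ProtPeptigram counts distinct peptide sequences or every identification, and how it breaks ties, is not stated here; the sketch below assumes distinct sequences and alphabetical tie-breaking, and `top_proteins` is a hypothetical helper.

```python
from collections import defaultdict


def top_proteins(rows, n):
    """Return the n accessions with the most distinct peptide sequences.

    rows: iterable of (peptide, accession) pairs. Ties are broken
    alphabetically by accession so the result is deterministic.
    """
    peptides_by_protein = defaultdict(set)
    for peptide, accession in rows:
        peptides_by_protein[accession].add(peptide)  # duplicates collapse in the set
    ranked = sorted(peptides_by_protein.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    return [accession for accession, _ in ranked[:n]]
```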

Python API Usage

Basic Workflow

from ProtPeptigram.DataProcessor import PeptideDataProcessor
from ProtPeptigram.viz import ImmunoViz

# Step 1: Initialize data processor
processor = PeptideDataProcessor()

# Step 2: Load data files
processor.load_peaks_data("data/peptides.csv")
processor.load_protein_sequences("data/proteome.fasta")

# Step 3: Process and filter data
formatted_data = processor.filter_and_format_data(
    filter_contaminants=True,
    intensity_threshold=1000,
    min_samples=2
)

# Step 4: Create visualizations
viz = ImmunoViz(formatted_data)
fig, axes = viz.plot_peptigram(
    protein_ids=["P20152", "P32261"],
    group_by="Sample",
    color_by="protein",
    title="HLA Peptide Visualization"
)

# Step 5: Save high-quality output
fig.savefig("protein_visualization.png", dpi=300, bbox_inches="tight")

API Reference

Class: PeptideDataProcessor

Main class for processing and preparing immunopeptide data.

Constructor

PeptideDataProcessor(config=None)

Methods

load_peaks_data(filepath)

Description: Load peptide data from PEAKS software output.

Parameters:

  • filepath (str): Path to CSV file containing peptide data
  • encoding (str, optional): File encoding (default: 'utf-8')

Returns: None (data stored internally)

Example:

processor.load_peaks_data("peptides.csv")

load_protein_sequences(fasta_path)

Description: Load protein sequences from FASTA file.

Parameters:

  • fasta_path (str): Path to FASTA file containing protein sequences
  • format (str, optional): Sequence format (default: 'fasta')

Returns: Dictionary of protein sequences

filter_and_format_data(**kwargs)

Description: Apply filters and format data for visualization.

Parameters:

  • filter_contaminants (bool, default=True): Remove contaminant proteins
  • intensity_threshold (float, default=0): Minimum peptide intensity
  • min_samples (int, default=1): Minimum number of samples in which a peptide must be detected
  • min_peptide_length (int, default=8): Minimum peptide length
  • max_peptide_length (int, default=15): Maximum peptide length

Returns: Formatted DataFrame ready for visualization
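The parameters above are listed individually, but how they combine is not spelled out. A sketch of one plausible combination in plain Python, assuming rows shaped like the CSV described under Data Format Requirements; `filter_peptides` and its `contaminant_ids` parameter are hypothetical (ProtPeptigram uses its own contaminant database rather than a caller-supplied set).

```python
def filter_peptides(rows, sample_columns, intensity_threshold=0.0, min_samples=1,
                    min_length=8, max_length=15, contaminant_ids=frozenset()):
    """Keep peptide rows that pass length, contaminant and intensity/sample filters.

    rows: dicts keyed by the column names under 'Data Format Requirements'.
    A sample counts as detecting a peptide when its intensity is > 0 and
    >= intensity_threshold; missing or empty values are treated as zero.
    """
    kept = []
    for row in rows:
        if not min_length <= len(row["Peptide"]) <= max_length:
            continue  # outside the peptide length window
        if row["Protein Accession"] in contaminant_ids:
            continue  # hypothetical stand-in for the contaminant database
        intensities = [float(row.get(col) or 0) for col in sample_columns]
        detected = sum(1 for x in intensities if x > 0 and x >= intensity_threshold)
        if detected >= min_samples:
            kept.append(row)
    return kept
```

Under this reading, `intensity_threshold` and `min_samples` interact: a peptide must reach the threshold in at least `min_samples` samples, not merely appear in them.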

Class: ImmunoViz

Visualization class for creating immunopeptide mappings.

Constructor

ImmunoViz(data, config=None)

Parameters:

  • data (DataFrame): Processed peptide data
  • config (dict, optional): Visualization configuration

Methods

plot_peptigram(protein_ids, **kwargs)

Description: Create peptide coverage visualization for specified proteins.

Parameters:

  • protein_ids (list): List of protein accession numbers
  • group_by (str, optional): Grouping variable ('Sample', 'Condition', etc.)
  • color_by (str, optional): Coloring scheme ('protein', 'intensity', 'sample')
  • title (str, optional): Plot title
  • figsize (tuple, optional): Figure size (width, height)
  • show_sequence (bool, default=True): Display protein sequence
  • highlight_regions (list, optional): Regions to highlight

Returns: (Figure, Axes) matplotlib objects
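Peptide coverage plots usually stack overlapping peptides on separate rows so that every bar stays visible. One common way to compute that layout is greedy interval packing: sort by start position and place each peptide in the first row where it fits, which uses the fewest rows possible for a set of intervals. The sketch illustrates the technique only; `assign_rows` is a hypothetical helper, not ProtPeptigram's plotting code.

```python
def assign_rows(peptides):
    """Assign each (start, end) peptide to a row so that peptides in a row never overlap.

    Peptides are processed in order of start position and placed in the first
    row whose last peptide ends before this one starts (first-fit). Returns the
    row index for each peptide, in input order.
    """
    order = sorted(range(len(peptides)), key=lambda i: peptides[i])
    row_ends = []                  # last occupied residue in each row
    rows = [0] * len(peptides)
    for i in order:
        start, end = peptides[i]
        for r, last_end in enumerate(row_ends):
            if last_end < start:   # fits after the last peptide in row r
                row_ends[r] = end
                rows[i] = r
                break
        else:                      # no existing row has room: open a new one
            row_ends.append(end)
            rows[i] = len(row_ends) - 1
    return rows
```

The returned row index can serve as the vertical offset when drawing each peptide as a horizontal bar, for example with matplotlib's `Axes.broken_barh`.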

plot_coverage_heatmap(protein_ids, **kwargs)

Description: Generate heatmap showing peptide coverage across samples.

Parameters:

  • protein_ids (list): Protein accession numbers
  • normalize (bool, default=True): Normalize intensities
  • clustering (bool, default=False): Apply hierarchical clustering

Returns: Heatmap figure object
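The kind of normalization applied is not specified above. A sketch assuming that coverage is summarized as summed peptide intensity per protein and sample, and that normalization rescales each sample so its values sum to 1 across the selected proteins; `protein_sample_matrix` is a hypothetical helper, and ProtPeptigram may normalize differently.

```python
def protein_sample_matrix(rows, protein_ids, sample_columns, normalize=True):
    """Sum peptide intensities per (protein, sample) for the selected proteins.

    rows: dicts with a 'Protein Accession' key and one intensity column per
    sample; missing or empty intensities count as zero. With normalize=True
    each sample is rescaled to sum to 1 across the selected proteins.
    """
    matrix = {pid: dict.fromkeys(sample_columns, 0.0) for pid in protein_ids}
    for row in rows:
        per_sample = matrix.get(row["Protein Accession"])
        if per_sample is None:
            continue  # protein not selected
        for col in sample_columns:
            per_sample[col] += float(row.get(col) or 0)
    if normalize:
        for col in sample_columns:
            total = sum(values[col] for values in matrix.values())
            if total > 0:  # leave all-zero samples untouched
                for values in matrix.values():
                    values[col] /= total
    return matrix
```

Per-sample normalization makes samples with different total signal comparable; clustering, if enabled, would then operate on this matrix.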

plot_intensity_distribution(protein_id, **kwargs)

Description: Plot intensity distribution for peptides from a specific protein.

Parameters:

  • protein_id (str): Single protein accession number
  • bins (int, default=50): Number of histogram bins
  • log_scale (bool, default=True): Use logarithmic scale

Returns: Distribution plot figure
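With `log_scale=True`, binning in log10 space keeps low- and high-abundance peptides visible in the same histogram. Whether ProtPeptigram bins in log space or only log-scales an axis is an assumption here; the stdlib sketch below (`log_histogram` is a hypothetical helper) bins log10 intensities into `bins` equal-width bins and skips non-positive values.

```python
import math


def log_histogram(intensities, bins=50):
    """Count positive intensities in `bins` equal-width bins on a log10 scale.

    Non-positive values (missing or zero intensity) are skipped because log10
    is undefined for them. Returns (edges, counts); edges are in log10 units
    and len(edges) == bins + 1.
    """
    logs = [math.log10(x) for x in intensities if x > 0]
    if not logs:
        return [], []
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / bins or 1.0  # all values equal: fall back to unit-width bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for value in logs:
        index = min(int((value - lo) / width), bins - 1)  # the maximum lands in the last bin
        counts[index] += 1
    return edges, counts
```

With NumPy, `np.histogram(np.log10(x[x > 0]), bins=bins)` gives a similar result for a NumPy array `x`.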


Data Format Requirements

Input Data Specifications

Peptide Data (CSV Format)

ProtPeptigram accepts peptide data in CSV format from PEAKS software with the following required columns:

Column Name          Description                  Data Type  Example
Peptide              Amino acid sequence          String     "KFDHLGFK"
Protein Accession    UniProt accession number     String     "P20152"
Start                Start position in protein    Integer    45
End                  End position in protein      Integer    52
Sample_1_Intensity   Intensity in sample 1        Float      1500.5
Sample_2_Intensity   Intensity in sample 2        Float      2300.1
...                  Additional sample columns    Float      ...
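A quick pre-flight check can catch common input problems before processing. This sketch checks for the required columns and verifies that `End - Start + 1` equals the peptide length, which holds for the example row (KFDHLGFK, positions 45 to 52, 8 residues). The `validate_peptide_csv` helper and the `_Intensity` suffix rule are assumptions based on the example columns, not ProtPeptigram behavior.

```python
import csv

REQUIRED_COLUMNS = ["Peptide", "Protein Accession", "Start", "End"]


def validate_peptide_csv(path):
    """Return a list of problems found in a peptide CSV laid out as in the table above."""
    problems = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [col for col in REQUIRED_COLUMNS if col not in header]
        if missing:
            return [f"missing required columns: {', '.join(missing)}"]
        if not any(col.endswith("_Intensity") for col in header):
            problems.append("no *_Intensity sample columns found")
        # Line numbers assume one record per line; line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            try:
                start, end = int(row["Start"]), int(row["End"])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: Start/End are not integers")
                continue
            if end - start + 1 != len(row["Peptide"] or ""):
                problems.append(f"line {line_no}: End - Start + 1 does not match peptide length")
    return problems
```

An empty list means the file passed these checks; it does not validate intensities or accessions.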

Protein Sequences (FASTA Format)

Standard FASTA format files containing protein sequences:

>sp|P20152|VIME_HUMAN Vimentin OS=Homo sapiens
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGVY
ATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDKVRF
...
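Biopython, a listed dependency, can read FASTA files with `Bio.SeqIO.parse(path, "fasta")`. For illustration, the stdlib sketch below parses UniProt-style headers into accession-to-sequence pairs and then locates a peptide in its source sequence, which is how Start and End positions relate to the FASTA entry. `read_fasta`, `locate_peptide`, and the 1-based inclusive coordinates are assumptions, not ProtPeptigram internals.

```python
def read_fasta(path):
    """Parse a FASTA file into {accession: sequence}.

    For UniProt headers such as '>sp|P20152|VIME_HUMAN ...' the accession is
    the second '|'-separated field; otherwise the first word of the header is used.
    """
    sequences, accession, chunks = {}, None, []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if accession is not None:
                    sequences[accession] = "".join(chunks)
                token = (line[1:].split() or [""])[0]
                fields = token.split("|")
                accession = fields[1] if len(fields) >= 3 else token
                chunks = []
            else:
                chunks.append(line)  # sequences may span several lines
    if accession is not None:
        sequences[accession] = "".join(chunks)
    return sequences


def locate_peptide(peptide, sequence):
    """Return every (start, end) occurrence of peptide, 1-based and inclusive."""
    hits = []
    index = sequence.find(peptide)
    while index != -1:
        hits.append((index + 1, index + len(peptide)))
        index = sequence.find(peptide, index + 1)  # allow overlapping matches
    return hits
```

A peptide can occur more than once in a protein, which is why `locate_peptide` returns a list rather than a single position.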

Data Quality Guidelines

  • Peptide Length: HLA class I peptides are mostly 8-11 amino acids long (9-mers are the most common); the default length filter keeps 8-15
  • Intensity Values: Should be positive numerical values
  • Missing Values: Handled automatically (treated as zero intensity)
  • Contaminants: Removed using a common contaminant database when filter_contaminants=True (the default)

Examples

Example 1: Basic Visualization

import matplotlib.pyplot as plt
from ProtPeptigram.DataProcessor import PeptideDataProcessor
from ProtPeptigram.viz import ImmunoViz

# Load and process data
processor = PeptideDataProcessor()
processor.load_peaks_data("hla_peptides.csv")
processor.load_protein_sequences("human_proteome.fasta")

data = processor.filter_and_format_data(intensity_threshold=1000)

# Create basic visualization
viz = ImmunoViz(data)
fig, _ = viz.plot_peptigram(
    protein_ids=["P04406"],  # Glyceraldehyde-3-phosphate dehydrogenase (GAPDH)
    title="GAPDH Peptide Coverage"
)

plt.show()  # in a script, blocks until the figure window is closed

Example 2: Multi-Sample Comparison

# Continues from Example 1 (reuses `viz`): compare peptide presentation across cancer vs. normal samples
fig, axes = viz.plot_peptigram(
    protein_ids=["P53999", "P04406", "P60174"],
    group_by="Condition",
    color_by="intensity",
    title="Peptide Presentation: Cancer vs Normal",
    figsize=(15, 10)
)

# Add statistical annotations
viz.add_significance_markers(axes, method="wilcoxon")

Example 3: Coverage Heatmap Analysis

# Generate a coverage heatmap for a panel of proteins
immune_proteins = ["P04406", "P53999", "P60174", "P20152", "P32261"]

heatmap_fig = viz.plot_coverage_heatmap(
    protein_ids=immune_proteins,
    normalize=True,
    clustering=True,
    title="Protein Coverage Across Samples"
)

heatmap_fig.savefig("coverage_heatmap.pdf", dpi=300)

Example 4: Batch Processing Multiple Proteins

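import os
import matplotlib.pyplot as plt

# Continues from Example 1 (reuses `processor` and `viz`)
# Process top 50 proteins by peptide count
top_proteins = processor.get_top_proteins(n=50)

os.makedirs("output", exist_ok=True)  # savefig does not create missing directories

for protein_id in top_proteins:
    fig, _ = viz.plot_peptigram(
        protein_ids=[protein_id],
        color_by="sample",
        title=f"Peptide Coverage: {protein_id}"
    )

    fig.savefig(f"output/{protein_id}_coverage.png", dpi=300, bbox_inches="tight")
    plt.close(fig)  # Free memory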

Example 5: Custom Color Schemes

# Apply custom color scheme for publication
custom_colors = {
    'Sample_A': '#E31A1C',
    'Sample_B': '#1F78B4', 
    'Sample_C': '#33A02C'
}

fig, _ = viz.plot_peptigram(
    protein_ids=["P04406"],
    color_by="sample",
    custom_colors=custom_colors,
    show_sequence=True,
    highlight_regions=[(50, 100), (200, 250)]
)

Google Colab Integration

You can quickly try out ProtPeptigram on Google Colab without any local installation:

Open in Colab

Colab Setup Instructions

  1. Click the "Open in Colab" badge above
  2. Install ProtPeptigram in the first cell:
    !pip install protpeptigram
  3. Upload your data files using the file upload widget
  4. Follow the tutorial for step-by-step analysis

Sample Data Access

Example datasets are available for testing:


Contributing

We welcome contributions from the immunopeptidomics community! Here's how you can contribute:

Development Setup

  1. Fork the repository on GitHub
  2. Clone your fork locally:
    git clone https://github.com/yourusername/ProtPeptigram.git
    cd ProtPeptigram
  3. Create a virtual environment:
    python -m venv env
    source env/bin/activate  # On Windows: env\Scripts\activate
  4. Install in development mode:
    pip install -e ".[dev]"  # quotes stop shells such as zsh from expanding the brackets

Contribution Guidelines

  • Code Style: Follow PEP 8 guidelines
  • Documentation: Include docstrings for all functions
  • Testing: Add unit tests for new features
  • Commit Messages: Use descriptive commit messages

Types of Contributions

  • Bug Reports: Submit detailed bug reports with reproducible examples
  • Feature Requests: Suggest new visualization types or analysis methods
  • Code Contributions: Implement new features or fix bugs
  • Documentation: Improve documentation and examples
  • Data Format Support: Add support for new mass spectrometry software outputs

Submitting Changes

  1. Create a feature branch:
    git checkout -b feature/amazing-feature
  2. Make your changes and commit them:
    git commit -m 'Add some amazing feature'
  3. Push to your branch:
    git push origin feature/amazing-feature
  4. Open a Pull Request on GitHub

Citation

If you use ProtPeptigram in your research, please cite our work:

@article{krishna2024protpeptigram,
  title={ProtPeptigram: Visualization tool for mapping peptides to source proteins},
  author={Krishna, Sanjay and Li, Chen and others},
  journal={bioRxiv},
  year={2024},
  url={https://www.monash.edu/research/compomics/},
  doi={10.1101/2024.xxx.xxx}
}

Related Publications


Acknowledgments

  • Developed at: Li Lab/Purcell Lab, Monash University, Australia
  • Inspiration: The critical need for specialized visualization tools in immunopeptidomics research
  • Community: Thanks to the immunopeptidomics research community for feedback and testing

Institutional Affiliations

  • Monash University - Department of Biochemistry and Molecular Biology
  • Biomedicine Discovery Institute - Infection and Immunity Program

Support and Contact

Getting Help

Contact Information

Reporting Issues

When reporting issues, please include:

  • Python version and operating system
  • ProtPeptigram version
  • Complete error message and traceback
  • Minimal reproducible example
  • Sample data (if possible)

License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License Summary

  • ✅ Commercial use
  • ✅ Modification
  • ✅ Distribution
  • ✅ Private use
  • ❌ Liability
  • ❌ Warranty

Version History

v1.0.0 (Current Release)

  • Initial public release
  • Core peptide mapping functionality
  • Multi-sample visualization support
  • Command-line interface
  • Python API
  • Google Colab integration

Upcoming Features

  • Support for additional MS software outputs (MaxQuant, Proteome Discoverer)
  • Statistical analysis modules
  • Interactive web interface
  • Integration with protein databases (UniProt, PDB)
  • Machine learning-based peptide prediction

Documentation last updated: 02/07/2025
ProtPeptigram version: 1.0.0
Developed by Li Lab/Purcell Lab, Monash University
