Home
- Overview
- Features
- Installation
- Quick Start
- API Reference
- Data Format Requirements
- Examples
- Google Colab Integration
- Contributing
- Citation
- License
ProtPeptigram is a visualization tool for mapping immunopeptides to their source proteins across multiple biological samples. It helps immunopeptidomics researchers identify peptide coverage patterns, analyze density distributions, and compare peptide presentation between experimental conditions.
- Immunopeptidomics Research: Visualize HLA-presented peptides
- Antigen Processing Studies: Analyze peptide coverage patterns on source proteins
- Comparative Analysis: Compare peptide presentations across experimental conditions
- Biomarker Discovery: Identify regions with dense peptide coverage
- Vaccine Development: Analyze epitope distributions on target proteins
ProtPeptigram addresses the critical need for specialized visualization tools in immunopeptidomics research, enabling scientists to map peptides identified from mass spectrometry experiments back to their source proteins with precise positional information and intensity data.
- Intuitive Peptide Mapping: Map peptides to their source proteins with detailed positional information
- Multi-Sample Support: Compare peptide presentation across different experimental conditions
- Intensity-Based Coloring: Visualize peptide abundance with customizable color schemes
- Automatic Highlighting: Identify regions of interest with dense peptide coverage
- Publication-Quality Outputs: Generate high-resolution figures suitable for scientific publications
- Customizable Visualizations: Adjust color schemes, highlighting, and display options
- Coverage Pattern Analysis: Identify hotspots and cold spots in peptide presentation
- Cross-Sample Comparison: Statistical comparison of peptide presentations
- Protein Filtering: Focus on top proteins by peptide count or specific protein lists
- Intensity Thresholding: Filter low-abundance peptides for cleaner visualizations
- Batch Processing: Process multiple proteins simultaneously
- Python: ≥ 3.8 (recommended: 3.10+)
- Operating System: Cross-platform (Windows, macOS, Linux)
- Memory: 4GB RAM minimum (8GB recommended for large datasets)
pandas >= 1.3.0
matplotlib >= 3.5.0
numpy >= 1.21.0
Biopython >= 1.78
rich >= 10.0.0
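Before installing, it can help to check which of the dependencies listed above are already present in the current environment. The following is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical and not part of ProtPeptigram, and the distribution names are assumed to match the PyPI names of the packages listed above.

```python
import sys
from importlib import metadata

# Minimum versions copied from the requirements listed above.
MINIMUMS = {
    "pandas": "1.3.0",
    "matplotlib": "3.5.0",
    "numpy": "1.21.0",
    "biopython": "1.78",
    "rich": "10.0.0",
}


def installed_versions(names):
    """Return {distribution name: installed version string, or None if absent}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python >= 3.8:", sys.version_info >= (3, 8))
    for name, version in installed_versions(MINIMUMS).items():
        print(f"{name}: installed={version}, required>={MINIMUMS[name]}")
```

The sketch only reports versions; comparing them against the minimums is left to the reader, since robust version comparison needs a dedicated parser.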
# Install from PyPI
pip install protpeptigram
# Install from TestPyPI
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ protpeptigram
# Install from source
git clone https://github.com/Sanpme66/ProtPeptigram.git
cd ProtPeptigram
pip install -e .
# Install in a conda environment
conda create -n protpeptigram python=3.10
conda activate protpeptigram
pip install protpeptigram
# Minimal command with default settings
protpeptigram -i data/peptides.csv -f data/proteome.fasta -o output_directory
# Visualize top 10 proteins by peptide count
protpeptigram -i data/peptides.csv -f data/proteome.fasta -o output_dir -tp 10
# Visualize specific proteins from a list
protpeptigram -i data/peptides.csv -f data/proteome.fasta -o output_dir -pl protein_list.txt
# Apply intensity threshold to filter low-abundance peptides
protpeptigram -i data/peptides.csv -f data/proteome.fasta -o output_dir -th 1000
# Comprehensive analysis with multiple filters
protpeptigram -i data/peptides.csv -f data/proteome.fasta -o output_dir \
    -tp 20 -th 500 --filter-contaminants --min-samples 3
from ProtPeptigram.DataProcessor import PeptideDataProcessor
from ProtPeptigram.viz import ImmunoViz
# Step 1: Initialize data processor
processor = PeptideDataProcessor()
# Step 2: Load data files
processor.load_peaks_data("data/peptides.csv")
processor.load_protein_sequences("data/proteome.fasta")
# Step 3: Process and filter data
formatted_data = processor.filter_and_format_data(
    filter_contaminants=True,
    intensity_threshold=1000,
    min_samples=2
)
# Step 4: Create visualizations
viz = ImmunoViz(formatted_data)
fig, axes = viz.plot_peptigram(
    protein_ids=["P20152", "P32261"],
    group_by="Sample",
    color_by="protein",
    title="HLA Peptide Visualization"
)
# Step 5: Save high-quality output
fig.savefig("protein_visualization.png", dpi=300, bbox_inches="tight")
Main class for processing and preparing immunopeptide data.
PeptideDataProcessor(config=None)
Method: load_peaks_data
Description: Load peptide data from PEAKS software output.
Parameters:
- filepath (str): Path to CSV file containing peptide data
- encoding (str, optional): File encoding (default: 'utf-8')
Returns: None (data stored internally)
Example:
processor.load_peaks_data("peptides.csv")
Method: load_protein_sequences
Description: Load protein sequences from a FASTA file.
Parameters:
- fasta_path (str): Path to FASTA file containing protein sequences
- format (str, optional): Sequence format (default: 'fasta')
Returns: Dictionary of protein sequences
Method: filter_and_format_data
Description: Apply filters and format data for visualization.
Parameters:
- filter_contaminants (bool, default=True): Remove contaminant proteins
- intensity_threshold (float, default=0): Minimum peptide intensity
- min_samples (int, default=1): Minimum number of samples per peptide
- min_peptide_length (int, default=8): Minimum peptide length
- max_peptide_length (int, default=15): Maximum peptide length
Returns: Formatted DataFrame ready for visualization
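To make the filter parameters above concrete, here is a minimal sketch of how length, intensity, and sample-count filters could be combined on plain Python records. This is not ProtPeptigram's implementation: the function name `passes_filters` and the record layout are hypothetical, and it assumes a sample counts toward `min_samples` only when its intensity is positive and at least the threshold.

```python
def passes_filters(peptide, intensities, intensity_threshold=0.0,
                   min_samples=1, min_peptide_length=8, max_peptide_length=15):
    """Return True if a peptide survives the length, intensity, and sample-count filters.

    Assumption: a sample "counts" when its intensity is positive and at least
    the threshold; the real library may define this differently.
    """
    if not (min_peptide_length <= len(peptide) <= max_peptide_length):
        return False
    detected = [v for v in intensities if v > 0 and v >= intensity_threshold]
    return len(detected) >= min_samples


# Hypothetical records: one peptide with three per-sample intensities each.
records = [
    {"peptide": "KFDHLGFK", "intensities": [1500.5, 2300.1, 0.0]},
    {"peptide": "KFD", "intensities": [9000.0, 8000.0, 7000.0]},      # too short
    {"peptide": "SLYASSPGGVY", "intensities": [400.0, 0.0, 1200.0]},  # one sample above 1000
]

kept = [r["peptide"] for r in records
        if passes_filters(r["peptide"], r["intensities"],
                          intensity_threshold=1000, min_samples=2)]
print(kept)  # ['KFDHLGFK']
```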
Visualization class for creating immunopeptide mappings.
ImmunoViz(data, config=None)
Parameters:
- data (DataFrame): Processed peptide data
- config (dict, optional): Visualization configuration
Method: plot_peptigram
Description: Create peptide coverage visualization for specified proteins.
Parameters:
- protein_ids (list): List of protein accession numbers
- group_by (str, optional): Grouping variable ('Sample', 'Condition', etc.)
- color_by (str, optional): Coloring scheme ('protein', 'intensity', 'sample')
- title (str, optional): Plot title
- figsize (tuple, optional): Figure size (width, height)
- show_sequence (bool, default=True): Display protein sequence
- highlight_regions (list, optional): Regions to highlight
Returns: (Figure, Axes) matplotlib objects
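The idea of peptide coverage and of regions worth highlighting can be illustrated without the library. The sketch below counts how many peptides cover each residue and reports contiguous stretches at or above a chosen depth. Both function names are hypothetical, and it assumes that Start/End positions are 1-based and inclusive; ProtPeptigram's own coverage and highlighting logic may differ.

```python
def coverage_depth(protein_length, peptide_spans):
    """Count how many peptides cover each residue.

    Assumption: spans are 1-based and inclusive, like the Start/End columns.
    Returns a list where index 0 corresponds to residue 1.
    """
    depth = [0] * protein_length
    for start, end in peptide_spans:
        for pos in range(max(start, 1), min(end, protein_length) + 1):
            depth[pos - 1] += 1
    return depth


def dense_regions(depth, min_depth):
    """Return (start, end) tuples (1-based, inclusive) where depth >= min_depth."""
    regions = []
    region_start = None
    for pos, value in enumerate(depth, start=1):
        if value >= min_depth and region_start is None:
            region_start = pos
        elif value < min_depth and region_start is not None:
            regions.append((region_start, pos - 1))
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(depth)))
    return regions


# Hypothetical peptide spans on a 20-residue protein.
spans = [(3, 10), (5, 12), (8, 15), (18, 20)]
depth = coverage_depth(20, spans)
print(dense_regions(depth, min_depth=2))  # [(5, 12)]
```

The resulting tuples have the same (start, end) shape as the `highlight_regions` values used in the examples on this page; whether the library uses the same coordinate convention is an assumption.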
Method: plot_coverage_heatmap
Description: Generate heatmap showing peptide coverage across samples.
Parameters:
- protein_ids (list): Protein accession numbers
- normalize (bool, default=True): Normalize intensities
- clustering (bool, default=False): Apply hierarchical clustering
Returns: Heatmap figure object
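The text above does not say which normalization is applied. As one plausible reading, the sketch below scales each protein's row by its maximum so that proteins with very different abundances become comparable across samples. The function name `normalize_rows` and the choice of max-scaling are assumptions, not the library's documented behavior.

```python
def normalize_rows(matrix):
    """Scale each row to the range [0, 1] by dividing by the row maximum.

    Rows that are all zero are returned unchanged to avoid division by zero.
    """
    normalized = []
    for row in matrix:
        peak = max(row) if row else 0.0
        normalized.append([v / peak for v in row] if peak > 0 else list(row))
    return normalized


# Rows are proteins, columns are samples (hypothetical intensities).
coverage = [
    [1500.0, 3000.0, 750.0],
    [0.0, 0.0, 0.0],
]
print(normalize_rows(coverage))  # [[0.5, 1.0, 0.25], [0.0, 0.0, 0.0]]
```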
Description: Plot intensity distribution for peptides from a specific protein.
Parameters:
- protein_id (str): Single protein accession number
- bins (int, default=50): Number of histogram bins
- log_scale (bool, default=True): Use logarithmic scale
Returns: Distribution plot figure
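Peptide intensities often span several orders of magnitude, which is why a logarithmic scale is useful here. The following standard-library sketch shows one way to bin intensities on a log10 scale; the function name `log10_histogram` is hypothetical, and the library's plotting code may bin differently.

```python
import math


def log10_histogram(intensities, bins=50):
    """Bin positive intensities on a log10 scale into equal-width bins.

    Returns (edges, counts) where edges has bins + 1 entries in log10 units.
    Non-positive values are skipped because their logarithm is undefined.
    """
    logs = [math.log10(v) for v in intensities if v > 0]
    if not logs:
        return [], []
    low, high = min(logs), max(logs)
    if high == low:
        high = low + 1.0  # avoid zero-width bins when all values are equal
    width = (high - low) / bins
    counts = [0] * bins
    for value in logs:
        # Clamp so that the maximum value falls into the last bin.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical intensities; the zero is skipped.
edges, counts = log10_histogram([10, 100, 1000, 10000, 0, 100], bins=3)
print(counts)  # [1, 2, 2]
```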
ProtPeptigram accepts peptide data in CSV format from PEAKS software with the following required columns:
| Column Name | Description | Data Type | Example |
|---|---|---|---|
| Peptide | Amino acid sequence | String | "KFDHLGFK" |
| Protein Accession | UniProt accession number | String | "P20152" |
| Start | Start position in protein | Integer | 45 |
| End | End position in protein | Integer | 52 |
| Sample_1_Intensity | Intensity in sample 1 | Float | 1500.5 |
| Sample_2_Intensity | Intensity in sample 2 | Float | 2300.1 |
| ... | Additional sample columns | Float | ... |
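A quick check of the CSV header before running the tool can catch format problems early. The sketch below uses only the standard library; the function name `check_columns` is hypothetical, and it assumes sample columns follow the `<name>_Intensity` pattern shown in the table. Actual PEAKS exports may name columns differently.

```python
import csv
import io

# Required columns as listed in the table above.
REQUIRED_COLUMNS = ["Peptide", "Protein Accession", "Start", "End"]


def check_columns(csv_text):
    """Return (missing required columns, detected sample intensity columns)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    # Assumption: sample columns end with "_Intensity", as in the table.
    samples = [c for c in header if c.endswith("_Intensity")]
    return missing, samples


# Example row built from the values shown in the table.
example = (
    "Peptide,Protein Accession,Start,End,Sample_1_Intensity,Sample_2_Intensity\n"
    "KFDHLGFK,P20152,45,52,1500.5,2300.1\n"
)
missing, samples = check_columns(example)
print(missing, samples)  # [] ['Sample_1_Intensity', 'Sample_2_Intensity']
```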
Standard FASTA format files containing protein sequences:
>sp|P20152|VIME_HUMAN Vimentin OS=Homo sapiens
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGVY
ATRSSAVRLRSSVPGVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFANYIDKVRF
...
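For readers unfamiliar with the format, the sketch below parses FASTA text into a dictionary keyed by accession. It is a standard-library illustration, not the loader ProtPeptigram uses (the dependency list suggests Biopython handles this in practice). The function name `parse_fasta` is hypothetical, and the toy records use made-up accessions and sequences.

```python
def parse_fasta(text):
    """Parse FASTA text into {accession: sequence}.

    Assumption: UniProt-style headers (">sp|ACCESSION|NAME ..."); for other
    headers the first whitespace-delimited token is used as the key.
    """
    sequences = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            token = fields[0] if fields else ""
            parts = token.split("|")
            current = parts[1] if len(parts) >= 3 else token
            sequences[current] = []
        elif current is not None:
            # Sequence lines may be wrapped; collect and join them later.
            sequences[current].append(line)
    return {key: "".join(chunks) for key, chunks in sequences.items()}


# Toy records with made-up accessions and sequences, for illustration only.
toy_fasta = """>sp|X00001|TOY1_EXAMPLE Hypothetical protein one
MSTRSVSSSS
YRRMFGG
>toy2 plain header
KFDHLGFK
"""
proteins = parse_fasta(toy_fasta)
print(proteins)  # {'X00001': 'MSTRSVSSSSYRRMFGG', 'toy2': 'KFDHLGFK'}
```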
- Peptide Length: Typically 8-15 amino acids for HLA Class I peptides
- Intensity Values: Should be positive numerical values
- Missing Values: Handled automatically (treated as zero intensity)
- Contaminants: Automatically filtered using common contaminant database
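The handling of missing and non-positive intensities described above can be sketched as a small parsing helper. The function name `parse_intensity` and the set of tokens treated as missing are assumptions made for illustration; the library's own rules may differ.

```python
def parse_intensity(raw):
    """Convert a raw CSV cell to a non-negative float, treating missing values as 0.0.

    Assumption: empty cells, "NA", "NaN", and "-" all mean "not detected".
    """
    text = (raw or "").strip()
    if text == "" or text.lower() in {"na", "nan", "-"}:
        return 0.0
    value = float(text)
    if value < 0:
        raise ValueError(f"intensity must not be negative: {raw!r}")
    return value


# Hypothetical row with one measured, one empty, and one NaN cell.
row = {"Sample_1_Intensity": "1500.5", "Sample_2_Intensity": "", "Sample_3_Intensity": "NaN"}
print({name: parse_intensity(cell) for name, cell in row.items()})
```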
from ProtPeptigram.DataProcessor import PeptideDataProcessor
from ProtPeptigram.viz import ImmunoViz
# Load and process data
processor = PeptideDataProcessor()
processor.load_peaks_data("hla_peptides.csv")
processor.load_protein_sequences("human_proteome.fasta")
data = processor.filter_and_format_data(intensity_threshold=1000)
# Create basic visualization
viz = ImmunoViz(data)
fig, _ = viz.plot_peptigram(
    protein_ids=["P04406"],  # Glyceraldehyde-3-phosphate dehydrogenase
    title="GAPDH Peptide Coverage"
)
fig.show()
# Compare peptide presentation across cancer vs normal samples
fig, axes = viz.plot_peptigram(
    protein_ids=["P53999", "P04406", "P60174"],
    group_by="Condition",
    color_by="intensity",
    title="Peptide Presentation: Cancer vs Normal",
    figsize=(15, 10)
)
# Add statistical annotations
viz.add_significance_markers(axes, method="wilcoxon")
# Generate coverage heatmap for immune-relevant proteins
immune_proteins = ["P04406", "P53999", "P60174", "P20152", "P32261"]
heatmap_fig = viz.plot_coverage_heatmap(
    protein_ids=immune_proteins,
    normalize=True,
    clustering=True,
    title="Protein Coverage Across Samples"
)
heatmap_fig.savefig("coverage_heatmap.pdf", dpi=300)
# Process top 50 proteins by peptide count
import os
import matplotlib.pyplot as plt

os.makedirs("output", exist_ok=True)  # Make sure the output directory exists
top_proteins = processor.get_top_proteins(n=50)
for protein_id in top_proteins:
    fig, _ = viz.plot_peptigram(
        protein_ids=[protein_id],
        color_by="sample",
        title=f"Peptide Coverage: {protein_id}"
    )
    fig.savefig(f"output/{protein_id}_coverage.png", dpi=300, bbox_inches="tight")
    plt.close(fig)  # Free memory
# Apply custom color scheme for publication
custom_colors = {
    'Sample_A': '#E31A1C',
    'Sample_B': '#1F78B4',
    'Sample_C': '#33A02C'
}
fig, _ = viz.plot_peptigram(
    protein_ids=["P04406"],
    color_by="sample",
    custom_colors=custom_colors,
    show_sequence=True,
    highlight_regions=[(50, 100), (200, 250)]
)
You can quickly try out ProtPeptigram on Google Colab without any local installation:
- Click the "Open in Colab" badge above
- Install ProtPeptigram in the first cell:
  !pip install protpeptigram
- Upload your data files using the file upload widget
- Follow the tutorial for step-by-step analysis
Example datasets are available for testing:
We welcome contributions from the immunopeptidomics community! Here's how you can contribute:
- Fork the repository on GitHub
- Clone your fork locally:
  git clone https://github.com/yourusername/ProtPeptigram.git
  cd ProtPeptigram
- Create a virtual environment:
  python -m venv env
  source env/bin/activate  # On Windows: env\Scripts\activate
- Install in development mode (the quotes keep shells such as zsh from expanding the brackets):
  pip install -e ".[dev]"
- Code Style: Follow PEP 8 guidelines
- Documentation: Include docstrings for all functions
- Testing: Add unit tests for new features
- Commit Messages: Use descriptive commit messages
- Bug Reports: Submit detailed bug reports with reproducible examples
- Feature Requests: Suggest new visualization types or analysis methods
- Code Contributions: Implement new features or fix bugs
- Documentation: Improve documentation and examples
- Data Format Support: Add support for new mass spectrometry software outputs
- Create a feature branch:
  git checkout -b feature/amazing-feature
- Make your changes and commit them:
  git commit -m 'Add some amazing feature'
- Push to your branch:
  git push origin feature/amazing-feature
- Open a Pull Request on GitHub
If you use ProtPeptigram in your research, please cite our work:
@article{krishna2024protpeptigram,
title={ProtPeptigram: Visualization tool for mapping peptides to source proteins},
author={Krishna, Sanjay and Li, Chen and others},
journal={bioRxiv},
year={2024},
url={https://www.monash.edu/research/compomics/},
doi={10.1101/2024.xxx.xxx}
}
- Developed at: Li Lab/Purcell Lab, Monash University, Australia
- Inspiration: The critical need for specialized visualization tools in immunopeptidomics research
- Community: Thanks to the immunopeptidomics research community for feedback and testing
- Monash University - Department of Biochemistry and Molecular Biology
- Biomedicine Discovery Institute - Infection and Immunity Program
- Documentation: Full API Documentation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Primary Developer: Sanjay Krishna
- GitHub: @Sanpme66
- Project Repository: https://github.com/Sanpme66/ProtPeptigram
When reporting issues, please include:
- Python version and operating system
- ProtPeptigram version
- Complete error message and traceback
- Minimal reproducible example
- Sample data (if possible)
This project is licensed under the MIT License - see the LICENSE file for details.
- ✅ Commercial use
- ✅ Modification
- ✅ Distribution
- ✅ Private use
- ❌ Liability
- ❌ Warranty
- Initial public release
- Core peptide mapping functionality
- Multi-sample visualization support
- Command-line interface
- Python API
- Google Colab integration
- Support for additional MS software outputs (MaxQuant, Proteome Discoverer)
- Statistical analysis modules
- Interactive web interface
- Integration with protein databases (UniProt, PDB)
- Machine learning-based peptide prediction
Documentation last updated: 02/07/2025
ProtPeptigram Version: 1.0.0
Developed by Li Lab/Purcell Lab, Monash University