ImmuneFold is an advanced approach for predicting immune protein structures using transfer learning, adapted from state-of-the-art protein structure prediction frameworks. ImmuneFold is specifically tailored for accurate modeling of immune proteins, including T-cell receptors (TCRs), antibodies, nanobodies, and their complexes with target antigens. By providing precise predictions of immune protein-antigen pairings, ImmuneFold offers valuable insights into protein interaction mechanisms, thereby supporting applications such as vaccine development and immune response analysis.
To install ImmuneFold, the recommended method is to create a conda environment and install the required dependencies:
git clone git@github.com:CarbonMatrixLab/immunefold.git
cd immunefold
conda env create -f environment.yml
pip install fair-esm
ImmuneFold also requires ANARCI. Additionally, to compute the effects of mutations on TCR-pMHC interactions, install the PyRosetta package.
- Download ImmuneFold-TCR and ImmuneFold-Ab here and place them in the ./params directory.
- Download the ESM2 model weights from this link and the contact regressor weights from here. Save these files in the ./params directory.
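Before running inference, it can help to confirm that the downloads landed where the steps above expect them. The following is a minimal sketch: `list_param_files` is a hypothetical helper, not part of ImmuneFold, and it only checks that ./params exists and is non-empty, because the exact weight file names are not stated here.

```python
from pathlib import Path


def list_param_files(params_dir="./params"):
    """Return the sorted file names in params_dir; raise if it is missing or empty."""
    root = Path(params_dir)
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root} does not exist; create it and place the downloaded weights there"
        )
    # Only regular files are reported; subdirectories are ignored.
    names = sorted(p.name for p in root.iterdir() if p.is_file())
    if not names:
        raise FileNotFoundError(f"{root} is empty; no weight files were found")
    return names


if __name__ == "__main__":
    for name in list_param_files():
        print(name)
```

Run it from the repository root so that the relative ./params path resolves as intended.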
To predict the structure of TCR-pMHC complexes, provide the TCR, peptide, and MHC sequences in a FASTA file, TCR_B_A_P_M.fasta, where B, A, P, and M denote the beta, alpha, peptide, and MHC chain IDs, respectively. The sequence line lists the chains in the order Beta_seq:Alpha_seq:Peptide_seq:MHC_seq, separated by colons:
>TCR_B_A_P_M
VSQHPSWVICKSGTSVKIECRSLDFQATTMFWYRQFPKQSLMLMATSNEGSKATYEQGVEKDKFLINHASLTLSTLTVTSAHPEDSSFYICSVSRDRNTGELFFGEGSRLTVL:VEQDPGPFNVPEGATVAFNCTYSNSASQSFFWYRQDCRKEPKLLMSVYSSGNEDGRFTAQLNRASQYISLLIRDSKLSDSATYLCVVNEEDALIFGKGTTLSVSS:YLQPRTFLL:GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQ
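When the four chains come from separate sources, the record above can be assembled programmatically. This is a minimal sketch under stated assumptions: `write_tcr_pmhc_fasta` is a hypothetical helper, not part of ImmuneFold, and the check against the 20 standard amino acids is an assumption about what the model accepts rather than a documented requirement.

```python
from pathlib import Path

# Assumption: only the 20 standard amino acids are valid input.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def write_tcr_pmhc_fasta(path, beta, alpha, peptide, mhc, name="TCR_B_A_P_M"):
    """Write one record in Beta_seq:Alpha_seq:Peptide_seq:MHC_seq order."""
    # Dict insertion order fixes the chain order on the sequence line.
    chains = {"beta": beta, "alpha": alpha, "peptide": peptide, "mhc": mhc}
    for label, seq in chains.items():
        if not seq:
            raise ValueError(f"{label} chain is empty")
        bad = set(seq.upper()) - AMINO_ACIDS
        if bad:
            raise ValueError(f"{label} chain has non-standard residues: {sorted(bad)}")
    record = f">{name}\n" + ":".join(s.upper() for s in chains.values()) + "\n"
    Path(path).write_text(record)
    return record
```

For example, `write_tcr_pmhc_fasta("TCR_B_A_P_M.fasta", beta, alpha, "YLQPRTFLL", mhc)` writes a two-line file with the header followed by the colon-joined sequences.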
To run the ImmuneFold-TCR structure prediction model, use the following command:
python inference.py --config-name=TCR_structure_prediction
To predict binding affinity using zero-shot learning, provide the full TCR-pMHC sequences in a FASTA file, following the same format as described for structure prediction:
python inference.py --config-name=TCR_structure_prediction
python predict_energy.py --pdb_dir /path/to/pdb/dir --name_idx /path/to/name_idx.idx --output_file /path/to/energy.csv --mode interface
For antibody or nanobody structure prediction, provide the sequences in FASTA files, antibody_H_L.fasta or nanobody_H.fasta, where H and L represent the heavy and light chain IDs, respectively. The formats are as follows:
Antibody:
>antibody_H_L
VSQHPSWVICKSGTSVKIECRSLDFQATTMFWYRQFPKQSLMLMATSNEGSKATYEQGVEKDKFLINHASLTLSTLTVTSAHPEDSSFYICSVSRDRNTGELFFGEGSRLTVL:VEQDPGPFNVPEGATVAFNCTYSNSASQSFFWYRQDCRKEPKLLMSVYSSGNEDGRFTAQLNRASQYISLLIRDSKLSDSATYLCVVNEEDALIFGKGTTLSVSS
Nanobody:
>nanobody_H
VSQHPSWVICKSGTSVKIECRSLDFQATTMFWYRQFPKQSLMLMATSNEGSKATYEQGVEKDKFLINHASLTLSTLTVTSAHPEDSSFYICSVSRDRNTGELFFGEGSRLTVL
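The antibody, nanobody, and TCR-pMHC inputs share one layout: a header ending in chain IDs and a sequence line with colon-separated chains. The sketch below splits such a record back into named chains, which is useful for checking an input file before running inference. `parse_chain_fasta` is a hypothetical helper, not part of ImmuneFold, and it assumes the trailing underscore-separated header fields are the chain IDs, one per chain.

```python
def parse_chain_fasta(text):
    """Parse one '>name_ids' record whose sequence holds ':'-separated chains."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2 or not lines[0].startswith(">"):
        raise ValueError("expected a '>' header line followed by a sequence")
    header = lines[0][1:]
    # Join wrapped sequence lines before splitting into chains.
    sequences = "".join(lines[1:]).split(":")
    if any(not seq for seq in sequences):
        raise ValueError("found an empty chain in the sequence line")
    fields = header.split("_")
    n = len(sequences)
    # Assumption: the last n header fields are chain IDs, the rest is the name.
    if len(fields) <= n:
        raise ValueError(f"header '{header}' does not name {n} chain IDs")
    name = "_".join(fields[:-n])
    return name, dict(zip(fields[-n:], sequences))
```

With the antibody format, the returned dictionary has the keys `H` and `L`; with the nanobody format, it has only `H`.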
To run the model:
python inference.py --config-name=antibody_structure_prediction
Or, for nanobodies:
python inference.py --config-name=nanobody_structure_prediction
For predicting antibody or nanobody structures bound to a target antigen, provide the antigen structure as a PDB file, antigen.pdb, along with the antibody or nanobody sequences, and run the following command:
python inference.py --config-name=antibody_antigen_structure_prediction
To showcase the capabilities of ImmuneFold in large-scale TCR structure prediction, we applied the model to the VDJdb dataset, comprising 32,703 non-redundant TCR sequences. The predicted structures are available for download at link.