JingXuu commented on Oct 10, 2025
- Set the receptor as the motif and design the peptide as the scaffold
- Change features: `res_seq_pdb_idx` (peptide residue indices offset by +200) and `chain_break_per_res`
- Add metrics calculation code
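The two feature changes listed above can be illustrated with a minimal sketch. It assumes that peptide residue indices are shifted by a fixed offset of 200 and that a chain break is flagged on the first residue of each new chain. The function name `peptide_features`, its arguments, and the chain-ID-based break rule are hypothetical; they are not taken from the PR's code, which may compute these features differently.

```python
from typing import List, Sequence, Tuple

PEPTIDE_OFFSET = 200  # offset named in the PR description


def peptide_features(
    res_idx: Sequence[int],
    chain_ids: Sequence[str],
    peptide_chain: str,
    offset: int = PEPTIDE_OFFSET,
) -> Tuple[List[int], List[int]]:
    """Return (shifted residue indices, per-residue chain-break flags).

    Hypothetical helper: shifts indices of the peptide chain by `offset`
    and sets the flag to 1 on the first residue after a chain change.
    """
    if len(res_idx) != len(chain_ids):
        raise ValueError("res_idx and chain_ids must have the same length")

    # Shift only the residues that belong to the peptide chain.
    shifted = [
        idx + offset if chain == peptide_chain else idx
        for idx, chain in zip(res_idx, chain_ids)
    ]

    # Assumption: a chain break is any position where the chain ID changes.
    chain_break = [0] * len(chain_ids)
    for k in range(1, len(chain_ids)):
        if chain_ids[k] != chain_ids[k - 1]:
            chain_break[k] = 1

    return shifted, chain_break
```

For a three-residue receptor on chain `A` followed by a two-residue peptide on chain `B`, the peptide indices move from 1 and 2 to 201 and 202, and the break flag is set on the first peptide residue.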
Pull Request Overview
This PR implements peptide design functionality for the proteina model by setting the receptor as a motif and designing scaffold peptides. The key modifications include chain-aware PDB writing, peptide-specific feature engineering, and comprehensive metrics calculation capabilities for evaluation.
Key changes:
- Enhanced PDB writing to support multi-chain structures with proper chain indexing
- Modified training pipeline to handle peptide-specific features including residue indexing with offsets and chain break markers
- Added comprehensive metrics calculation infrastructure including RMSD computation, structure diversity analysis, and designability assessment
Reviewed Changes
Copilot reviewed 29 out of 31 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| proteinfoundation/utils/ff_utils/pdb_utils.py | Enhanced PDB writing with chain support and motif-based structure organization |
| proteinfoundation/proteinflow/model_trainer_base.py | Added peptide features and validation RMSD tracking |
| proteinfoundation/motif_inference.py | Enhanced inference with PDB patching and RMSD calculation |
| test/cal_metrics/ | Complete metrics calculation suite including RMSD, diversity, and designability |
| configs/ | Configuration files for peptide training and inference |
| boundaries = torch.linspace( | ||
| 0.0, max_dist_boundary, num_dist_buckets - 1, device=pair_pred.device | ||
| 0.1, max_dist_boundary, num_dist_buckets - 1, device=pair_pred.device |
Hardcoded value 0.1 for distance boundary start may cause issues with very short distances. Consider using 0.0 or making this configurable.
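One way to act on this suggestion is to expose the lower boundary as a parameter. The sketch below is a pure-Python stand-in for the `torch.linspace` call, not the PR's implementation; `distance_boundaries`, `bucket_index`, and the `min_dist_boundary` parameter are hypothetical names.

```python
from bisect import bisect_left
from typing import List, Sequence


def distance_boundaries(
    min_dist_boundary: float,
    max_dist_boundary: float,
    num_dist_buckets: int,
) -> List[float]:
    """Evenly spaced bucket boundaries, both endpoints included.

    Produces num_dist_buckets - 1 boundaries, matching the count used
    in the linspace call under review.
    """
    n = num_dist_buckets - 1
    if n < 2:
        raise ValueError("num_dist_buckets must be at least 3")
    step = (max_dist_boundary - min_dist_boundary) / (n - 1)
    bounds = [min_dist_boundary + i * step for i in range(n)]
    bounds[-1] = max_dist_boundary  # avoid floating-point drift at the end
    return bounds


def bucket_index(distance: float, boundaries: Sequence[float]) -> int:
    """Index of the bucket for `distance`: the number of boundaries below it."""
    return bisect_left(boundaries, distance)
```

With a start of 0.1, every distance up to and including 0.1 shares bucket 0. With a start of 0.0, only a distance of exactly zero (or below) lands in bucket 0, so the choice changes how very short distances are binned.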
| n=n, shape=batch_shape, device=self.device, dtype=dtype, mask=mask | ||
| ) | ||
| batch['x_1'] = x_1 #Ensure that the unit used during training is nanometers (nm) not ang!!!!! |
Nonstandard abbreviation: 'ang' should be 'Å' or 'angstroms'.
| batch['x_1'] = x_1 #Ensure that the unit used during training is nanometers (nm) not ang!!!!! | |
| batch['x_1'] = x_1 # Ensure that the unit used during training is nanometers (nm), not angstroms (Å)! |
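The unit requirement stated in this code comment could also be checked in code. The sketch below assumes coordinates are available as nested lists of Cα positions in residue order; `angstrom_to_nm` and `looks_like_angstroms` are hypothetical helpers, and the threshold of 1.0 is an illustrative choice based on the roughly 3.8 Å (0.38 nm) spacing between consecutive Cα atoms.

```python
import math
from statistics import median
from typing import List, Sequence

ANGSTROM_PER_NM = 10.0  # 1 nm = 10 Å


def angstrom_to_nm(coords: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert xyz coordinates from angstroms to nanometers."""
    return [[value / ANGSTROM_PER_NM for value in xyz] for xyz in coords]


def looks_like_angstroms(
    ca_coords: Sequence[Sequence[float]], threshold: float = 1.0
) -> bool:
    """Heuristic unit check on consecutive C-alpha distances.

    Assumption: the median neighbor distance is about 3.8 in angstroms
    and about 0.38 in nanometers, so a value above `threshold` suggests
    the coordinates were not converted to nanometers.
    """
    if len(ca_coords) < 2:
        raise ValueError("need at least two residues to compare distances")
    dists = [math.dist(a, b) for a, b in zip(ca_coords, ca_coords[1:])]
    return median(dists) > threshold
```

The median is used so that a single large gap, such as the jump between two chains, does not dominate the check.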
| if random.random() > 0.5 and self.cfg_exp.training.self_cond: | ||
| x_pred_sc, _ = self.predict_clean(batch) | ||
| batch["x_sc"] = self.detach_gradients(x_pred_sc) | ||
| #import pdb; pdb.set_trace() |
Commented-out debug statement should be removed from production code.
| #import pdb; pdb.set_trace() |
| motif_min_length = cfg.motif_min_length, | ||
| motif_max_length = cfg.motif_max_length, | ||
| motif_task_name = cfg.motif_task_name) | ||
| #import pdb;pdb.set_trace() |
Commented-out debug statement should be removed from production code.
| #import pdb;pdb.set_trace() |
| Returns: | ||
| List of sequences (strings) | ||
| """ | ||
| # import pdb;pdb.set_trace() |
Commented-out debug statement should be removed from production code.
| # import pdb;pdb.set_trace() |
| # cal_metrics.py (only code & comments in English) | ||
Orphaned comment that doesn't relate to surrounding code should be removed or clarified.
| # cal_metrics.py (only code & comments in English) |
| chain_index=np.concatenate([prot_rec.chain_index, prot_pep.chain_index], axis=0), | ||
| b_factors=np.concatenate([prot_rec.b_factors, prot_pep.b_factors], axis=0), | ||
| ) | ||
| pdb_prot = to_pdb(full_prot, model=1, add_end=True) # Now `add_end=True` for automatic ENDMDL and END |
Comment mentions 'Now add_end=True' but doesn't explain why this change was made or when it should be used.
| pdb_prot = to_pdb(full_prot, model=1, add_end=True) # Now `add_end=True` for automatic ENDMDL and END | |
| # Use add_end=True to ensure ENDMDL and END records are included in the PDB output. | |
| # This is necessary when writing combined structures (e.g., motif and peptide as separate chains) | |
| # to properly terminate the PDB file for downstream compatibility. In other cases, set add_end=False | |
| # if termination records are not required. | |
| pdb_prot = to_pdb(full_prot, model=1, add_end=True) |
| pep_seqs = pep_seqss[1:] | ||
| print(pep_seqs) |
Debug print statement should be removed or replaced with proper logging.
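A minimal sketch of the logging-based alternative, using Python's standard `logging` module. The wrapper function `select_peptide_sequences` is hypothetical; only the `pep_seqss[1:]` slice comes from the snippet under review.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def select_peptide_sequences(pep_seqss: List[str]) -> List[str]:
    """Drop the first entry, mirroring the slice in the reviewed snippet."""
    pep_seqs = pep_seqss[1:]
    # Lazy %-style formatting: the message is only built if DEBUG is enabled.
    logger.debug("Selected %d peptide sequence(s): %s", len(pep_seqs), pep_seqs)
    return pep_seqs
```

The message stays silent by default and appears once the caller enables debug output, for example with `logging.basicConfig(level=logging.DEBUG)`.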
| gt_base = os.path.basename(gt_pdb_path) | ||
| m_case = re.search(r'([0-9][A-Za-z0-9]{3})', gt_base) | ||
| case_id = m_case.group(1) if m_case else os.path.splitext(gt_base)[0] | ||
| #import pdb;pdb.set_trace() |
Another commented-out debug statement should be removed from production code.
| #import pdb;pdb.set_trace() |
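The case-ID extraction in the last snippet can be exercised in isolation. The sketch below wraps the three lines from the diff in a hypothetical function, `extract_case_id`, and the file names in the usage note are made-up examples.

```python
import os
import re


def extract_case_id(gt_pdb_path: str) -> str:
    """Return the first PDB-ID-like token in the file name.

    Same logic as the reviewed snippet: a digit followed by three
    alphanumeric characters, falling back to the file stem when the
    pattern does not match.
    """
    gt_base = os.path.basename(gt_pdb_path)
    m_case = re.search(r"([0-9][A-Za-z0-9]{3})", gt_base)
    return m_case.group(1) if m_case else os.path.splitext(gt_base)[0]
```

For `1abc_rec.pdb` this returns `1abc`, and for `peptide.pdb` it falls back to `peptide`. Because the pattern is unanchored, any digit followed by three alphanumeric characters matches, so `model_2024_run.pdb` yields `2024`; whether that matters depends on the naming scheme of the ground-truth files.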