NS-Pep: De novo Peptide Design with Non-Standard Amino Acids
- URL: http://arxiv.org/abs/2510.03326v1
- Date: Wed, 01 Oct 2025 16:04:06 GMT
- Title: NS-Pep: De novo Peptide Design with Non-Standard Amino Acids
- Authors: Tao Guo, Junbo Yin, Yu Wang, Xin Gao
- Abstract summary: Non-standard amino acids (NSAAs) offer improved binding affinity and improved pharmacological properties. Existing peptide design methods are limited to standard amino acids, leaving NSAA-aware design largely unexplored. We introduce NS-Pep, a unified framework for co-designing peptide sequences and structures with NSAAs.
- Score: 15.931688895952234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Peptide drugs incorporating non-standard amino acids (NSAAs) offer improved binding affinity and improved pharmacological properties. However, existing peptide design methods are limited to standard amino acids, leaving NSAA-aware design largely unexplored. We introduce NS-Pep, a unified framework for co-designing peptide sequences and structures with NSAAs. The main challenge is that NSAAs are extremely underrepresented (even the most frequent one, SEP, accounts for less than 0.4% of residues), resulting in a severe long-tailed distribution. To improve generalization to rare amino acids, we propose Residue Frequency-Guided Modification (RFGM), which mitigates over-penalization through frequency-aware logit calibration, supported by both theoretical and empirical analysis. Furthermore, we identify that insufficient side-chain modeling limits geometric representation of NSAAs. To address this, we introduce Progressive Side-chain Perception (PSP) for coarse-to-fine torsion and location prediction, and Interaction-Aware Weighting (IAW) to emphasize pocket-proximal residues. Moreover, NS-Pep generalizes naturally to the peptide folding task with NSAAs, addressing a major limitation of current tools. Experiments show that NS-Pep improves sequence recovery rate and binding affinity by 6.23% and 5.12%, respectively, and outperforms AlphaFold3 by 17.76% in peptide folding success rate.
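The abstract mentions frequency-aware logit calibration but does not give the RFGM formula. As a rough illustration of the general idea only, the sketch below applies a generic prior-based logit adjustment, a common long-tail technique swapped in here and not the authors' method. The function names, the `tau` parameter, the two-class setup, and the frequencies are all assumptions made for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def calibrate_logits(logits, class_freqs, tau=1.0):
    """Subtract tau * log(frequency) from each logit.

    Rare classes have a very negative log-frequency, so they receive a
    larger boost. This is a generic adjustment, NOT the paper's RFGM.
    """
    return [z - tau * math.log(f) for z, f in zip(logits, class_freqs)]

# Hypothetical toy example: one common residue type and one rare NSAA.
residue_types = ["ALA", "SEP"]
class_freqs = [0.996, 0.004]  # illustrative values, not the paper's statistics
raw_logits = [2.0, 1.0]       # made-up model outputs

calibrated = calibrate_logits(raw_logits, class_freqs, tau=1.0)
raw_probs = softmax(raw_logits)
calibrated_probs = softmax(calibrated)

raw_choice = residue_types[raw_probs.index(max(raw_probs))]
calibrated_choice = residue_types[calibrated_probs.index(max(calibrated_probs))]
print(raw_choice, calibrated_choice)  # prints: ALA SEP
```

In this toy case the raw logits favor the common residue, while the calibrated logits favor the rare one. Setting `tau=0.0` leaves the logits unchanged, so `tau` acts as a knob for how strongly class frequency influences the prediction.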
Related papers
- Lightweight MSA Design Advances Protein Folding From Evolutionary Embeddings [51.731441632457226]
Multiple sequence alignments (MSAs) underperform on low-homology and orphan proteins. We introduce PLAME, a lightweight MSA design framework that generates MSAs that better support downstream folding. On AlphaFold2 low-homology/orphan benchmarks, PLAME delivers state-of-the-art improvements in structure accuracy.
arXiv Detail & Related papers (2025-06-17T04:11:30Z)
- NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present NovoBench, the first unified benchmark for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo, and $\pi$-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z)
- AdaNovo: Adaptive De Novo Peptide Sequencing with Conditional Mutual Information [46.23980841020632]
We propose AdaNovo, a novel framework that calculates conditional mutual information (CMI) between the spectrum and each amino acid/peptide.
AdaNovo excels in identifying amino acids with post-translational modifications (PTMs) and exhibits robustness against data noise.
arXiv Detail & Related papers (2024-03-09T11:54:58Z)
- PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z)
- GPCR-BERT: Interpreting Sequential Design of G Protein Coupled Receptors Using Protein Language Models [5.812284760539713]
We developed the GPCR-BERT model for understanding the sequential design of G Protein-Coupled Receptors (GPCRs).
GPCRs are the target of over one-third of FDA-approved pharmaceuticals.
We were able to shed light on several relationships between residues in the binding pocket and some of the conserved motifs.
arXiv Detail & Related papers (2023-10-30T18:28:50Z)
- Predicting protein stability changes under multiple amino acid substitutions using equivariant graph neural networks [2.5137859989323537]
We propose improvements to state-of-the-art deep learning (DL) protein stability prediction models.
This was achieved using E(3)-equivariant graph neural networks (EGNNs) for both atomic environment (AE) embedding and residue-level scoring tasks.
We demonstrate the immediately promising results of this procedure, discuss the current shortcomings, and highlight potential future strategies.
arXiv Detail & Related papers (2023-05-30T14:48:06Z)
- xTrimoABFold: De novo Antibody Structure Prediction without MSA [77.47606749555686]
We develop a novel model named xTrimoABFold to predict antibody structure from antibody sequence.
The model was trained end-to-end on the antibody structures in PDB by minimizing the ensemble loss of domain-specific focal loss on CDR and the frame-aligned point loss.
arXiv Detail & Related papers (2022-11-30T09:26:08Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- DPST: De Novo Peptide Sequencing with Amino-Acid-Aware Transformers [11.527280359634524]
De novo peptide sequencing aims to recover amino acid sequences of a peptide from tandem mass spectrometry (MS) data.
Existing approaches for de novo analysis enumerate MS evidence for all amino acid classes during inference.
Our approach, DPST, circumvents these limitations with two key components.
arXiv Detail & Related papers (2022-03-23T08:01:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.