Regressor-guided Diffusion Model for De Novo Peptide Sequencing with Explicit Mass Control
- URL: http://arxiv.org/abs/2602.20209v1
- Date: Mon, 23 Feb 2026 03:26:25 GMT
- Title: Regressor-guided Diffusion Model for De Novo Peptide Sequencing with Explicit Mass Control
- Authors: Shaorong Chen, Jingbo Zhou, Jun Xia
- Abstract summary: We introduce DiffuNovo, a regressor-guided diffusion model for de novo peptide sequencing that provides explicit peptide-level mass control. Our approach integrates the mass constraint at two critical stages: during training, a novel peptide-level mass loss guides model optimization, while at inference, regressor-based guidance from gradient-based updates in the latent space steers generation so that the predicted peptide adheres to the mass constraint.
- Score: 21.55210993203977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The discovery of novel proteins relies on sensitive protein identification, for which de novo peptide sequencing (DNPS) from mass spectra is a crucial approach. While deep learning has advanced DNPS, existing models inadequately enforce the fundamental mass consistency constraint: a predicted peptide's mass must match the experimentally measured precursor mass. Previous DNPS methods often treat this critical information as a simple input feature or use it only in post-processing, leading to numerous implausible predictions that violate this fundamental physical property. To address this limitation, we introduce DiffuNovo, a novel regressor-guided diffusion model for de novo peptide sequencing that provides explicit peptide-level mass control. Our approach integrates the mass constraint at two critical stages: during training, a novel peptide-level mass loss guides model optimization, while at inference, regressor-based guidance from gradient-based updates in the latent space steers generation so that the predicted peptide adheres to the mass constraint. Comprehensive evaluations on established benchmarks demonstrate that DiffuNovo surpasses state-of-the-art methods in DNPS accuracy. Additionally, as the first DNPS model to employ a diffusion model as its core backbone, DiffuNovo leverages the controllability of the diffusion architecture and achieves a significant reduction in mass error, thereby producing much more physically plausible peptides. These innovations represent a substantial advancement toward robust and broadly applicable DNPS. The source code is available in the supplementary material.
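The mass consistency constraint described in the abstract can be illustrated with a short Python sketch. This is not DiffuNovo's mass loss or regressor, which this summary does not specify; it only shows the underlying check, assuming unmodified residues, standard monoisotopic masses, and a hypothetical tolerance parameter `tol_ppm`. All function names are illustrative.

```python
# Minimal sketch of a peptide-level mass consistency check.
# Assumptions: unmodified residues, monoisotopic masses in daltons,
# and a hypothetical ppm tolerance (not taken from the paper).

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added for the peptide termini
PROTON = 1.007276   # mass of one proton (charge carrier)


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def neutral_mass_from_mz(precursor_mz: float, charge: int) -> float:
    """Convert an observed precursor m/z and charge to a neutral mass."""
    return (precursor_mz - PROTON) * charge


def mass_error_ppm(sequence: str, precursor_mz: float, charge: int) -> float:
    """Signed error (observed - theoretical) relative to theoretical, in ppm."""
    theoretical = peptide_mass(sequence)
    observed = neutral_mass_from_mz(precursor_mz, charge)
    return (observed - theoretical) / theoretical * 1e6


def is_mass_consistent(sequence: str, precursor_mz: float, charge: int,
                       tol_ppm: float = 50.0) -> bool:
    """True if the predicted peptide matches the precursor within tol_ppm."""
    return abs(mass_error_ppm(sequence, precursor_mz, charge)) <= tol_ppm


if __name__ == "__main__":
    # Precursor m/z of the doubly charged peptide PEPTIDE.
    mz = (peptide_mass("PEPTIDE") + 2 * PROTON) / 2
    for candidate in ("PEPTIDE", "PEPTLDE", "PEPTIDG"):
        print(candidate, is_mass_consistent(candidate, mz, charge=2))
```

Because leucine and isoleucine have identical masses, `PEPTLDE` passes the same check as `PEPTIDE`: the mass constraint rules out implausible candidates but does not by itself identify the correct sequence.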
Related papers
- PepEDiff: Zero-Shot Peptide Binder Design via Protein Embedding Diffusion [3.9876702935151225]
We present PepEDiff, a novel peptide binder generator that designs binding sequences given a target receptor protein sequence and its pocket residues.
Our approach departs from existing methods by generating binder sequences directly in a continuous latent space derived from a pretrained protein embedding model.
Despite its simplicity, our method outperforms state-of-the-art approaches across benchmark tests and in the TIGIT case study.
arXiv Detail & Related papers (2026-01-19T19:07:32Z) - Adaptive Multimodal Protein Plug-and-Play with Diffusion-Based Priors [5.809784853115825]
In an inverse problem, the goal is to recover an unknown parameter that has typically undergone some lossy or noisy transformation during measurement.
Recently, deep generative models, particularly diffusion models, have emerged as powerful priors for protein structure generation.
We introduce Adam-, a Plug-and-Play framework that guides a pre-trained protein diffusion model using gradients from multiple, heterogeneous experimental sources.
arXiv Detail & Related papers (2025-07-28T18:28:03Z) - Training-Free Stein Diffusion Guidance: Posterior Correction for Sampling Beyond High-Density Regions [46.59494117137471]
Training-free diffusion guidance provides a flexible way to leverage off-the-shelf classifiers without additional training.
We introduce Stein Diffusion Guidance (SDG), a novel training-free framework grounded in a surrogate stochastic optimal control (SOC) objective.
Experiments on molecular low-density sampling tasks suggest that SDG consistently surpasses standard training-free guidance methods.
arXiv Detail & Related papers (2025-07-07T21:14:27Z) - Reimagining Target-Aware Molecular Generation through Retrieval-Enhanced Aligned Diffusion [22.204642926984526]
READ is introduced, which is the first to merge molecular Retrieval-Augmented Generation with an SE(3)-equivariant diffusion model.
It can achieve very competitive performance in CBGBench, surpassing state-of-the-art generative models and even native scaffolds.
arXiv Detail & Related papers (2025-06-17T13:09:11Z) - Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing [32.29218860420551]
RankNovo is the first deep reranking framework that enhances de novo peptide sequencing.
Our work presents a novel reranking strategy that challenges existing single-model paradigms and advances the frontier of accurate de novo sequencing.
arXiv Detail & Related papers (2025-05-23T06:56:55Z) - Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present the first unified benchmark, NovoBench, for de novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo, and $\pi$-HelixNovo, are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z) - ECC-PolypDet: Enhanced CenterNet with Contrastive Learning for Automatic Polyp Detection [88.4359020192429]
Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases.
In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training & end-to-end inference framework.
Box-assisted Contrastive Learning (BCL) is used during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling the model to capture concealed polyps.
In the fine-tuning stage, we introduce the IoU-guided Sample Re-weighting
arXiv Detail & Related papers (2024-01-10T07:03:41Z) - Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.