Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry
- URL: http://arxiv.org/abs/2402.11363v3
- Date: Wed, 26 Jun 2024 07:45:33 GMT
- Title: Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry
- Authors: Shiva Ebrahimi, Xuan Guo,
- Abstract summary: We introduce DiaTrans, a deep-learning model based on transformer architecture.
It deciphers peptide sequences from DIA mass spectrometry data.
Our results show significant improvements over existing STOA methods.
- Score: 1.338778493151964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce DiaTrans, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing STOA methods, including DeepNovo-DIA and PepNet. Casanovo-DIA enhances precision by 15.14% to 34.8%, recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our DiaTrans model holds considerable promise to uncover novel peptides and more comprehensive profiling of biological samples. Casanovo-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DiaTrans.
Related papers
- Peptide-GPT: Generative Design of Peptides using Generative Pre-trained Transformers and Bio-informatic Supervision [7.275932354889042]
We introduce a protein language model tailored to generate protein sequences with distinct properties.
We rank the generated sequences based on their perplexity scores, then we filter out those lying outside the permissible convex hull of proteins.
We achieved an accuracy of 76.26% in hemolytic, 72.46% in non-hemolytic, 78.84% in non-fouling, and 68.06% in solubility protein generation.
arXiv Detail & Related papers (2024-10-25T00:15:39Z) - NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics [58.03989832372747]
We present the first unified benchmark NovoBench for emphde novo peptide sequencing.
It comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics.
Recent methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and $pi$-HelixNovo are integrated into our framework.
arXiv Detail & Related papers (2024-06-16T08:23:21Z) - AdaNovo: Adaptive \emph{De Novo} Peptide Sequencing with Conditional Mutual Information [46.23980841020632]
We propose AdaNovo, a novel framework that calculates conditional mutual information (CMI) between the spectrum and each amino acid/peptide.
AdaNovo excels in identifying amino acids with post-translational modifications (PTMs) and exhibits robustness against data noise.
arXiv Detail & Related papers (2024-03-09T11:54:58Z) - ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide
Sequencing [70.12220342151113]
ContraNovo is a pioneering algorithm that leverages contrastive learning to extract the relationship between spectra and peptides.
ContraNovo consistently outshines contemporary state-of-the-art solutions.
arXiv Detail & Related papers (2023-12-18T12:49:46Z) - Efficiently Predicting Protein Stability Changes Upon Single-point
Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z) - Efficient Prediction of Peptide Self-assembly through Sequential and
Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z) - Fast and Functional Structured Data Generators Rooted in
Out-of-Equilibrium Physics [62.997667081978825]
We address the challenge of using energy-based models to produce high-quality, label-specific data in structured datasets.
Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing.
We use a novel training algorithm that exploits non-equilibrium effects.
arXiv Detail & Related papers (2023-07-13T15:08:44Z) - Target specific peptide design using latent space approximate trajectory
collector [1.7289819674602296]
We propose a novel machine based machine learning design, called Approximate Space Tray Collector (LSATC)
It consists of a series of samplers that approximates peptides with desired binding properties in a space $284%$.
We latent by the design of peptide extensions targeting Beta-catenin, a key effector protein involved in canonical Wnt signalling.
All the four binding peptides extended by LSATC show improved Beta-catenin binding by at least $2%$, two of the peptides show a $3$ fold increase in affinity as compared to the base peptide.
arXiv Detail & Related papers (2023-02-02T21:56:52Z) - DePS: An improved deep learning model for de novo peptide sequencing [7.468176246958974]
In this study, we proposed an enhanced model, DePS, which can improve the accuracy of de novo peptide sequencing.
For the same test set of DeepNovoV2, the DePS model achieved excellent results of 74.22%, 74.21% and 41.68% for amino acid recall, amino acid precision and peptide recall respectively.
arXiv Detail & Related papers (2022-03-16T16:45:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.