MassFormer: Tandem Mass Spectrum Prediction for Small Molecules using Graph Transformers
- URL: http://arxiv.org/abs/2111.04824v3
- Date: Mon, 1 May 2023 19:19:58 GMT
- Authors: Adamo Young, Bo Wang, Hannes Röst
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tandem mass spectra capture fragmentation patterns that provide key
structural information about a molecule. Although mass spectrometry is applied
in many areas, the vast majority of small molecules lack experimental reference
spectra. For over seventy years, spectrum prediction has remained a key
challenge in the field. Existing deep learning methods do not leverage global
structure in the molecule, potentially resulting in difficulties when
generalizing to new data. In this work we propose a new model, MassFormer, for
accurately predicting tandem mass spectra. MassFormer uses a graph transformer
architecture to model long-distance relationships between atoms in the
molecule. The transformer module is initialized with parameters obtained
through a chemical pre-training task, then fine-tuned on spectral data.
MassFormer outperforms competing approaches for spectrum prediction on multiple
datasets, and is able to recover prior knowledge about the effect of collision
energy on the spectrum. By employing gradient-based attribution methods, we
demonstrate that the model can identify relationships between fragment peaks.
To further highlight MassFormer's utility, we show that it can match or exceed
existing prediction-based methods on two spectrum identification tasks. We
provide open-source implementations of our model and baseline approaches, with
the goal of encouraging future research in this area.
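The spectrum identification tasks mentioned above rank candidate structures by how closely their predicted spectra match a query spectrum. The sketch below illustrates that ranking step under stated assumptions: spectra are lists of (m/z, intensity) peaks, they are binned at a hypothetical 1.0 m/z width up to m/z 1000, and the binned vectors are compared with cosine similarity. The function names (bin_spectrum, cosine_similarity, rank_candidates) and the toy peaks are illustrative only and are not taken from the MassFormer code base.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A peak is (m/z, intensity); this representation is an assumption for illustration.
Peak = Tuple[float, float]


def bin_spectrum(peaks: Sequence[Peak], bin_width: float = 1.0,
                 max_mz: float = 1000.0) -> List[float]:
    """Sum peak intensities into fixed-width m/z bins (hypothetical settings)."""
    n_bins = int(math.ceil(max_mz / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        if 0 <= idx < n_bins:  # peaks outside [0, max_mz) are dropped
            vec[idx] += intensity
    return vec


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query: Sequence[Peak],
                    predicted: Dict[str, Sequence[Peak]],
                    bin_width: float = 1.0,
                    max_mz: float = 1000.0) -> List[Tuple[str, float]]:
    """Rank candidates by similarity of their predicted spectra to the query."""
    q = bin_spectrum(query, bin_width, max_mz)
    scores = [
        (name, cosine_similarity(q, bin_spectrum(peaks, bin_width, max_mz)))
        for name, peaks in predicted.items()
    ]
    # Highest similarity first; ties broken by name for determinism.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy peaks for illustration only; these are not real measurements.
    query = [(77.0, 30.0), (105.0, 100.0), (122.0, 45.0)]
    predicted = {
        "candidate_A": [(77.0, 25.0), (105.0, 100.0), (122.0, 50.0)],
        "candidate_B": [(51.0, 80.0), (91.0, 100.0)],
    }
    for name, score in rank_candidates(query, predicted):
        print(f"{name}\t{score:.3f}")
```

In this toy run candidate_A shares all three peaks with the query and ranks first, while candidate_B shares none and scores 0. Summing intensities into fixed-width bins is a simplification chosen for brevity; it discards sub-bin mass accuracy, so tolerance-based peak matching may suit high-resolution data better.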
Related papers
- Machine learning meets mass spectrometry: a focused perspective
Mass spectrometry is a widely used method to study molecules and processes in medicine, life sciences, chemistry, and industrial product quality control, among many other applications.
One of the main features of some mass spectrometry techniques is the extensive level of characterization and the large amount of data generated per measurement.
With the development of machine learning methods, the opportunity arises to unlock the potential of these data, enabling previously inaccessible discoveries.
(arXiv, 2024-06-27)
- Mass Spectra Prediction with Structural Motif-based Graph Neural Networks
MoMS-Net predicts mass spectra using information derived from structural motifs together with Graph Neural Networks (GNNs).
We have tested our model across diverse mass spectra and found that it outperforms other existing models.
(arXiv, 2023-06-28)
- Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
(arXiv, 2023-06-08)
- Prefix-Tree Decoding for Predicting Mass Spectra from Molecules
We use a new intermediate strategy for predicting mass spectra from molecules by treating mass spectra as sets of molecular formulae, which are themselves multisets of atoms.
We show promising empirical results on mass spectra prediction tasks.
(arXiv, 2023-03-11)
- Multiresolution Graph Transformers and Wavelet Positional Encoding for Learning Hierarchical Structures
We propose Multiresolution Graph Transformers (MGT), the first graph transformer architecture that can learn to represent large molecules at multiple scales.
MGT can learn to produce representations for the atoms and group them into meaningful functional groups or repeating units.
Our proposed model achieves results on two macromolecule datasets consisting of polymers and peptides, and one drug-like molecule dataset.
(arXiv, 2023-02-17)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
(arXiv, 2023-02-04)
- Efficiently predicting high resolution mass spectra with graph neural networks
Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics.
This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures.
We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over molecular formulas.
(arXiv, 2023-01-26)
- Unsupervised Spectral Unmixing For Telluric Correction Using A Neural Network Autoencoder
We present a neural network autoencoder approach for extracting a telluric transmission spectrum from a large set of high-precision observed solar spectra from the HARPS-N radial velocity spectrograph.
(arXiv, 2021-11-17)
- MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization
Generative models and reinforcement learning approaches have achieved initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
(arXiv, 2020-10-05)
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout training of the whole framework.
(arXiv, 2020-07-07)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules: the biggest GNN and the largest training dataset in molecular representation learning.
(arXiv, 2020-06-18)