Related papers: MassFormer: Tandem Mass Spectrum Prediction for Small Molecules using Graph Transformers

URL: http://arxiv.org/abs/2111.04824v3
Date: Mon, 1 May 2023 19:19:58 GMT
Title: MassFormer: Tandem Mass Spectrum Prediction for Small Molecules using Graph Transformers
Authors: Adamo Young, Bo Wang, Hannes Röst
Abstract summary: Tandem mass spectra capture fragmentation patterns that provide key structural information about a molecule. For over seventy years, spectrum prediction has remained a key challenge in the field. We propose a new model, MassFormer, for accurately predicting tandem mass spectra.
Score: 3.2951121243459522
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tandem mass spectra capture fragmentation patterns that provide key structural information about a molecule. Although mass spectrometry is applied in many areas, the vast majority of small molecules lack experimental reference spectra. For over seventy years, spectrum prediction has remained a key challenge in the field. Existing deep learning methods do not leverage global structure in the molecule, potentially resulting in difficulties when generalizing to new data. In this work we propose a new model, MassFormer, for accurately predicting tandem mass spectra. MassFormer uses a graph transformer architecture to model long-distance relationships between atoms in the molecule. The transformer module is initialized with parameters obtained through a chemical pre-training task, then fine-tuned on spectral data. MassFormer outperforms competing approaches for spectrum prediction on multiple datasets, and is able to recover prior knowledge about the effect of collision energy on the spectrum. By employing gradient-based attribution methods, we demonstrate that the model can identify relationships between fragment peaks. To further highlight MassFormer's utility, we show that it can match or exceed existing prediction-based methods on two spectrum identification tasks. We provide open-source implementations of our model and baseline approaches, with the goal of encouraging future research in this area.
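As a rough illustration of what a prediction-based spectrum identification task involves, the sketch below ranks candidate molecules by how closely their predicted spectra match a query spectrum. It is not taken from the MassFormer code: the 1 Da binning, the cosine similarity score, and the names `bin_spectrum`, `cosine_similarity`, and `rank_candidates` are all assumptions chosen for illustration, and the predicted spectra are made-up placeholder peaks rather than model output.

```python
import math


def bin_spectrum(peaks, max_mz=1000.0, bin_width=1.0):
    """Turn a list of (m/z, intensity) peaks into a fixed-length vector.

    The bin width and m/z range are illustrative assumptions.
    """
    n_bins = int(math.ceil(max_mz / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity  # peaks in the same bin are summed
    return vec


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_peaks, predicted_by_candidate):
    """Sort candidates by similarity of their predicted spectrum to the query."""
    query_vec = bin_spectrum(query_peaks)
    scored = [
        (name, cosine_similarity(query_vec, bin_spectrum(peaks)))
        for name, peaks in predicted_by_candidate.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical query spectrum and hypothetical predicted spectra.
query = [(100.2, 1.0), (150.7, 0.5)]
candidates = {
    "candidate_A": [(100.1, 0.9), (150.9, 0.6)],
    "candidate_B": [(300.0, 1.0)],
}
ranking = rank_candidates(query, candidates)
print(ranking[0][0])
```

In this toy setup the candidate whose predicted peaks fall in the same bins as the query is ranked first. A real pipeline would replace the placeholder peak lists with spectra produced by a trained prediction model.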

Related papers

DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models [66.41802970528133]
Molecular structure elucidation from spectra is a foundational problem in chemistry. Traditional methods rely heavily on expert interpretation and lack scalability. We present DiffSpectra, a generative framework that directly infers both 2D and 3D molecular structures from multi-modal spectral data.
arXiv Detail & Related papers (2025-07-09T13:57:20Z)
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra [48.52871465095181]
We propose to utilize the energy spectra to enhance the pre-training of 3D molecular representations (MolSpectra). Specifically, we propose SpecFormer, a multi-spectrum encoder for encoding molecular spectra via masked patch reconstruction. By further aligning outputs from the 3D encoder and spectrum encoder using a contrastive objective, we enhance the 3D encoder's understanding of molecules.
arXiv Detail & Related papers (2025-02-22T16:34:32Z)
To Bin or not to Bin: Alternative Representations of Mass Spectra [0.0]
We investigate two alternatives to the binning of mass spectra before down-stream machine learning tasks, namely, set-based and graph-based representations. Comparing the two proposed representations to train a set transformer and a graph neural network on a regression task, we show that they both perform substantially better than a multilayer perceptron trained on binned data.
arXiv Detail & Related papers (2025-02-15T16:52:36Z)
DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
DiffMS is a formula-restricted encoder-decoder generative network. We develop a robust decoder that bridges latent embeddings and molecular structures. Experiments show DiffMS outperforms existing models on de novo molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z)
Machine learning meets mass spectrometry: a focused perspective [0.0]
Mass spectrometry is a widely used method to study molecules and processes in medicine, life sciences, chemistry, and industrial product quality control, among many other applications. One of the main features of some mass spectrometry techniques is the extensive level of characterization and a large amount of generated data per measurement. With the development of machine learning methods, the opportunity arises to unlock the potential of these data, enabling previously inaccessible discoveries.
arXiv Detail & Related papers (2024-06-27T14:18:23Z)
Mass Spectra Prediction with Structural Motif-based Graph Neural Networks [21.71309513265843]
MoMS-Net is a system that predicts mass spectra using the information derived from structural motifs and the implementation of Graph Neural Networks (GNNs). We have tested our model across diverse mass spectra and have observed its superiority over other existing models.
arXiv Detail & Related papers (2023-06-28T10:33:57Z)
Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems. DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z)
Prefix-Tree Decoding for Predicting Mass Spectra from Molecules [12.868704267691125]
We use a new intermediate strategy for predicting mass spectra from molecules by treating mass spectra as sets of molecular formulae, which are themselves multisets of atoms. We show promising empirical results on mass spectra prediction tasks.
arXiv Detail & Related papers (2023-03-11T17:44:28Z)
Multiresolution Graph Transformers and Wavelet Positional Encoding for Learning Hierarchical Structures [6.875312133832078]
We propose Multiresolution Graph Transformers (MGT), the first graph transformer architecture that can learn to represent large molecules at multiple scales. MGT can learn to produce representations for the atoms and group them into meaningful functional groups or repeating units. Our proposed model achieves results on two macromolecule datasets consisting of polymers and peptides, and one drug-like molecule dataset.
arXiv Detail & Related papers (2023-02-17T01:32:44Z)
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
Efficiently predicting high resolution mass spectra with graph neural networks [28.387227518307604]
Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over molecular formulas.
arXiv Detail & Related papers (2023-01-26T21:10:26Z)
Unsupervised Spectral Unmixing For Telluric Correction Using A Neural Network Autoencoder [58.720142291102135]
We present a neural network autoencoder approach for extracting a telluric transmission spectrum from a large set of high-precision observed solar spectra from the HARPS-N radial velocity spectrograph.
arXiv Detail & Related papers (2021-11-17T12:54:48Z)
MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have had initial success, but still face difficulties in simultaneously optimizing multiple drug properties. We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules. In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution. Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout the framework's learning.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)
Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning. GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.