Prefix-Tree Decoding for Predicting Mass Spectra from Molecules
- URL: http://arxiv.org/abs/2303.06470v3
- Date: Sun, 3 Dec 2023 22:29:11 GMT
- Title: Prefix-Tree Decoding for Predicting Mass Spectra from Molecules
- Authors: Samuel Goldman, John Bradshaw, Jiayi Xin, and Connor W. Coley
- Abstract summary: We use a new intermediate strategy for predicting mass spectra from molecules by treating mass spectra as sets of molecular formulae, which are themselves multisets of atoms.
We show promising empirical results on mass spectra prediction tasks.
- Score: 12.868704267691125
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Computational predictions of mass spectra from molecules have enabled the
discovery of clinically relevant metabolites. However, such predictive tools
are still limited as they occupy one of two extremes, either operating (a) by
fragmenting molecules combinatorially with overly rigid constraints on
potential rearrangements and poor time complexity or (b) by decoding lossy and
nonphysical discretized spectra vectors. In this work, we use a new
intermediate strategy for predicting mass spectra from molecules by treating
mass spectra as sets of molecular formulae, which are themselves multisets of
atoms. After first encoding an input molecular graph, we decode a set of
molecular subformulae, each of which specifies a predicted peak in the mass
spectrum, the intensities of which are predicted by a second model. Our key
insight is to overcome the combinatorial possibilities for molecular
subformulae by decoding the formula set using a prefix tree structure,
atom-type by atom-type, representing a general method for ordered multiset
decoding. We show promising empirical results on mass spectra prediction tasks.
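The prefix-tree idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the `keep` predicate (a hand-written stand-in for the learned model that decides which prefixes to expand), and the ethanol example are all assumptions made for illustration. Each tree level fixes the count of one atom type, so a leaf is a complete subformula of the parent formula.

```python
from typing import Callable, Dict, List, Tuple

# Monoisotopic masses (Da) for a few common elements.
MONO_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

Prefix = Tuple[int, ...]


def decode_subformulae(
    parent: Dict[str, int],
    keep: Callable[[Prefix], bool],
) -> List[Dict[str, int]]:
    """Walk a prefix tree over element counts, one atom type per level.

    `keep` is a hypothetical stand-in for a learned scorer: it sees a
    partial formula (counts chosen so far) and says whether to expand it.
    """
    elements = list(parent)          # fixed element order defines the tree levels
    frontier: List[Prefix] = [()]    # the root is the empty prefix
    for elem in elements:
        next_frontier: List[Prefix] = []
        for prefix in frontier:
            # A subformula cannot contain more atoms than the parent has.
            for count in range(parent[elem] + 1):
                child = prefix + (count,)
                if keep(child):
                    next_frontier.append(child)
        frontier = next_frontier
    return [dict(zip(elements, leaf)) for leaf in frontier]


def formula_mass(formula: Dict[str, int]) -> float:
    """Monoisotopic mass of a formula, ignoring charge and adducts."""
    return sum(MONO_MASS[elem] * n for elem, n in formula.items())


ethanol = {"C": 2, "H": 6, "O": 1}

# Without pruning, every combination of counts is a leaf: 3 * 7 * 2 = 42.
all_leaves = decode_subformulae(ethanol, lambda prefix: True)

# Pruning at the first level (require at least one carbon) removes whole subtrees.
with_carbon = decode_subformulae(ethanol, lambda prefix: prefix[0] >= 1)

print(len(all_leaves), len(with_carbon))
print(round(formula_mass(ethanol), 4))
```

Rejecting a prefix discards every subformula beneath it, so the number of candidates considered can stay far below the full product of the per-element counts.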
Related papers
- Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose a paradigm shift by encoding molecular graphs with Knowledge-aware Contrastive Heterogeneous Molecular Graph Learning (KCHML).
KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.
This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z) - To Bin or not to Bin: Alternative Representations of Mass Spectra [0.0]
We investigate two alternatives to the binning of mass spectra before down-stream machine learning tasks, namely, set-based and graph-based representations.
Comparing the two proposed representations to train a set transformer and a graph neural network on a regression task, we show that they both perform substantially better than a multilayer perceptron trained on binned data.
arXiv Detail & Related papers (2025-02-15T16:52:36Z) - DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
DiffMS is a formula-restricted encoder-decoder generative network.
We develop a robust decoder that bridges latent embeddings and molecular structures.
Experiments show DiffMS outperforms existing models on de novo molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z) - Unraveling Molecular Structure: A Multimodal Spectroscopic Dataset for Chemistry [0.1747623282473278]
This dataset comprises simulated $^{1}$H-NMR, $^{13}$C-NMR, HSQC-NMR, Infrared, and Mass spectra for 790k molecules extracted from chemical reactions in patent data.
We provide benchmarks for evaluating single-modality tasks such as structure elucidation, predicting the spectra for a target molecule, and functional group predictions.
arXiv Detail & Related papers (2024-07-04T12:52:48Z) - Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Efficiently predicting high resolution mass spectra with graph neural networks [28.387227518307604]
Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics.
This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures.
We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over molecular formulas.
arXiv Detail & Related papers (2023-01-26T21:10:26Z) - Ensemble Spectral Prediction (ESP) Model for Metabolite Annotation [10.640447979978436]
A key challenge in metabolomics is annotating measured spectra from a biological sample with chemical identities.
We propose a novel machine learning model, Ensemble Spectral Prediction (ESP), for metabolite annotation.
arXiv Detail & Related papers (2022-03-25T17:05:41Z) - Unsupervised Spectral Unmixing For Telluric Correction Using A Neural Network Autoencoder [58.720142291102135]
We present a neural network autoencoder approach for extracting a telluric transmission spectrum from a large set of high-precision observed solar spectra from the HARPS-N radial velocity spectrograph.
arXiv Detail & Related papers (2021-11-17T12:54:48Z) - MassFormer: Tandem Mass Spectrum Prediction for Small Molecules using Graph Transformers [3.2951121243459522]
Tandem mass spectra capture fragmentation patterns that provide key structural information about a molecule.
For over seventy years, spectrum prediction has remained a key challenge in the field.
We propose a new model, MassFormer, for accurately predicting tandem mass spectra.
arXiv Detail & Related papers (2021-11-08T20:55:15Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
Generative models and reinforcement learning approaches have achieved initial success but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.