Prefix-Tree Decoding for Predicting Mass Spectra from Molecules
- URL: http://arxiv.org/abs/2303.06470v3
- Date: Sun, 3 Dec 2023 22:29:11 GMT
- Title: Prefix-Tree Decoding for Predicting Mass Spectra from Molecules
- Authors: Samuel Goldman, John Bradshaw, Jiayi Xin, and Connor W. Coley
- Abstract summary: We use a new intermediate strategy for predicting mass spectra from molecules by treating mass spectra as sets of molecular formulae, which are themselves multisets of atoms.
We show promising empirical results on mass spectra prediction tasks.
- Score: 12.868704267691125
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Computational predictions of mass spectra from molecules have enabled the
discovery of clinically relevant metabolites. However, such predictive tools
are still limited as they occupy one of two extremes, either operating (a) by
fragmenting molecules combinatorially with overly rigid constraints on
potential rearrangements and poor time complexity or (b) by decoding lossy and
nonphysical discretized spectra vectors. In this work, we use a new
intermediate strategy for predicting mass spectra from molecules by treating
mass spectra as sets of molecular formulae, which are themselves multisets of
atoms. After first encoding an input molecular graph, we decode a set of
molecular subformulae, each of which specify a predicted peak in the mass
spectrum, the intensities of which are predicted by a second model. Our key
insight is to overcome the combinatorial possibilities for molecular
subformulae by decoding the formula set using a prefix tree structure,
atom-type by atom-type, representing a general method for ordered multiset
decoding. We show promising empirical results on mass spectra prediction tasks.
Related papers
- Unraveling Molecular Structure: A Multimodal Spectroscopic Dataset for Chemistry [0.1747623282473278]
This dataset comprises simulated $1$H-NMR, $13$C-NMR, HSQC-NMR, Infrared, and Mass spectra for 790k molecules extracted from chemical reactions in patent data.
We provide benchmarks for evaluating single-modality tasks such as structure elucidation, predicting the spectra for a target molecule, and functional group predictions.
arXiv Detail & Related papers (2024-07-04T12:52:48Z) - Towards Predicting Equilibrium Distributions for Molecular Systems with
Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z) - Atomic and Subgraph-aware Bilateral Aggregation for Molecular
Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA)
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Efficiently predicting high resolution mass spectra with graph neural
networks [28.387227518307604]
Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics.
This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures.
We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over molecular formulas.
arXiv Detail & Related papers (2023-01-26T21:10:26Z) - Ensemble Spectral Prediction (ESP) Model for Metabolite Annotation [10.640447979978436]
Key challenge in metabolomics is annotating measured spectra from a biological sample with chemical identities.
We propose a novel machine learning model, Ensemble Spectral Prediction (ESP), for metabolite annotation.
arXiv Detail & Related papers (2022-03-25T17:05:41Z) - Unsupervised Spectral Unmixing For Telluric Correction Using A Neural
Network Autoencoder [58.720142291102135]
We present a neural network autoencoder approach for extracting a telluric transmission spectrum from a large set of high-precision observed solar spectra from the HARPS-N radial velocity spectrograph.
arXiv Detail & Related papers (2021-11-17T12:54:48Z) - MassFormer: Tandem Mass Spectrum Prediction for Small Molecules using
Graph Transformers [3.2951121243459522]
Tandem mass spectra capture fragmentation patterns that provide key structural information about a molecule.
For over seventy years, spectrum prediction has remained a key challenge in the field.
We propose a new model, MassFormer, for accurately predicting tandem mass spectra.
arXiv Detail & Related papers (2021-11-08T20:55:15Z) - Flexible dual-branched message passing neural network for quantum
mechanical property prediction with molecular conformation [16.08677447593939]
We propose a dual-branched neural network for molecular property prediction based on message-passing framework.
Our model learns heterogeneous molecular features with different scales, which are trained flexibly according to each prediction target.
arXiv Detail & Related papers (2021-06-14T10:00:39Z) - Advanced Graph and Sequence Neural Networks for Molecular Property
Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z) - MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization [51.00815310242277]
generative models and reinforcement learning approaches made initial success, but still face difficulties in simultaneously optimizing multiple drug properties.
We propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework to use input molecule as an initial guess and sample molecules from the target distribution.
arXiv Detail & Related papers (2020-10-05T20:18:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.