Accurate and efficient structure elucidation from routine one-dimensional NMR spectra using multitask machine learning
- URL: http://arxiv.org/abs/2408.08284v1
- Date: Thu, 15 Aug 2024 17:37:36 GMT
- Title: Accurate and efficient structure elucidation from routine one-dimensional NMR spectra using multitask machine learning
- Authors: Frank Hu, Michael S. Chen, Grant M. Rotskoff, Matthew W. Kanan, Thomas E. Markland
- Abstract summary: We introduce a machine learning framework that predicts the molecular structure of an unknown compound based on its 1D 1H and/or 13C NMR spectra.
Integrating this capability with a convolutional neural network (CNN), we build an end-to-end model for predicting structure from spectra that is fast and accurate.
- Score: 1.2754578699685275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rapid determination of molecular structures can greatly accelerate workflows across many chemical disciplines. However, elucidating structure using only one-dimensional (1D) NMR spectra, the most readily accessible data, remains an extremely challenging problem because of the combinatorial explosion of the number of possible molecules as the number of constituent atoms is increased. Here, we introduce a multitask machine learning framework that predicts the molecular structure (formula and connectivity) of an unknown compound solely based on its 1D 1H and/or 13C NMR spectra. First, we show how a transformer architecture can be constructed to efficiently solve the task, traditionally performed by chemists, of assembling large numbers of molecular fragments into molecular structures. Integrating this capability with a convolutional neural network (CNN), we build an end-to-end model for predicting structure from spectra that is fast and accurate. We demonstrate the effectiveness of this framework on molecules with up to 19 heavy (non-hydrogen) atoms, a size for which there are trillions of possible structures. Without relying on any prior chemical knowledge such as the molecular formula, we show that our approach predicts the exact molecule 69.6% of the time within the first 15 predictions, reducing the search space by up to 11 orders of magnitude.
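The headline result above is a top-k exact-match accuracy: an unknown compound counts as solved if its true structure appears anywhere among the model's first k = 15 ranked candidates. The sketch below computes that metric and the matching search-space reduction. It is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes each structure is represented by a canonical SMILES string, so an exact structural match becomes plain string equality, and the function names and toy molecules are hypothetical. The candidate-space size of 10**12 is an assumed round number standing in for "trillions".

```python
import math
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 15,
) -> float:
    """Fraction of targets found among the first k ranked predictions.

    Hypothetical helper: structures are assumed to be canonical SMILES strings,
    so an exact structural match is plain string equality.
    """
    if len(ranked_predictions) != len(targets):
        raise ValueError("expected one ranked prediction list per target")
    if not targets:
        raise ValueError("need at least one target")
    # A hit means the true structure appears somewhere in the top-k list.
    hits = sum(target in preds[:k] for preds, target in zip(ranked_predictions, targets))
    return hits / len(targets)


def orders_of_magnitude_reduction(n_candidates: float, k: int = 15) -> float:
    """log10 of (size of the full candidate space) / (number of candidates kept)."""
    return math.log10(n_candidates / k)


if __name__ == "__main__":
    # Toy data (hypothetical): each inner list is ranked best-first.
    preds = [["CCO", "COC"], ["CC(=O)O", "OCC=O"], ["CCC", "CCCC"]]
    targets = ["COC", "CC(=O)O", "CCN"]
    print(top_k_accuracy(preds, targets, k=1))  # 1/3: only acetic acid is ranked first
    print(top_k_accuracy(preds, targets, k=2))  # 2/3: dimethyl ether is found at rank 2
    # Assumed round number standing in for "trillions" of candidate structures.
    print(round(orders_of_magnitude_reduction(1e12, k=15), 2))  # 10.82
```

With 10**12 candidates, keeping only 15 corresponds to a reduction of about 10.8 orders of magnitude; a candidate space of a few trillion pushes this toward the "up to 11 orders of magnitude" quoted in the abstract.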
Related papers
- GraphXForm: Graph transformer for computer-aided molecular design with application to extraction [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned.
We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
(arXiv, 2024-11-03T19:45:15Z)
- A Transformer Based Generative Chemical Language AI Model for Structural Elucidation of Organic Compounds [1.5628118690186594]
We present a proof-of-concept transformer based generative chemical language artificial intelligence (AI) model.
Our model employs an encoder-decoder architecture and self-attention mechanisms to directly generate the most probable chemical structures.
It performs structural elucidation of molecules with up to 29 atoms in just a few seconds on a modern CPU, achieving a top-15 accuracy of 83%.
(arXiv, 2024-10-13T15:41:20Z)
- Carbohydrate NMR chemical shift predictions using E(3) equivariant graph neural networks [0.0]
This work introduces a novel approach that leverages E(3) equivariant graph neural networks to predict carbohydrate NMR spectra.
Notably, our model achieves a substantial reduction in mean absolute error, up to threefold, compared to traditional models.
The implications are far-reaching and extend beyond an advanced understanding of carbohydrate structures and spectral interpretation.
(arXiv, 2023-11-21T15:01:14Z)
- QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules [69.25826391912368]
We generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories.
We show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules.
(arXiv, 2023-06-15T23:39:07Z)
- Prefix-Tree Decoding for Predicting Mass Spectra from Molecules [12.868704267691125]
We use a new intermediate strategy for predicting mass spectra from molecules by treating mass spectra as sets of molecular formulae, which are themselves multisets of atoms.
We show promising empirical results on mass spectra prediction tasks.
(arXiv, 2023-03-11T17:44:28Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
(arXiv, 2023-02-04T01:32:40Z)
- Substructure-Atom Cross Attention for Molecular Representation Learning [21.4652884347198]
We propose a new framework for molecular representation learning.
Our contribution is threefold: (a) demonstrating the usefulness of incorporating substructures to node-wise features from molecules, (b) designing a two-branch network consisting of a transformer and a graph neural network, and (c) not requiring features or computationally expensive information from molecules.
(arXiv, 2022-10-15T09:44:27Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine-learned force fields (MLFFs) have emerged as an alternative means of executing MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
(arXiv, 2022-05-17T13:08:28Z)
- Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules.
In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
(arXiv, 2022-02-01T18:54:24Z)
- Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist learning molecule representation.
Our approach is proven effective in 1) keeping the embedding space well organized and 2) improving the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
(arXiv, 2021-09-21T00:08:43Z)
- Do Large Scale Molecular Language Representations Capture Important Structural Information? [31.76876206167457]
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively when compared to graph-based and fingerprint-based supervised learning baselines.
(arXiv, 2021-06-17T14:33:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.