Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra
- URL: http://arxiv.org/abs/2512.03127v1
- Date: Tue, 02 Dec 2025 18:59:13 GMT
- Title: Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra
- Authors: Ziyu Xiong, Yichi Zhang, Foyez Alauddin, Chu Xin Cheng, Joon Soo An, Mohammad R. Seyedsayamdost, Ellen D. Zhong
- Abstract summary: ChefNMR (CHemical Elucidation From NMR) is an end-to-end framework that directly predicts an unknown molecule's structure. To model the complex chemical groups found in natural products, we generated a dataset of simulated 1D NMR spectra for over 111,000 natural products. ChefNMR predicts the structures of challenging natural product compounds with an unsurpassed accuracy of over 65%.
- Score: 5.818797900550866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy is a cornerstone technique for determining the structures of small molecules and is especially critical in the discovery of novel natural products and clinical therapeutics. Yet, interpreting NMR spectra remains a time-consuming, manual process requiring extensive domain expertise. We introduce ChefNMR (CHemical Elucidation From NMR), an end-to-end framework that directly predicts an unknown molecule's structure solely from its 1D NMR spectra and chemical formula. We frame structure elucidation as conditional generation from an atomic diffusion model built on a non-equivariant transformer architecture. To model the complex chemical groups found in natural products, we generated a dataset of simulated 1D NMR spectra for over 111,000 natural products. ChefNMR predicts the structures of challenging natural product compounds with an unsurpassed accuracy of over 65%. This work takes a significant step toward solving the grand challenge of automating small-molecule structure elucidation and highlights the potential of deep learning in accelerating molecular discovery. Code is available at https://github.com/ml-struct-bio/chefnmr.
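The abstract states that the model is conditioned on the chemical formula in addition to the 1D NMR spectra. As a rough illustration of what that input provides, the sketch below turns a formula string into per-element counts and a flat list of atom types, which is one plausible way to fix the number and kind of atoms before generating a structure. The function names `parse_formula` and `atom_types` are hypothetical and are not taken from the ChefNMR code base; the parser only handles simple formulas without brackets, charges, or isotope labels.

```python
import re
from collections import Counter

# One element symbol (capital letter, optional lowercase letter) plus an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H12O6'.

    Assumption: no parentheses, charges, or isotope labels.
    """
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Counter = Counter()
    for element, digits in _TOKEN.findall(formula):
        counts[element] += int(digits) if digits else 1
    return counts


def atom_types(formula: str, include_hydrogens: bool = False) -> list[str]:
    """Expand a formula into a flat, alphabetically ordered list of atom types.

    Hydrogens are dropped by default, on the assumption that a
    heavy-atom representation is wanted; this is an illustrative choice.
    """
    counts = parse_formula(formula)
    return [
        element
        for element in sorted(counts)
        for _ in range(counts[element])
        if include_hydrogens or element != "H"
    ]


if __name__ == "__main__":
    print(parse_formula("C6H12O6"))
    print(atom_types("C2H6O"))
```

A list like the one returned by `atom_types` could serve as the fixed set of atoms whose arrangement a generative model then has to determine; how ChefNMR actually encodes the formula is not described in the abstract.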
Related papers
- NMRTrans: Structure Elucidation from Experimental NMR Spectra via Set Transformers [41.6373573055135]
We build NMRSpec, a large-scale corpus of experimental $^{1}$H and $^{13}$C spectra mined from chemical literature. We propose NMRTrans, which models spectra as unordered peak sets and aligns the model's inductive bias with the physical nature of NMR.
arXiv Detail & Related papers (2026-02-10T03:37:41Z) - How well can off-the-shelf LLMs elucidate molecular structures from mass spectra using chain-of-thought reasoning? [51.286853421822705]
Large language models (LLMs) have shown promise for reasoning-intensive scientific tasks, but their capability for chemical interpretation is still unclear. We introduce a Chain-of-Thought (CoT) prompting framework and benchmark that evaluate how LLMs reason about mass spectral data to predict molecular structures. Our evaluation across metrics of SMILES validity, formula consistency, and structural similarity reveals that while LLMs can produce syntactically valid and partially plausible structures, they fail to achieve chemical accuracy or link reasoning to correct molecular predictions.
arXiv Detail & Related papers (2026-01-09T20:08:42Z) - NMIRacle: Multi-modal Generative Molecular Elucidation from IR and NMR Spectra [13.594833907772783]
We introduce NMIRacle, a two-stage generative framework that builds upon recent paradigms in AI-driven spectroscopy with minimal assumptions. In the first stage, NMIRacle learns to reconstruct molecular structures from count-aware fragment encodings. In the second stage, a spectral encoder maps input spectroscopic measurements into a latent embedding. This formulation bridges fragment-level chemical modeling with spectral evidence, yielding accurate molecular predictions.
arXiv Detail & Related papers (2025-12-17T10:29:39Z) - Mamba-driven multi-perspective structural understanding for molecular ground-state conformation prediction [69.32436472760712]
We propose an approach of Mamba-driven multi-perspective structural understanding (MPSU-Mamba) to localize molecular ground-state conformation. For complex and diverse molecules, three different kinds of dedicated scanning strategies are explored to construct a comprehensive perception of corresponding molecular structures. Experimental results on QM9 and Molecule3D datasets indicate that MPSU-Mamba significantly outperforms existing methods.
arXiv Detail & Related papers (2025-11-10T11:18:32Z) - Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra [60.08608779794957]
We propose GLMR, a Generative Language Model-based Retrieval framework. In the pre-retrieval stage, a contrastive learning-based model identifies top candidate molecules as contextual priors for the input mass spectrum. In the generative retrieval stage, these candidate molecules are integrated with the input mass spectrum to guide a generative model in producing refined molecular structures.
arXiv Detail & Related papers (2025-11-09T07:25:53Z) - NMR-Solver: Automated Structure Elucidation via Large-Scale Spectral Matching and Physics-Guided Fragment Optimization [24.714189961887215]
Nuclear Magnetic Resonance (NMR) spectroscopy is one of the most powerful and widely used tools for molecular structure elucidation in organic chemistry. Here, we present NMR-Solver, a practical and interpretable framework for the automated determination of small organic molecule structures from $^{1}$H and $^{13}$C NMR spectra.
arXiv Detail & Related papers (2025-08-30T23:59:12Z) - DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models [68.19129717255053]
We present DiffSpectra, a generative framework that formulates molecular structure elucidation as a conditional generation process. Our experiments demonstrate that DiffSpectra accurately elucidates molecular structures, achieving 40.76% top-1 and 99.49% top-10 accuracy.
arXiv Detail & Related papers (2025-07-09T13:57:20Z) - DiffNMR: Diffusion Models for Nuclear Magnetic Resonance Spectra Elucidation [9.321270922757442]
Nuclear Magnetic Resonance (NMR) spectroscopy is a central characterization method for molecular structure elucidation. We introduce DiffNMR, a novel end-to-end framework that leverages a conditional discrete diffusion model for de novo molecular structure elucidation from NMR spectra.
arXiv Detail & Related papers (2025-07-09T06:21:36Z) - DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
We present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs. Experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z) - Accurate and efficient structure elucidation from routine one-dimensional NMR spectra using multitask machine learning [1.2754578699685275]
We introduce a machine learning framework that predicts the molecular structure of an unknown compound based on its 1D 1H and/or 13C NMR spectra.
Integrating this capability with a convolutional neural network (CNN), we build an end-to-end model for predicting structure from spectra that is fast and accurate.
arXiv Detail & Related papers (2024-08-15T17:37:36Z) - Unraveling Molecular Structure: A Multimodal Spectroscopic Dataset for Chemistry [0.1747623282473278]
This dataset comprises simulated $^{1}$H-NMR, $^{13}$C-NMR, HSQC-NMR, Infrared, and Mass spectra for 790k molecules extracted from chemical reactions in patent data.
We provide benchmarks for evaluating single-modality tasks such as structure elucidation, predicting the spectra for a target molecule, and functional group predictions.
arXiv Detail & Related papers (2024-07-04T12:52:48Z) - TransPeakNet: Solvent-Aware 2D NMR Prediction via Multi-Task Pre-Training and Unsupervised Learning [5.7279868722119325]
We introduce an unsupervised training framework for predicting cross-peaks in 2D NMR. Our approach pretrains an ML model on an annotated 1D dataset of 1H and 13C shifts, then finetunes it in an unsupervised manner. Evaluation on 479 expert-annotated HSQC spectra demonstrates our model's superiority over traditional methods.
arXiv Detail & Related papers (2024-03-17T21:52:51Z)
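Several entries above report results as top-k accuracy, for example the top-1 and top-10 figures quoted for DiffSpectra. A minimal sketch of that metric follows, assuming each prediction and target is already a canonicalized structure string (such as a canonical SMILES), so that exact string equality stands in for structural identity. The function name `top_k_accuracy` and the toy data are illustrative and do not come from any of the listed papers.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of targets found among the first k ranked predictions.

    Assumption: strings are canonicalized beforehand, so equality of
    strings means equality of structures.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(targets):
        raise ValueError("need one ranked prediction list per target")
    if not targets:
        raise ValueError("no targets to evaluate")
    hits = sum(
        1
        for candidates, target in zip(ranked_predictions, targets)
        if target in candidates[:k]
    )
    return hits / len(targets)


if __name__ == "__main__":
    # Toy example: three unknowns, each with a ranked candidate list.
    predictions = [
        ["CCO", "COC"],    # correct at rank 1
        ["CC=O", "CCO"],   # correct at rank 2
        ["C", "CC"],       # target not in the list
    ]
    truth = ["CCO", "CCO", "CCC"]
    print(top_k_accuracy(predictions, truth, k=1))
    print(top_k_accuracy(predictions, truth, k=2))
```

Because top-k accuracy can only grow as k increases, a large gap between top-1 and top-10 figures indicates that the correct structure is often generated but not ranked first.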