NMRTrans: Structure Elucidation from Experimental NMR Spectra via Set Transformers
- URL: http://arxiv.org/abs/2602.10158v1
- Date: Tue, 10 Feb 2026 03:37:41 GMT
- Title: NMRTrans: Structure Elucidation from Experimental NMR Spectra via Set Transformers
- Authors: Liujia Yang, Zhuo Yang, Jiaqing Xie, Yubin Wang, Ben Gao, Tianfan Fu, Xingjian Wei, Jiaxing Sun, Jiang Wu, Conghui He, Yuqiang Li, Qinying Gu
- Abstract summary: We build NMRSpec, a large-scale corpus of experimental $^1$H and $^{13}$C spectra mined from chemical literature. We propose NMRTrans, which models spectra as unordered peak sets and aligns the model's inductive bias with the physical nature of NMR.
- Score: 41.6373573055135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy is fundamental for molecular structure elucidation, yet interpreting spectra at scale remains time-consuming and highly expertise-dependent. While recent spectrum-as-language modeling and retrieval-based methods have shown promise, they rely heavily on large corpora of computed spectra and exhibit notable performance drops when applied to experimental measurements. To address these issues, we build NMRSpec, a large-scale corpus of experimental $^1$H and $^{13}$C spectra mined from chemical literature, and propose NMRTrans, which models spectra as unordered peak sets and aligns the model's inductive bias with the physical nature of NMR. To the best of our knowledge, NMRTrans is the first NMR Transformer trained solely on large-scale experimental spectra. It achieves state-of-the-art performance on experimental benchmarks, improving Top-10 Accuracy over the strongest baseline by +17.82 points (61.15% vs. 43.33%), which underscores the importance of experimental data and structure-aware architectures for reliable NMR structure elucidation.
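The abstract's idea of treating a spectrum as an unordered peak set can be illustrated with a minimal sketch. The code below is not the NMRTrans architecture. It assumes a hypothetical hand-made peak embedding (`embed_peak`), a single fixed query vector, and made-up peak values, and it uses plain softmax attention pooling to show that the pooled representation does not depend on the order in which peaks are listed.

```python
import math
import random


def embed_peak(shift_ppm, intensity):
    """Map one peak to a small feature vector (hypothetical hand-made features)."""
    return [shift_ppm / 10.0, math.sin(shift_ppm), math.cos(shift_ppm), intensity]


def attention_pool(peaks, query):
    """Softmax-weighted average of peak embeddings, scored against one query vector.

    No positional information is used, so the result depends only on the
    set of peaks, not on their order.
    """
    embeddings = [embed_peak(shift, inten) for shift, inten in peaks]
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in embeddings]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = math.fsum(weights)
    return [
        math.fsum(w * emb[d] for w, emb in zip(weights, embeddings)) / total
        for d in range(len(query))
    ]


# Made-up (chemical shift in ppm, relative intensity) pairs, for illustration only.
peaks = [(7.26, 1.0), (3.71, 3.0), (2.05, 3.0), (1.25, 6.0)]
query = [0.5, -0.2, 0.1, 0.05]  # assumed fixed query; a real model would learn it

shuffled = peaks[:]
random.Random(0).shuffle(shuffled)

original_code = attention_pool(peaks, query)
shuffled_code = attention_pool(shuffled, query)

# The two encodings agree regardless of peak order.
same = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for a, b in zip(original_code, shuffled_code)
)
print(same)
```

Because pooling is a weighted sum over peaks with no positional encoding, shuffling the peak list leaves the encoding unchanged and the script prints `True`. A sequence model with positional encodings would not have this property by construction.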
Related papers
- NMIRacle: Multi-modal Generative Molecular Elucidation from IR and NMR Spectra [13.594833907772783]
We introduce NMIRacle, a two-stage generative framework that builds upon recent paradigms in AI-driven spectroscopy with minimal assumptions. In the first stage, NMIRacle learns to reconstruct molecular structures from count-aware fragment encodings. In the second stage, a spectral encoder maps input spectroscopic measurements into a latent embedding. This formulation bridges fragment-level chemical modeling with spectral evidence, yielding accurate molecular predictions.
arXiv Detail & Related papers (2025-12-17T10:29:39Z)
- Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra [5.818797900550866]
ChefNMR (CHemical Elucidation From NMR) is an end-to-end framework that directly predicts an unknown molecule's structure. To model the complex chemical groups found in natural products, we generated a dataset of simulated 1D NMR spectra for over 111,000 natural products. ChefNMR predicts the structures of challenging natural product compounds with an unsurpassed accuracy of over 65%.
arXiv Detail & Related papers (2025-12-02T18:59:13Z)
- Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra [60.08608779794957]
We propose GLMR, a Generative Language Model-based Retrieval framework. In the pre-retrieval stage, a contrastive learning-based model identifies top candidate molecules as contextual priors for the input mass spectrum. In the generative retrieval stage, these candidate molecules are integrated with the input mass spectrum to guide a generative model in producing refined molecular structures.
arXiv Detail & Related papers (2025-11-09T07:25:53Z)
- NMR-Solver: Automated Structure Elucidation via Large-Scale Spectral Matching and Physics-Guided Fragment Optimization [24.714189961887215]
Nuclear Magnetic Resonance (NMR) spectroscopy is one of the most powerful and widely used tools for molecular structure elucidation in organic chemistry. Here, we present NMR-Solver, a practical and interpretable framework for the automated determination of small organic molecule structures from $^1$H and $^{13}$C NMR spectra.
arXiv Detail & Related papers (2025-08-30T23:59:12Z)
- DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models [68.19129717255053]
We present DiffSpectra, a generative framework that formulates molecular structure elucidation as a conditional generation process. Our experiments demonstrate that DiffSpectra accurately elucidates molecular structures, achieving 40.76% top-1 and 99.49% top-10 accuracy.
arXiv Detail & Related papers (2025-07-09T13:57:20Z)
- DiffNMR: Diffusion Models for Nuclear Magnetic Resonance Spectra Elucidation [9.321270922757442]
Nuclear Magnetic Resonance (NMR) spectroscopy is a central characterization method for molecular structure elucidation. We introduce DiffNMR, a novel end-to-end framework that leverages a conditional discrete diffusion model for de novo molecular structure elucidation from NMR spectra.
arXiv Detail & Related papers (2025-07-09T06:21:36Z)
- TransPeakNet: Solvent-Aware 2D NMR Prediction via Multi-Task Pre-Training and Unsupervised Learning [5.7279868722119325]
We introduce an unsupervised training framework for predicting cross-peaks in 2D NMR. Our approach pretrains an ML model on an annotated 1D dataset of $^1$H and $^{13}$C shifts, then finetunes it in an unsupervised manner. Evaluation on 479 expert-annotated HSQC spectra demonstrates our model's superiority over traditional methods.
arXiv Detail & Related papers (2024-03-17T21:52:51Z)
- SpectralNeRF: Physically Based Spectral Rendering with Neural Radiance Field [70.15900280156262]
We propose an end-to-end Neural Radiance Field (NeRF)-based architecture for high-quality physically based rendering from a novel spectral perspective.
SpectralNeRF is superior to recent NeRF-based methods when synthesizing new views on synthetic and real datasets.
arXiv Detail & Related papers (2023-12-14T07:19:31Z)
- Gaussian Process Regression for Absorption Spectra Analysis of Molecular Dimers [68.8204255655161]
We discuss an approach based on a machine learning technique, where the parameters for the numerical calculations are chosen from Gaussian Process Regression (GPR).
This approach not only converges quickly to an optimal parameter set but also provides information about the complete parameter space.
We find that indeed the GPR gives reliable results which are in agreement with direct calculations of these parameters using quantum chemical methods.
arXiv Detail & Related papers (2021-12-14T17:46:45Z)
- Two-Dimensional Single- and Multiple-Quantum Correlation Spectroscopy in Zero-Field Nuclear Magnetic Resonance [55.41644538483948]
We present single- and multiple-quantum correlation $J$-spectroscopy detected in zero magnetic field using a Rb vapor-cell magnetometer.
At zero field the spectrum of ethanol appears as a mixture of carbon isotopomers, and correlation spectroscopy is useful in separating the two composite spectra.
arXiv Detail & Related papers (2020-04-09T10:02:45Z)
- Blind Source Separation for NMR Spectra with Negative Intensity [0.0]
We benchmark several blind source separation techniques for analysis of NMR spectral datasets containing negative intensity.
FastICA, SIMPLISMA, and NNMF are top-performing techniques.
The accuracy of FastICA and SIMPLISMA degrades quickly if excess (unreal) pure components are predicted.
arXiv Detail & Related papers (2020-02-07T20:57:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.