Generative structural elucidation from mass spectra as an iterative optimization problem
- URL: http://arxiv.org/abs/2602.07709v1
- Date: Sat, 07 Feb 2026 21:34:38 GMT
- Title: Generative structural elucidation from mass spectra as an iterative optimization problem
- Authors: Mrunali Manjrekar, Runzhong Wang, Samuel Goldman, Jenna C. Fromer, Connor W. Coley
- Abstract summary: We introduce a computational workflow that poses structure elucidation from LC-MS/MS as an iterative optimization problem. We demonstrate FOAM's performance on the NIST'20 and MassSpecGym datasets as both a standalone elucidation pipeline and as a complement to existing inverse models.
- Score: 23.97077717251806
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Liquid chromatography tandem mass spectrometry (LC-MS/MS) is a critical analytical technique for molecular identification across metabolomics, environmental chemistry, and chemical forensics. A variety of computational methods have emerged for structural annotation of spectral features of interest, but many of these features cannot be confidently annotated with reference structures or spectra. Here, we introduce FOAM (Formula-constrained Optimization for Annotating Metabolites), a computational workflow that poses structure elucidation from LC-MS/MS as an iterative optimization problem. FOAM couples a formula-constrained graph genetic algorithm with spectral simulation to explore candidate annotations given an experimental spectrum. We demonstrate FOAM's performance on the NIST'20 and MassSpecGym datasets as both a standalone elucidation pipeline and as a complement to existing inverse models. This work establishes iterative optimization as an effective and extensible paradigm for structural elucidation.
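The abstract describes coupling a formula-constrained genetic algorithm with spectral simulation. As a rough illustration of that loop only, the Python sketch below optimizes toy candidates against a target spectrum. Everything in it is an assumption made for illustration: candidates are strings of atom symbols rather than molecular graphs, `simulate_spectrum` is a hypothetical stand-in that treats each prefix as a fragment, and the swap mutation is just one simple way to keep the elemental formula fixed. It is not FOAM's implementation.

```python
import math
import random

# Integer-rounded atomic masses; a toy assumption for this sketch.
MASS = {"C": 12, "N": 14, "O": 16, "S": 32}


def simulate_spectrum(candidate):
    """Hypothetical simulator: every prefix of the atom string is treated
    as a fragment and contributes one unit-intensity peak at its mass."""
    peaks, total = {}, 0
    for atom in candidate:
        total += MASS[atom]
        peaks[total] = peaks.get(total, 0.0) + 1.0
    return peaks


def cosine_similarity(a, b):
    """Cosine similarity between two sparse spectra (mass -> intensity)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mutate(candidate, rng):
    """Swap two positions: the arrangement changes, the formula does not."""
    i, j = rng.sample(range(len(candidate)), 2)
    atoms = list(candidate)
    atoms[i], atoms[j] = atoms[j], atoms[i]
    return "".join(atoms)


def optimize(target, formula, generations=50, pop_size=20, seed=0):
    """Elitist loop: mutate, score against the target, keep the best."""
    rng = random.Random(seed)
    atoms = list(formula)

    def fitness(c):
        return cosine_similarity(simulate_spectrum(c), target)

    def rank(cands):
        # Deduplicate, then sort by descending fitness (ties broken by name).
        return sorted(set(cands), key=lambda c: (-fitness(c), c))[:pop_size]

    # Random initial arrangements that all share the given formula.
    population = rank("".join(rng.sample(atoms, len(atoms)))
                      for _ in range(pop_size))
    for _ in range(generations):
        offspring = [mutate(c, rng) for c in population]
        population = rank(population + offspring)
    best = population[0]
    return best, fitness(best)


if __name__ == "__main__":
    hidden = "CCOCNCS"                      # toy "true" structure
    target = simulate_spectrum(hidden)      # plays the experimental spectrum
    # Only the atom counts of `hidden` are used; the order is shuffled away.
    best, score = optimize(target, hidden)
    print(best, round(score, 3))
```

Because parents compete with their offspring at every generation, the best score in this sketch can never decrease, which is the property that makes the search iterative refinement rather than one-shot generation.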
Related papers
- FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics [22.314786276794717]
The identification and property prediction of chemical molecules is of central importance in the advancement of drug discovery and material science.
Deep learning models appear promising for predicting molecular structure spectra, but overall assessment remains challenging.
Our contribution is the creation of the benchmark framework FlexMS for constructing and evaluating diverse model architectures in mass spectrum prediction.
arXiv Detail & Related papers (2026-02-26T10:05:01Z)
- How well can off-the-shelf LLMs elucidate molecular structures from mass spectra using chain-of-thought reasoning? [51.286853421822705]
Large language models (LLMs) have shown promise for reasoning-intensive scientific tasks, but their capability for chemical interpretation is still unclear.
We introduce a Chain-of-Thought (CoT) prompting framework and benchmark that evaluate how LLMs reason about mass spectral data to predict molecular structures.
Our evaluation across metrics of SMILES validity, formula consistency, and structural similarity reveals that while LLMs can produce syntactically valid and partially plausible structures, they fail to achieve chemical accuracy or link reasoning to correct molecular predictions.
arXiv Detail & Related papers (2026-01-09T20:08:42Z)
- SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse.
By utilizing benchmarks and deriving deterministic bounds on the matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space.
We demonstrate that SIGMA effectively captures the transition towards collapse, offering theoretical insights into the mechanics of collapse.
arXiv Detail & Related papers (2026-01-06T19:47:11Z)
- Language Models Can Understand Spectra: A Multimodal Model for Molecular Structure Elucidation [9.987376780022345]
We propose SpectraLLM, the first large language model designed to support multi-modal spectroscopic joint reasoning.
By integrating continuous and discrete spectroscopic modalities into a shared semantic space, SpectraLLM learns to uncover substructural patterns that are consistent and complementary across spectra.
We pretrain and fine-tune SpectraLLM in the domain of small molecules, and evaluate it on six standardized, publicly available chemical datasets.
arXiv Detail & Related papers (2025-08-04T13:33:38Z)
- DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models [68.19129717255053]
We present DiffSpectra, a generative framework that formulates molecular structure elucidation as a conditional generation process.
Our experiments demonstrate that DiffSpectra accurately elucidates molecular structures, achieving 40.76% top-1 and 99.49% top-10 accuracy.
arXiv Detail & Related papers (2025-07-09T13:57:20Z)
- Towards a Unified Textual Graph Framework for Spectral Reasoning via Physical and Chemical Information Fusion [44.90118820073463]
We propose a novel multi-modal spectral analysis framework that integrates prior knowledge graphs with Large Language Models.
Our method bridges physical spectral measurements and chemical structural semantics by representing them in a unified Textual Graph format.
Our framework achieves consistently high performance across multiple spectral analysis tasks, including node-level, edge-level, and graph-level classification.
arXiv Detail & Related papers (2025-06-21T16:58:30Z)
- Unraveling Molecular Structure: A Multimodal Spectroscopic Dataset for Chemistry [0.1747623282473278]
This dataset comprises simulated $^1$H-NMR, $^{13}$C-NMR, HSQC-NMR, Infrared, and Mass spectra for 790k molecules extracted from chemical reactions in patent data.
We provide benchmarks for evaluating single-modality tasks such as structure elucidation, predicting the spectra for a target molecule, and functional group predictions.
arXiv Detail & Related papers (2024-07-04T12:52:48Z)
- Spectral Phase Transition and Optimal PCA in Block-Structured Spiked models [20.742571160909456]
We discuss the inhomogeneous spiked Wigner model, a theoretical framework recently introduced to study structured noise in various learning scenarios.
Our primary objective is to find an optimal spectral method and to extend the celebrated BBP phase transition criterion to our inhomogeneous, block-structured Wigner model.
arXiv Detail & Related papers (2024-03-06T13:23:55Z)
- Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques often pose a challenge in their interpretation.
The huge dimensionality of data cube spectra poses a complex task in its statistical interpretation.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm.
arXiv Detail & Related papers (2024-01-31T09:31:28Z)
- Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy.
A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings.
An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z)
- Gaussian Process Regression for Absorption Spectra Analysis of Molecular Dimers [68.8204255655161]
We discuss an approach based on a machine learning technique, where the parameters for the numerical calculations are chosen using Gaussian Process Regression (GPR).
This approach not only converges quickly to an optimal parameter set but also provides information about the complete parameter space.
We find that GPR indeed gives reliable results that agree with direct calculations of these parameters using quantum chemical methods.
arXiv Detail & Related papers (2021-12-14T17:46:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.