DiSE: A diffusion probabilistic model for automatic structure elucidation of organic compounds
- URL: http://arxiv.org/abs/2510.26231v1
- Date: Thu, 30 Oct 2025 08:10:03 GMT
- Title: DiSE: A diffusion probabilistic model for automatic structure elucidation of organic compounds
- Authors: Haochen Chen, Qi Huang, Anan Wu, Wenhao Zhang, Jianliang Ye, Jianming Wu, Kai Tan, Xin Lu, Xin Xu,
- Abstract summary: DiSE is an end-to-end diffusion-based generative model that integrates multiple spectroscopic modalities.<n>It achieves superior accuracy, strong generalization across chemically diverse datasets, and robustness to experimental data despite being trained on calculated spectra.
- Score: 17.43184484460819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic structure elucidation is essential for self-driving laboratories as it enables the system to achieve truly autonomous. This capability closes the experimental feedback loop, ensuring that machine learning models receive reliable structure information for real-time decision-making and optimization. Herein, we present DiSE, an end-to-end diffusion-based generative model that integrates multiple spectroscopic modalities, including MS, 13C and 1H chemical shifts, HSQC, and COSY, to achieve automated yet accurate structure elucidation of organic compounds. By learning inherent correlations among spectra through data-driven approaches, DiSE achieves superior accuracy, strong generalization across chemically diverse datasets, and robustness to experimental data despite being trained on calculated spectra. DiSE thus represents a significant advance toward fully automated structure elucidation, with broad potential in natural product research, drug discovery, and self-driving laboratories.
Related papers
- An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery.<n>Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments.<n>Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z) - AutoLabs: Cognitive Multi-Agent Systems with Self-Correction for Autonomous Chemical Experimentation [0.10999592665107412]
AutoLabs is a self-correcting, multi-agent architecture designed to autonomously translate natural-language instructions into executable protocols.<n>We present a comprehensive evaluation framework featuring five benchmark experiments of increasing complexity.<n>Our results demonstrate that agent reasoning capacity is the most critical factor for success.
arXiv Detail & Related papers (2025-09-30T01:51:46Z) - Interpretable Spectral Features Predict Conductivity in Self-Driving Doped Conjugated Polymer Labs [2.8914750842461583]
Self-driving labs promise faster materials discovery by coupling automation with machine learning.<n>We address this by learning interpretable spectral fingerprints from optical spectroscopy to predict electrical conductivity.
arXiv Detail & Related papers (2025-09-06T18:00:40Z) - A Neural Network Approach to Multi-radionuclide TDCR Beta Spectroscopy [12.470638217209851]
Liquid scintillation triple-to-doubly coincident ratio (TDCR) spectroscopy is widely adopted as a standard method for radionuclide quantification.<n>Here, we present an Artificial Intelligence framework that combines numerical spectral simulation and deep learning for standard-free automated analysis.
arXiv Detail & Related papers (2025-09-03T08:40:02Z) - A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature [8.306442315850878]
We develop a multimodal large language model (MLLM)-based multi-agent system for robust and automated chemical information extraction.<n>Our system achieved an F1 score of 80.8% on a benchmark dataset of sophisticated multimodal chemical reaction graphics from the literature.
arXiv Detail & Related papers (2025-07-27T11:16:57Z) - ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences.<n>This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input.<n>Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z) - Causal Representation Learning from Multimodal Biomedical Observations [57.00712157758845]
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets.<n>Key theoretical contribution is the structural sparsity of causal connections between modalities.<n>Results on a real-world human phenotype dataset are consistent with established biomedical research.
arXiv Detail & Related papers (2024-11-10T16:40:27Z) - A Transformer Based Generative Chemical Language AI Model for Structural Elucidation of Organic Compounds [1.5628118690186594]
We present a proof-of-concept transformer based generative chemical language artificial intelligence (AI) model.
Our model employs an encoder-decoder architecture and self-attention mechanisms to directly generate the most probable chemical structures.
It performs structural elucidation of molecules with up to 29 atoms in just a few seconds on a modern CPU, achieving a top-15 accuracy of 83%.
arXiv Detail & Related papers (2024-10-13T15:41:20Z) - Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [55.30328162764292]
Chemist-X is a comprehensive AI agent that automates the reaction condition optimization (RCO) task in chemical synthesis.<n>The agent uses retrieval-augmented generation (RAG) technology and AI-controlled wet-lab experiment executions.<n>Results of our automatic wet-lab experiments, achieved by fully LLM-supervised end-to-end operation with no human in the lope, prove Chemist-X's ability in self-driving laboratories.
arXiv Detail & Related papers (2023-11-16T01:21:33Z) - Closing the loop: Autonomous experiments enabled by
machine-learning-based online data analysis in synchrotron beamline
environments [80.49514665620008]
Machine learning can be used to enhance research involving large or rapidly generated datasets.
In this study, we describe the incorporation of ML into a closed-loop workflow for X-ray reflectometry (XRR)
We present solutions that provide an elementary data analysis in real time during the experiment without introducing the additional software dependencies in the beamline control software environment.
arXiv Detail & Related papers (2023-06-20T21:21:19Z) - Towards an Automatic Analysis of CHO-K1 Suspension Growth in
Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.