A Transformer Based Generative Chemical Language AI Model for Structural Elucidation of Organic Compounds
- URL: http://arxiv.org/abs/2410.14719v2
- Date: Fri, 25 Oct 2024 02:01:00 GMT
- Title: A Transformer Based Generative Chemical Language AI Model for Structural Elucidation of Organic Compounds
- Authors: Xiaofeng Tan
- Abstract summary: We present a proof-of-concept transformer based generative chemical language artificial intelligence (AI) model.
Our model employs an encoder-decoder architecture and self-attention mechanisms to directly generate the most probable chemical structures.
It performs structural elucidation of molecules with up to 29 atoms in just a few seconds on a modern CPU, achieving a top-15 accuracy of 83%.
- Score: 1.5628118690186594
- Abstract: For over half a century, computer-aided structural elucidation (CASE) systems for organic compounds have relied on complex expert systems with explicitly programmed algorithms. These systems are often computationally inefficient for complex compounds due to the vast chemical structural space that must be explored and filtered. In this study, we present a proof-of-concept transformer based generative chemical language artificial intelligence (AI) model, an innovative end-to-end architecture designed to replace the logic and workflow of the classic CASE framework for ultra-fast and accurate spectroscopic-based structural elucidation. Our model employs an encoder-decoder architecture and self-attention mechanisms, similar to those in large language models, to directly generate the most probable chemical structures that match the input spectroscopic data. Trained on ~102k IR, UV, and 1H NMR spectra, it performs structural elucidation of molecules with up to 29 atoms in just a few seconds on a modern CPU, achieving a top-15 accuracy of 83%. This approach demonstrates the potential of transformer based generative AI to accelerate traditional scientific problem-solving processes. The model's ability to iterate quickly based on new data highlights its potential for rapid advancements in structural elucidation.
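The top-15 accuracy quoted in the abstract can be read as the fraction of test spectra for which the correct structure appears among the model's 15 highest-ranked candidates. The following is a minimal sketch of that metric, not the paper's evaluation code. The function name `top_k_accuracy`, the use of SMILES strings, the comparison by exact string match, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 15,
) -> float:
    """Fraction of samples whose target is among the first k ranked candidates.

    Assumption: each structure is a string (e.g. a canonical SMILES) and two
    structures are equal only if their strings match exactly.
    """
    if len(ranked_candidates) != len(targets):
        raise ValueError("ranked_candidates and targets must have equal length")
    if not targets:
        return 0.0
    hits = sum(
        1
        for candidates, target in zip(ranked_candidates, targets)
        if target in candidates[:k]
    )
    return hits / len(targets)


# Hypothetical toy data: three samples, two ranked candidates each.
predictions = [
    ["CCO", "COC"],           # correct answer ranked second
    ["CC(=O)O", "OCC=O"],     # correct answer ranked first
    ["c1ccccc1", "C1CCCCC1"],  # correct answer not generated
]
truth = ["COC", "CC(=O)O", "CCN"]

top1 = top_k_accuracy(predictions, truth, k=1)
top2 = top_k_accuracy(predictions, truth, k=2)
```

In this toy data, one of three targets is ranked first and two of three appear within the first two candidates. A real evaluation would also need to canonicalize structures before comparing them, which this sketch leaves out.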
Related papers
- GraphXForm: Graph transformer for computer-aided molecular design with application to extraction [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned.
We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
arXiv Detail & Related papers (2024-11-03T19:45:15Z)
- Generative Hierarchical Materials Search [91.93125016916463]
We propose Generative Hierarchical Materials Search (GenMS) for controllable generation of crystal structures.
GenMS consists of (1) a language model that takes high-level natural language as input and generates intermediate textual information about a crystal.
GenMS additionally uses a graph neural network to predict properties (e.g., formation energy) from the generated crystal structures.
arXiv Detail & Related papers (2024-09-10T17:51:28Z)
- BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
- Accurate and efficient structure elucidation from routine one-dimensional NMR spectra using multitask machine learning [1.2754578699685275]
We introduce a machine learning framework that predicts the molecular structure of an unknown compound based on its 1D 1H and/or 13C NMR spectra.
Integrating this capability with a convolutional neural network (CNN), we build an end-to-end model for predicting structure from spectra that is fast and accurate.
arXiv Detail & Related papers (2024-08-15T17:37:36Z)
- An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
- Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis [57.70772230913099]
Chemist-X automates the reaction condition recommendation (RCR) task in chemical synthesis with retrieval-augmented generation (RAG) technology.
Chemist-X interrogates online molecular databases and distills critical data from the latest literature database.
Chemist-X considerably reduces chemists' workload and allows them to focus on more fundamental and creative problems.
arXiv Detail & Related papers (2023-11-16T01:21:33Z)
- Atomic structure generation from reconstructing structural fingerprints [1.2128971613239876]
We present an end-to-end structure generation approach using atom-centered symmetry functions as the representation and conditional variational autoencoders as the generative model.
We are able to successfully generate novel and valid atomic structures of sub-nanometer Pt nanoparticles as a proof of concept.
arXiv Detail & Related papers (2022-07-27T00:42:59Z)
- Transferring Chemical and Energetic Knowledge Between Molecular Systems with Machine Learning [5.27145343046974]
We propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one.
We focus on the classification of high and low free-energy states.
Our results show a remarkable AUC of 0.92 for transfer learning from tri-alanine to the deca-alanine system.
arXiv Detail & Related papers (2022-05-06T16:21:00Z)
- Geometric Transformer for End-to-End Molecule Properties Prediction [92.28929858529679]
We introduce a Transformer-based architecture for molecule property prediction, which is able to capture the geometry of the molecule.
We modify the classical positional encoder by an initial encoding of the molecule geometry, as well as a learned gated self-attention mechanism.
arXiv Detail & Related papers (2021-10-26T14:14:40Z)
- Artificial Intelligence based Autonomous Molecular Design for Medical Therapeutic: A Perspective [9.371378627575883]
Domain-aware machine learning (ML) models have been increasingly adopted for accelerating small molecule therapeutic design.
We present the most recent breakthrough achieved by each of the components, and how such autonomous AI and ML workflow can be realized.
arXiv Detail & Related papers (2021-02-10T00:43:46Z)
- Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes [6.092214762701847]
We introduce a machine learning method that learns directly from the 3D positions of all atoms to identify accurate models of protein complexes.
Our network substantially improves the identification of accurate structural models among a large set of possible models.
arXiv Detail & Related papers (2020-06-05T20:17:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.