Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties
- URL: http://arxiv.org/abs/2309.09355v1
- Date: Sun, 17 Sep 2023 19:41:32 GMT
- Title: Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties
- Authors: Shokirbek Shermukhamedov, Dilorom Mamurjonova, Michael Probst
- Abstract summary: This paper introduces a new machine learning model based on deep learning techniques, such as a multilayer encoder and decoder architecture, for classification tasks.
We demonstrate the opportunities offered by our approach by applying it to various types of input data, including organic and inorganic compounds.
The models used in this work exhibit a high degree of predictive power, underscoring the progress that can be made with refined machine learning.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The application of machine learning (ML) techniques in computational
chemistry has led to significant advances in predicting molecular properties,
accelerating drug discovery, and material design. ML models can extract hidden
patterns and relationships from complex and large datasets, allowing for the
prediction of various chemical properties with high accuracy. The use of such
methods has enabled the discovery of molecules and materials that were
previously difficult to identify. This paper introduces a new ML model based on
deep learning techniques, such as a multilayer encoder and decoder
architecture, for classification tasks. We demonstrate the opportunities
offered by our approach by applying it to various types of input data,
including organic and inorganic compounds. In particular, we developed and
tested the model using the Matbench and MoleculeNet benchmarks, which include
crystal properties and drug design-related benchmarks. We also conduct a
comprehensive analysis of vector representations of chemical compounds,
shedding light on the underlying patterns in molecular data. The models used in
this work exhibit a high degree of predictive power, underscoring the progress
that can be made with refined machine learning when applied to molecular and
material datasets. For instance, on the Tox21 dataset, we achieved an average
accuracy of 96%, surpassing the previous best result by 10%. Our code is
publicly available at https://github.com/dmamur/elembert.
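The abstract describes the method only at a high level: chemical elements are treated as tokens with learned vector representations, and an encoder maps a compound's tokens to a property prediction. The NumPy sketch below illustrates that general pipeline under assumptions that do not come from the paper: a toy element vocabulary, random untrained weights, a single self-attention layer with a residual connection, mean pooling over real tokens, and 12 sigmoid outputs (Tox21 has 12 binary tasks). It also shows one possible reading of "average accuracy" on a multi-task dataset with missing labels (per-task accuracy over labeled entries, averaged across tasks); the authors' exact metric is not specified here. The names `VOCAB`, `tokenize`, `TinyEncoderClassifier`, and `masked_average_accuracy` are hypothetical and are not taken from the elembert repository.

```python
import numpy as np

# Hypothetical toy vocabulary (assumption, not from the paper); a real model
# would cover the whole periodic table and learn the embeddings from data.
VOCAB = {"[PAD]": 0, "H": 1, "C": 2, "N": 3, "O": 4, "S": 5, "Cl": 6}


def tokenize(elements, max_len=8):
    """Map element symbols to padded token ids plus a boolean mask of real tokens."""
    ids = [VOCAB[e] for e in elements][:max_len]
    mask = [True] * len(ids) + [False] * (max_len - len(ids))
    ids = ids + [VOCAB["[PAD]"]] * (max_len - len(ids))
    return np.array(ids), np.array(mask, dtype=bool)


class TinyEncoderClassifier:
    """Hypothetical, untrained sketch: element embeddings -> one self-attention
    layer -> mean pooling -> one sigmoid output per task."""

    def __init__(self, vocab_size, dim=16, n_tasks=12, seed=0):
        rng = np.random.default_rng(seed)
        self.emb = rng.normal(0.0, 0.1, (vocab_size, dim))  # element embedding table
        self.wq, self.wk, self.wv = (rng.normal(0.0, 0.1, (dim, dim)) for _ in range(3))
        self.head = rng.normal(0.0, 0.1, (dim, n_tasks))

    def forward(self, ids, mask):
        x = self.emb[ids]                                  # (L, dim) token embeddings
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        scores = q @ k.T / np.sqrt(x.shape[1])             # scaled dot-product scores
        scores = np.where(mask[None, :], scores, -1e9)     # padding keys get no weight
        attn = np.exp(scores - scores.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)            # row-wise softmax
        h = x + attn @ v                                   # residual connection
        pooled = h[mask].mean(axis=0)                      # average over real tokens only
        return 1.0 / (1.0 + np.exp(-(pooled @ self.head)))  # per-task probabilities


def masked_average_accuracy(probs, labels):
    """Accuracy per task over labeled rows (NaN = missing label), averaged over tasks."""
    preds = (probs >= 0.5).astype(float)
    accs = []
    for t in range(labels.shape[1]):
        known = ~np.isnan(labels[:, t])
        if known.any():
            accs.append(np.mean(preds[known, t] == labels[known, t]))
    return float(np.mean(accs))


model = TinyEncoderClassifier(len(VOCAB))
ids, mask = tokenize(["C", "C", "O", "H"])
print(model.forward(ids, mask).shape)
```

Because padding keys are masked out of attention and excluded from pooling, changing `max_len` does not change the prediction for a given compound. Training is omitted; with random weights the probabilities carry no chemical meaning.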
Related papers
- MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis (2024-06-13T02:50:23Z)
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules.
Our dataset offers significant physicochemical interpretability to guide model development and design.
We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
- Unsupervised Learning of Molecular Embeddings for Enhanced Clustering and Emergent Properties for Chemical Compounds (2023-10-25T18:00:24Z)
We introduce various methods to detect and cluster chemical compounds based on their SMILES data.
Our first method, which analyzes the graph structures of chemical compounds using embedding data, employs vector search to meet our threshold value.
We also used natural language description embeddings stored in a vector database with GPT-3.5, which outperforms the base model.
- MolGrapher: Graph-based Visual Recognition of Chemical Structures (2023-08-23T16:16:11Z)
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
- QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules (2023-06-15T23:39:07Z)
We generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories.
We show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules.
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning (2023-05-22T00:56:00Z)
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction (2023-02-04T01:32:40Z)
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
- Graph-based Molecular Representation Learning (2022-07-08T17:43:20Z)
Molecular representation learning (MRL) is a key step in building the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
- Improving VAE based molecular representations for compound property prediction (2022-01-13T12:57:11Z)
We propose a simple method to improve the chemical property prediction performance of machine learning models.
We show the relation between the performance of property prediction models and the distance between the property prediction dataset and the larger unlabeled dataset.
- Do Large Scale Molecular Language Representations Capture Important Structural Information? (2021-06-17T14:33:55Z)
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively when compared to graph-based and fingerprint-based supervised learning baselines.
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery (2020-12-02T02:09:31Z)
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
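Several of the related papers compare compounds through their embedding vectors; the unsupervised-clustering entry, for example, uses vector search against a threshold value. The sketch below shows one generic way to do threshold-based retrieval and grouping with cosine similarity, assuming the embeddings are already computed. The function names, the toy vectors, and the 0.9 threshold are hypothetical illustrations, not the method of any listed paper.

```python
import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def threshold_search(query, database, threshold=0.9):
    """Indices of database rows with cosine similarity >= threshold to the query,
    ordered from most to least similar (hypothetical helper)."""
    sims = cosine_matrix(query[None, :], database)[0]
    hits = np.flatnonzero(sims >= threshold)
    return hits[np.argsort(-sims[hits])].tolist()


def threshold_clusters(emb, threshold=0.9):
    """Connected components of the graph that links pairs with cosine similarity
    >= threshold, i.e. single-linkage clustering cut at that similarity."""
    sims = cosine_matrix(emb, emb)
    labels = [-1] * len(emb)
    current = 0
    for start in range(len(emb)):
        if labels[start] != -1:
            continue                      # already assigned to a cluster
        labels[start] = current
        stack = [start]                   # depth-first traversal of neighbors
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(sims[i] >= threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy 2-D "embeddings": rows 0/1 point one way, rows 2/3 another.
emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
print(threshold_clusters(emb))
print(threshold_search(np.array([1.0, 0.0]), emb))
```

Grouping by connected components is simple but can chain dissimilar compounds together through intermediate neighbors; for large libraries a stricter criterion or an approximate nearest-neighbor index may be more appropriate.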