Related papers: Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties

Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties

URL: http://arxiv.org/abs/2309.09355v1
Date: Sun, 17 Sep 2023 19:41:32 GMT
Title: Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties
Authors: Shokirbek Shermukhamedov, Dilorom Mamurjonova, Michael Probst
Abstract summary: This paper introduces a new machine learning model based on deep learning techniques, such as a multilayer encoder and decoder architecture, for classification tasks. We demonstrate the opportunities offered by our approach by applying it to various types of input data, including organic and inorganic compounds. The models used in this work exhibit a high degree of predictive power, underscoring the progress that can be made with refined machine learning.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The application of machine learning (ML) techniques in computational chemistry has led to significant advances in predicting molecular properties, accelerating drug discovery, and material design. ML models can extract hidden patterns and relationships from complex and large datasets, allowing for the prediction of various chemical properties with high accuracy. The use of such methods has enabled the discovery of molecules and materials that were previously difficult to identify. This paper introduces a new ML model based on deep learning techniques, such as a multilayer encoder and decoder architecture, for classification tasks. We demonstrate the opportunities offered by our approach by applying it to various types of input data, including organic and inorganic compounds. In particular, we developed and tested the model using the Matbench and Moleculenet benchmarks, which include crystal properties and drug design-related benchmarks. We also conduct a comprehensive analysis of vector representations of chemical compounds, shedding light on the underlying patterns in molecular data. The models used in this work exhibit a high degree of predictive power, underscoring the progress that can be made with refined machine learning when applied to molecular and material datasets. For instance, on the Tox21 dataset, we achieved an average accuracy of 96%, surpassing the previous best result by 10%. Our code is publicly available at https://github.com/dmamur/elembert.

Related papers

Material Property Prediction with Element Attribute Knowledge Graphs and Multimodal Representation Learning [8.523289773617503]
We build an element property knowledge graph and utilize an embedding model to encode the element attributes within the knowledge graph. A multimodal fusion framework, ESNet, integrates element property features with crystal structure features to generate joint multimodal representations. This provides a more comprehensive perspective for predicting the performance of crystalline materials.
arXiv Detail & Related papers (2024-11-13T08:07:21Z)
Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based underlineem Molecular underlineem Language underlineem Model, which randomly masking SMILES subsequences corresponding to specific molecular atoms. This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
A Large Encoder-Decoder Family of Foundation Models For Chemical Language [1.1073864511426255]
This paper introduces a large encoder-decoder chemical foundation models pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem. Our experiments across multiple benchmark datasets validate the capacity of the proposed model in providing state-of-the-art results for different tasks.
arXiv Detail & Related papers (2024-07-24T20:30:39Z)
ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering [54.80411755871931]
Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into readily understandable format. This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful. We introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data.
arXiv Detail & Related papers (2024-07-24T01:46:55Z)
MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules. Our dataset offers significant physicochemical interpretability to guide model development and design. We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z)
Unsupervised Learning of Molecular Embeddings for Enhanced Clustering and Emergent Properties for Chemical Compounds [2.6803933204362336]
We introduce various methods to detect and cluster chemical compounds based on their SMILES data. Our first method, analyzing the graphical structures of chemical compounds using embedding data, employs vector search to meet our threshold value. We also used natural language description embeddings stored in a vector database with GPT3.5, which outperforms the base model.
arXiv Detail & Related papers (2023-10-25T18:00:24Z)
Towards out-of-distribution generalizable predictions of chemical kinetics properties [61.15970601264632]
Out-Of-Distribution (OOD) kinetic property prediction is required to be generalizable. In this paper, we categorize the OOD kinetic property prediction into three levels (structure, condition, and mechanism) We create comprehensive datasets to benchmark the state-of-the-art ML approaches for reaction prediction in the OOD setting and the state-of-the-art graph OOD methods in kinetics property prediction problems.
arXiv Detail & Related papers (2023-10-04T20:36:41Z)
MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually. We treat all candidate atoms and bonds as nodes and put them in a graph. We classify atom and bond nodes in the graph with a Graph Neural Network.
arXiv Detail & Related papers (2023-08-23T16:16:11Z)
QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules [69.25826391912368]
We generate a new Quantum Hamiltonian dataset, named as QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories. We show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules.
arXiv Detail & Related papers (2023-06-15T23:39:07Z)
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules. Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph. By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA) ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information. Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
Machine learning with persistent homology and chemical word embeddings improves prediction accuracy and interpretability in metal-organic frameworks [0.07874708385247352]
We introduce an end-to-end machine learning model that automatically generates descriptors that capture a complex representation of a material's structure and chemistry. It automatically encapsulates geometric and chemical information directly from the material system. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from the commonly-used, manually-curated features.
arXiv Detail & Related papers (2020-10-01T16:31:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.