Improving VAE based molecular representations for compound property
prediction
- URL: http://arxiv.org/abs/2201.04929v1
- Date: Thu, 13 Jan 2022 12:57:11 GMT
- Title: Improving VAE based molecular representations for compound property
prediction
- Authors: A. Tevosyan (1 and 2), L. Khondkaryan (1), H. Khachatrian (2 and 3),
G. Tadevosyan (1), L. Apresyan (1), N. Babayan (1 and 3), H. Stopper (4), Z.
Navoyan (5) ((1) Institute of Molecular Biology NAS RA Armenia, (2) YerevaNN
Armenia, (3) Yerevan State University Armenia, (4) Institute of Pharmacology
and Toxicology University of Würzburg Germany, (5) Toxometris.ai)
- Abstract summary: We propose a simple method to improve chemical property prediction performance of machine learning models.
We show the relation between the performance of property prediction models and the distance between property prediction dataset and the larger unlabeled dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Collecting labeled data for many important tasks in chemoinformatics is
time-consuming and requires expensive experiments. In recent years, machine learning
has been used to learn rich representations of molecules using large scale
unlabeled molecular datasets and transfer the knowledge to solve the more
challenging tasks with limited datasets. Variational autoencoders are one of
the tools that have been proposed to perform the transfer for both chemical
property prediction and molecular generation tasks. In this work we propose a
simple method to improve chemical property prediction performance of machine
learning models by incorporating additional information on correlated molecular
descriptors in the representations learned by variational autoencoders. We
verify the method on three property prediction tasks. We explore the impact of
the number of incorporated descriptors, the correlation between the descriptors
and the target properties, the sizes of the datasets, etc. Finally, we show the
relation between the performance of property prediction models and the distance
between the property prediction dataset and the larger unlabeled dataset in the
representation space.
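The abstract describes the idea only at a high level. Below is a minimal, hypothetical PyTorch sketch of one way such a descriptor-augmented VAE could look: a SMILES autoencoder whose latent code additionally feeds a small regression head for correlated molecular descriptors, plus a simple latent-space distance between the labeled property-prediction set and the larger unlabeled set. The architecture, layer sizes, loss weights, and the mean-embedding distance metric are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' code): a SMILES VAE whose latent space
# is also trained to predict a set of correlated molecular descriptors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptorAugmentedVAE(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=256,
                 latent_dim=128, n_descriptors=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.decoder = nn.GRU(latent_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)
        # Auxiliary head: predict correlated descriptors from the latent code.
        self.descriptor_head = nn.Sequential(
            nn.Linear(latent_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, n_descriptors),
        )

    def forward(self, tokens):
        x = self.embed(tokens)                       # (B, T, E)
        _, h = self.encoder(x)                       # (1, B, H)
        h = h.squeeze(0)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Simplified decoding: feed the latent code at every time step.
        dec_in = z.unsqueeze(1).repeat(1, tokens.size(1), 1)
        out, _ = self.decoder(dec_in)
        logits = self.to_vocab(out)                  # (B, T, V)
        desc_pred = self.descriptor_head(z)          # (B, n_descriptors)
        return logits, mu, logvar, desc_pred

def loss_fn(logits, tokens, mu, logvar, desc_pred, desc_true,
            beta=0.1, gamma=1.0):
    # Standard VAE terms plus an MSE term on the correlated descriptors.
    recon = F.cross_entropy(logits.transpose(1, 2), tokens)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    desc = F.mse_loss(desc_pred, desc_true)
    return recon + beta * kl + gamma * desc

def dataset_distance(z_labeled, z_unlabeled):
    # One possible "distance between datasets" in representation space:
    # Euclidean distance between the mean latent vectors of the two sets
    # (the paper's exact metric is not specified here).
    return torch.norm(z_labeled.mean(0) - z_unlabeled.mean(0)).item()
```

After training, the latent vectors (or the VAE encoder itself) would serve as input features for downstream property prediction models, with the descriptor term encouraging the latent space to retain chemically relevant information.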
Related papers
- Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning [79.75718786477638]
We exploit the fact that molecular tasks are connected by physical laws and design consistency training approaches accordingly.
We demonstrate that the more accurate energy data can improve the accuracy of structure prediction.
We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction.
arXiv Detail & Related papers (2024-10-14T03:11:33Z)
- MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules.
Our dataset offers significant physicochemical interpretability to guide model development and design.
We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z)
- Unsupervised Learning of Molecular Embeddings for Enhanced Clustering
and Emergent Properties for Chemical Compounds [2.6803933204362336]
We introduce various methods to detect and cluster chemical compounds based on their SMILES data.
Our first method analyzes the graph structures of chemical compounds using embedding data and employs vector search against a chosen threshold value.
We also use natural language description embeddings, obtained with GPT-3.5 and stored in a vector database, which outperform the base model.
arXiv Detail & Related papers (2023-10-25T18:00:24Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Graph neural networks for the prediction of molecular structure-property
relationships [59.11160990637615]
Graph neural networks (GNNs) are a machine learning method that works directly on the molecular graph.
GNNs allow properties to be learned in an end-to-end fashion, thereby avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
arXiv Detail & Related papers (2022-07-25T11:30:44Z)
- Tyger: Task-Type-Generic Active Learning for Molecular Property
Prediction [121.97742787439546]
How to accurately predict the properties of molecules is an essential problem in AI-driven drug discovery.
To reduce annotation cost, deep Active Learning methods are developed to select only the most representative and informative data for annotating.
We propose a Task-type-generic active learning framework (termed Tyger) that is able to handle different types of learning tasks in a unified manner.
arXiv Detail & Related papers (2022-05-23T12:56:12Z)
- Do Large Scale Molecular Language Representations Capture Important
Structural Information? [31.76876206167457]
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively, when compared to graph-based and fingerprint-based supervised learning baselines.
arXiv Detail & Related papers (2021-06-17T14:33:55Z)
- Few-Shot Graph Learning for Molecular Property Prediction [46.60746023179724]
We propose Meta-MGNN, a novel model for few-shot molecular property prediction.
To exploit unlabeled molecular information, Meta-MGNN further incorporates molecular structure, attribute-based self-supervised modules, and self-attentive task weights.
Extensive experiments on two public multi-property datasets demonstrate that Meta-MGNN outperforms a variety of state-of-the-art methods.
arXiv Detail & Related papers (2021-02-16T01:55:34Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property
Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
- Predicting Chemical Properties using Self-Attention Multi-task Learning
based on SMILES Representation [0.0]
In this study, we explore the structural differences of the transformer-variant model and propose a new self-attention-based model.
The representation learning performance of the self-attention module was evaluated in a multi-task learning environment using imbalanced chemical datasets.
arXiv Detail & Related papers (2020-10-19T09:46:50Z)