Objective-Agnostic Enhancement of Molecule Properties via Multi-Stage VAE
- URL: http://arxiv.org/abs/2308.13066v2
- Date: Sun, 10 Sep 2023 00:52:57 GMT
- Title: Objective-Agnostic Enhancement of Molecule Properties via Multi-Stage VAE
- Authors: Chenghui Zhou, Barnabas Poczos
- Abstract summary: The variational autoencoder (VAE) is a popular method for drug discovery, and various architectures and pipelines have been proposed to improve its performance.
VAE approaches are known to suffer from poor manifold recovery when the data lie on a low-dimensional manifold embedded in a higher-dimensional ambient space.
In this paper, we explore applying a multi-stage VAE approach, which can improve manifold recovery on a synthetic dataset, to the field of drug discovery.
- Score: 1.3597551064547502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The variational autoencoder (VAE) is a popular method for drug discovery, and
various architectures and pipelines have been proposed to improve its
performance. However, VAE approaches are known to suffer from poor manifold
recovery when the data lie on a low-dimensional manifold embedded in a higher
dimensional ambient space [Dai and Wipf, 2019]. The consequences of this in drug
discovery are somewhat under-explored. In this paper, we explore applying a
multi-stage VAE approach, which can improve manifold recovery on a synthetic
dataset, to the field of drug discovery. We experimentally evaluate our
multi-stage VAE approach using the ChEMBL dataset and demonstrate its ability
to improve the property statistics of generated molecules substantially over
pre-existing methods without incorporating property predictors into the
training pipeline. We further fine-tune our models on two curated and much
smaller molecule datasets that target different proteins. Our experiments show
an increase in the number of active molecules generated by the multi-stage VAE
in comparison to its one-stage equivalent. For each of the two tasks, our
baselines include methods that use learned property predictors to incorporate
target metrics directly into the training objective and we discuss
complications that arise with this methodology.
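The data flow of a multi-stage VAE, as described in [Dai and Wipf, 2019] for the two-stage case, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify architectures or training details, so the names `stage_inputs`, `sample`, and `scale` are hypothetical, and the toy scaling functions merely stand in for trained encoder and decoder networks. The sketch assumes that stage 1 is trained on the data, that each later stage is trained on the latent codes of the stage before it, and that sampling draws Gaussian noise at the last stage and decodes back through every stage.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Stage = Callable[[Vector], Vector]


def stage_inputs(encoders: Sequence[Stage], x: Vector) -> List[Vector]:
    """Return what each stage would be trained on for one data point.

    Stage 1 sees the data x; stage k+1 sees the latent code produced by
    the encoder of stage k. (In a real VAE the code is sampled from the
    encoder's posterior; the toy encoders here are deterministic.)
    """
    inputs = [x]
    for encode in encoders[:-1]:
        inputs.append(encode(inputs[-1]))
    return inputs


def sample(decoders: Sequence[Stage], latent_dim: int, rng: random.Random) -> Vector:
    """Draw standard Gaussian noise for the last stage, then decode
    through the stages in reverse order until data space is reached."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    for decode in reversed(decoders):
        z = decode(z)
    return z


def scale(factor: float) -> Stage:
    """Hypothetical stand-in for a trained network: scales each coordinate."""
    return lambda v: [factor * a for a in v]


# Assumption: two stages with matching toy encoders and decoders.
encoders = [scale(0.5), scale(0.25)]
decoders = [scale(2.0), scale(4.0)]

rng = random.Random(0)
training_views = stage_inputs(encoders, [1.0, -2.0])  # inputs for stage 1 and stage 2
generated = sample(decoders, latent_dim=2, rng=rng)   # one sample in data space
```

Note that no property predictor appears anywhere in this pipeline, which matches the abstract's claim that the later stages only refine how well the generated distribution matches the training data.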
Related papers
- Pullback Flow Matching on Data Manifolds [10.187244125099479]
Pullback Flow Matching (PFM) is a framework for generative modeling on data manifolds.
We demonstrate PFM's effectiveness through applications to synthetic data, data dynamics, and protein sequence data, generating novel proteins with specific properties.
This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest.
arXiv Detail & Related papers (2024-10-06T16:41:26Z)
- Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark [101.23684938489413]
Anomaly detection (AD) is often focused on detecting anomalies for industrial quality inspection and medical lesion examination.
This work first constructs a large-scale and general-purpose COCO-AD dataset by extending COCO to the AD field.
Inspired by the metrics in the segmentation field, we propose several more practical threshold-dependent AD-specific metrics.
arXiv Detail & Related papers (2024-04-16T17:38:26Z)
- Learning Invariant Molecular Representation in Latent Discrete Space [52.13724532622099]
We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts.
Our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts.
arXiv Detail & Related papers (2023-10-22T04:06:44Z)
- Improving Molecular Properties Prediction Through Latent Space Fusion [9.912768918657354]
We present a multi-view approach that combines latent spaces derived from state-of-the-art chemical models.
Our approach relies on two pivotal elements: the embeddings derived from MHG-GNN, which represent molecular structures as graphs, and MoLFormer embeddings rooted in chemical language.
We demonstrate the superior performance of our proposed multi-view approach compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2023-10-20T20:29:32Z)
- Machine Learning Small Molecule Properties in Drug Discovery [44.62264781248437]
We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity).
We discuss existing popular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks.
Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery, are assessed.
arXiv Detail & Related papers (2023-08-02T22:18:41Z)
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- ALMERIA: Boosting pairwise molecular contrasts with scalable methods [0.0]
ALMERIA is a tool for estimating compound similarities and activity prediction based on pairwise molecular contrasts.
It has been implemented using scalable software and methods to exploit large volumes of data.
Experiments show state-of-the-art performance for molecular activity prediction.
arXiv Detail & Related papers (2023-04-28T16:27:06Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Improving Molecule Properties Through 2-Stage VAE [0.0]
The variational autoencoder (VAE) is a popular method for drug discovery.
The VAE model suffers from deficiencies such as poor manifold recovery when data lie on a low-dimensional manifold embedded in a higher-dimensional ambient space.
In this paper, we study how to improve the similarity between the data generated by a VAE and the training dataset by improving manifold recovery via a 2-stage VAE.
arXiv Detail & Related papers (2022-12-06T04:41:48Z)
- DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on the V-COCO and HICO-DET datasets.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.