Removing Biases from Molecular Representations via Information
Maximization
- URL: http://arxiv.org/abs/2312.00718v1
- Date: Fri, 1 Dec 2023 16:53:15 GMT
- Authors: Chenyu Wang, Sharut Gupta, Caroline Uhler, Tommi Jaakkola
- Abstract summary: InfoCORE is an Information maximization approach for COnfounder REmoval that deals with batch effects in drug screening data.
It adaptively reweighs samples to equalize their implied batch distribution.
It offers a versatile framework that also addresses general distribution shifts and data fairness issues by minimizing correlation with spurious features or removing sensitive attributes.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-throughput drug screening -- using cell imaging or gene expression
measurements as readouts of drug effect -- is a critical tool in biotechnology
to assess and understand the relationship between the chemical structure and
biological activity of a drug. Since large-scale screens have to be divided
into multiple experiments, a key difficulty is dealing with batch effects,
which can introduce systematic errors and non-biological associations in the
data. We propose InfoCORE, an Information maximization approach for COnfounder
REmoval, to effectively deal with batch effects and obtain refined molecular
representations. InfoCORE establishes a variational lower bound on the
conditional mutual information of the latent representations given a batch
identifier. It adaptively reweighs samples to equalize their implied batch
distribution. Extensive experiments on drug screening data reveal InfoCORE's
superior performance in a multitude of tasks including molecular property
prediction and molecule-phenotype retrieval. Additionally, we show results for
how InfoCORE offers a versatile framework and resolves general distribution
shifts and issues of data fairness by minimizing correlation with spurious
features or removing sensitive attributes. The code is available at
https://github.com/uhlerlab/InfoCORE.
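The reweighting idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository above for that); it is one plausible reading, written in NumPy, in which an InfoNCE-style contrastive loss reweights each negative by how likely it is to come from the anchor's batch. The function name `reweighted_infonce`, the `batch_probs` input (assumed to come from some separately trained batch classifier), and the normalization of the weights are all assumptions made for illustration.

```python
import numpy as np


def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def reweighted_infonce(z_a, z_b, batch_ids, batch_probs, temperature=0.1, eps=1e-12):
    """InfoNCE loss with batch-aware negative weights (illustrative only).

    z_a, z_b    : (n, d) L2-normalized embeddings of paired views,
                  e.g. molecule structure and screening readout.
    batch_ids   : (n,) integer batch label of each pair.
    batch_probs : (n, k) assumed classifier estimate of p(batch | sample).
    """
    n = z_a.shape[0]
    logits = z_a @ z_b.T / temperature  # pairwise similarities, shape (n, n)

    # w[i, j] = estimated probability that candidate j belongs to anchor i's batch.
    w = batch_probs[:, batch_ids].T.copy()

    # Rescale negative weights so each row sums to n - 1; with a uniform
    # classifier every weight is 1 and the loss reduces to plain InfoNCE.
    np.fill_diagonal(w, 0.0)
    w *= (n - 1) / np.maximum(w.sum(axis=1, keepdims=True), eps)
    np.fill_diagonal(w, 1.0)  # the positive pair always has weight 1

    weighted = logits + np.log(np.maximum(w, eps))
    return float(np.mean(_logsumexp(weighted, axis=1) - np.diag(logits)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 6, 4, 2
    z_a = rng.normal(size=(n, d))
    z_a /= np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = rng.normal(size=(n, d))
    z_b /= np.linalg.norm(z_b, axis=1, keepdims=True)
    batch_ids = rng.integers(0, k, size=n)
    batch_probs = rng.dirichlet(np.ones(k), size=n)
    print(reweighted_infonce(z_a, z_b, batch_ids, batch_probs))
```

Up-weighting negatives that look like they share the anchor's batch means the positive pair cannot be picked out on batch signal alone, which is the intuition behind equalizing the implied batch distribution. The exact weighting and bound used in the paper may differ.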
Related papers
- Mixed Effects Deep Learning for the interpretable analysis of single cell RNA sequencing data by quantifying and visualizing batch effects [6.596656267996196]
Single-cell RNA sequencing (scRNA-seq) data are often confounded by technical or biological batch effects.
Existing deep learning models mitigate these effects but often discard batch-specific information.
We propose a Mixed Effects Deep Learning (MEDL) autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components.
arXiv Detail & Related papers (2024-11-11T00:10:48Z)
- Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
This work introduces the Multimodal Pretraining DEL-Fusion model (MPDF).
We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions.
We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
arXiv Detail & Related papers (2024-09-07T17:32:21Z)
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
Detecting poisoned samples in a mixed dataset is both beneficial and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
- Regressor-free Molecule Generation to Support Drug Response Prediction [83.25894107956735]
Conditional generation based on the target IC50 score can obtain a more effective sampling space.
Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on numerical labels.
arXiv Detail & Related papers (2024-05-23T13:22:17Z)
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI [1.0499611180329804]
A lack of understanding of the causes of molecular aggregation introduces difficulty in the development of predictive tools for detecting aggregating molecules.
We present an examination of the molecular features differentiating datasets of aggregating and non-aggregating molecules, as well as a machine learning approach to predicting molecular aggregation.
Our method uses explainable graph neural networks and counterfactuals to reliably predict and explain aggregation, giving additional insights and design rules for future screening.
arXiv Detail & Related papers (2023-06-03T22:30:45Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Modular multi-source prediction of drug side-effects with DruGNN [3.229607826010618]
Drug Side-Effects (DSEs) have a high impact on public health, care system costs, and drug discovery processes.
To predict their occurrence, it is necessary to integrate data from heterogeneous sources.
In this work, such heterogeneous data is integrated into a graph dataset, expressively representing the relational information between different entities.
Graph Neural Networks (GNNs) are exploited to predict DSEs on our dataset with very promising results.
arXiv Detail & Related papers (2022-02-15T09:41:05Z)
- Improving VAE based molecular representations for compound property prediction [0.0]
We propose a simple method to improve chemical property prediction performance of machine learning models.
We show the relation between the performance of property prediction models and the distance between property prediction dataset and the larger unlabeled dataset.
arXiv Detail & Related papers (2022-01-13T12:57:11Z)
- MOOMIN: Deep Molecular Omics Network for Anti-Cancer Drug Combination Therapy [2.446672595462589]
We propose a multimodal graph neural network that can predict the synergistic effect of drug combinations for cancer treatment.
Our model learns drug representations at multiple scales from the context provided by a drug-protein interaction network and metadata.
We demonstrate that the model makes high-quality predictions over a wide range of cancer cell line tissues.
arXiv Detail & Related papers (2021-10-28T13:10:25Z)
- MolTrans: Molecular Interaction Transformer for Drug Target Interaction Prediction [68.5766865583049]
Drug target interaction (DTI) prediction is a foundational task for in silico drug discovery.
Recent years have witnessed promising progress for deep learning in DTI predictions.
We propose a Molecular Interaction Transformer (MolTrans) to address the limitations of existing approaches.
arXiv Detail & Related papers (2020-04-23T18:56:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.