Related papers: Removing Biases from Molecular Representations via Information Maximization

URL: http://arxiv.org/abs/2312.00718v1
Date: Fri, 1 Dec 2023 16:53:15 GMT
Title: Removing Biases from Molecular Representations via Information Maximization
Authors: Chenyu Wang, Sharut Gupta, Caroline Uhler, Tommi Jaakkola
Abstract summary: InfoCORE is an Information maximization approach for COnfounder REmoval that deals with batch effects in drug screening data. It adaptively reweighs samples to equalize their implied batch distribution. The framework also addresses general distribution shifts and data fairness issues.
Score: 16.38589836748167
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks including molecular property prediction and molecule-phenotype retrieval. Additionally, we show results for how InfoCORE offers a versatile framework and resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.
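The reweighting idea in the abstract ("reweighs samples to equalize their implied batch distribution") can be illustrated with a much simpler, static analogue. The sketch below is not InfoCORE's objective and is not taken from the linked repository: it only computes inverse-frequency weights so that every batch carries the same total weight, whereas the paper describes an adaptive scheme tied to a variational bound. The function name batch_balancing_weights and the toy batch labels are hypothetical.

```python
import numpy as np


def batch_balancing_weights(batch_ids):
    """Return per-sample weights that give every batch equal total mass.

    Hypothetical helper for illustration only: a static inverse-frequency
    reweighting, not the adaptive reweighting described for InfoCORE.
    """
    batch_ids = np.asarray(batch_ids)
    # inverse[i] is the index of sample i's batch among the unique batches
    _, inverse, counts = np.unique(
        batch_ids, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    n_batches = counts.size
    # Each batch gets total mass 1 / n_batches, split evenly over its samples
    return 1.0 / (n_batches * counts[inverse])


# Toy example: batch "A" has three samples, batch "B" has one
toy_batches = ["A", "A", "A", "B"]
weights = batch_balancing_weights(toy_batches)

# A per-sample loss can then be averaged with these weights
per_sample_loss = np.array([0.2, 0.4, 0.6, 1.0])
weighted_loss = float(np.sum(weights * per_sample_loss))
```

With these weights, the three samples from batch "A" together count as much as the single sample from batch "B", so an over-represented batch no longer dominates the averaged loss.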

Related papers

Addressing Model Overcomplexity in Drug-Drug Interaction Prediction With Molecular Fingerprints [0.0]
Accurately predicting drug-drug interactions (DDIs) is crucial for pharmaceutical research and clinical safety. Recent deep learning models often suffer from high computational costs and limited generalization across datasets. In this study, we investigate a simpler yet effective approach using molecular representations such as Morgan fingerprints, graph-based embeddings from graph convolutional networks (GCNs), and transformer-derived embeddings from MoLFormer, integrated into a straightforward neural network.
arXiv Detail & Related papers (2025-03-30T18:27:01Z)
Mixed Effects Deep Learning for the interpretable analysis of single cell RNA sequencing data by quantifying and visualizing batch effects [6.596656267996196]
Single-cell RNA sequencing (scRNA-seq) data are often confounded by technical or biological batch effects. Existing deep learning models mitigate these effects but often discard batch-specific information. We propose a Mixed Effects Deep Learning (MEDL) autoencoder framework that separately models batch-invariant (fixed effects) and batch-specific (random effects) components.
arXiv Detail & Related papers (2024-11-11T00:10:48Z)
Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
We propose the Multimodal Pretraining DEL-Fusion model (MPDF). We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions. We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
arXiv Detail & Related papers (2024-09-07T17:32:21Z)
Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks. Detecting poisoned samples in a mixed dataset is both beneficial and challenging. We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
Regressor-free Molecule Generation to Support Drug Response Prediction [83.25894107956735]
Conditional generation based on the target IC50 score can obtain a more effective sampling space. Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on number labels.
arXiv Detail & Related papers (2024-05-23T13:22:17Z)
Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution. Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
Mitigating Molecular Aggregation in Drug Discovery with Predictive Insights from Explainable AI [1.0499611180329804]
A lack of understanding of the causes of molecular aggregation introduces difficulty in the development of predictive tools for detecting aggregating molecules. We present an examination of the molecular features differentiating datasets of aggregating and non-aggregating molecules, as well as a machine learning approach to predicting molecular aggregation. Our method uses explainable graph neural networks and counterfactuals to reliably predict and explain aggregation, giving additional insights and design rules for future screening.
arXiv Detail & Related papers (2023-06-03T22:30:45Z)
Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation. Deep learning models have emerged as an efficient way to discover synergistic combinations. Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
Modular multi-source prediction of drug side-effects with DruGNN [3.229607826010618]
Drug Side-Effects (DSEs) have a high impact on public health, care system costs, and drug discovery processes. To predict their occurrence, it is necessary to integrate data from heterogeneous sources. In this work, such heterogeneous data is integrated into a graph dataset, expressively representing the relational information between different entities. Graph Neural Networks (GNNs) are exploited to predict DSEs on our dataset with very promising results.
arXiv Detail & Related papers (2022-02-15T09:41:05Z)
Improving VAE based molecular representations for compound property prediction [0.0]
We propose a simple method to improve chemical property prediction performance of machine learning models. We show the relation between the performance of property prediction models and the distance between property prediction dataset and the larger unlabeled dataset.
arXiv Detail & Related papers (2022-01-13T12:57:11Z)
MOOMIN: Deep Molecular Omics Network for Anti-Cancer Drug Combination Therapy [2.446672595462589]
We propose a multimodal graph neural network that can predict the synergistic effect of drug combinations for cancer treatment. Our model captures the representation based on the context of drugs at multiple scales based on a drug-protein interaction network and metadata. We demonstrate that the model makes high-quality predictions over a wide range of cancer cell line tissues.
arXiv Detail & Related papers (2021-10-28T13:10:25Z)
MolTrans: Molecular Interaction Transformer for Drug Target Interaction Prediction [68.5766865583049]
Drug target interaction (DTI) prediction is a foundational task for in silico drug discovery. Recent years have witnessed promising progress for deep learning in DTI predictions. We propose a Molecular Interaction Transformer (MolTrans) to address these limitations.
arXiv Detail & Related papers (2020-04-23T18:56:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.