Conditionally Invariant Representation Learning for Disentangling
Cellular Heterogeneity
- URL: http://arxiv.org/abs/2307.00558v1
- Date: Sun, 2 Jul 2023 12:52:41 GMT
- Title: Conditionally Invariant Representation Learning for Disentangling
Cellular Heterogeneity
- Authors: Hananeh Aliee, Ferdinand Kapl, Soroor Hediyeh-Zadeh, Fabian J. Theis
- Abstract summary: This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability or distractors.
We apply our method to grand biological challenges, such as data integration in single-cell genomics.
Specifically, the proposed approach helps to disentangle biological signals from data biases that are unrelated to the target task or the causal explanation of interest.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel approach that leverages domain variability to
learn representations that are conditionally invariant to unwanted variability
or distractors. Our approach identifies both spurious and invariant latent
features necessary for achieving accurate reconstruction by placing distinct
conditional priors on latent features. The invariant signals are disentangled
from noise by enforcing independence which facilitates the construction of an
interpretable model with a causal semantic. By exploiting the interplay between
data domains and labels, our method simultaneously identifies invariant
features and builds invariant predictors. We apply our method to grand
biological challenges, such as data integration in single-cell genomics with
the aim of capturing biological variations across datasets with many samples,
obtained from different conditions or multiple laboratories. Our approach
allows for the incorporation of specific biological mechanisms, including gene
programs, disease states, or treatment conditions into the data integration
process, bridging the gap between the theoretical assumptions and real
biological applications. Specifically, the proposed approach helps to
disentangle biological signals from data biases that are unrelated to the
target task or the causal explanation of interest. Through extensive
benchmarking using large-scale human hematopoiesis and human lung cancer data,
we validate the superiority of our approach over existing methods and
demonstrate that it can empower deeper insights into cellular heterogeneity and
the identification of disease cell states.
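As a rough illustration of what "placing distinct conditional priors on latent features" could look like in a variational autoencoder, the sketch below computes the KL term between a diagonal-Gaussian posterior and a prior whose mean depends on a conditioning variable. It is not the authors' implementation. The split into an "invariant" and a "spurious" block, the choice to condition one block on the label and the other on the domain, the unit prior variance, and all function and parameter names are assumptions made for illustration only.

```python
import numpy as np


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over the latent axis."""
    var_q = np.exp(logvar_q)
    var_p = np.exp(logvar_p)
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(axis=-1)


def conditional_prior_kl(mu_q, logvar_q, label, domain, label_means, domain_means):
    """KL term with a different conditional prior on each latent block.

    Hypothetical layout (an assumption, not taken from the paper):
    the first label_means.shape[1] latents use a prior mean looked up
    by label, the remaining latents use a prior mean looked up by domain.
    Prior variances are fixed to 1 for simplicity.
    """
    mu_p = np.concatenate([label_means[label], domain_means[domain]], axis=-1)
    logvar_p = np.zeros_like(mu_p)
    return gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)


# Toy usage: 2 labels, 3 domains, 2 "invariant" + 1 "spurious" latent dims.
label_means = np.array([[0.0, 0.0], [1.0, -1.0]])
domain_means = np.array([[0.0], [0.5], [2.0]])
mu_q = np.array([[1.0, -1.0, 2.0]])   # posterior mean for one cell
logvar_q = np.zeros((1, 3))           # posterior log-variance
kl = conditional_prior_kl(mu_q, logvar_q, np.array([1]), np.array([2]),
                          label_means, domain_means)
```

In this toy call the posterior coincides with the prior selected by label 1 and domain 2, so the KL term is zero; any mismatch in either block increases it.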
Related papers
- Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View [49.03501451546763]
We identify the importance of implicit correspondences across biological contexts for exploiting domain-invariant pathological composition.
We propose self-adaptive dynamic distillation to secure instance-aware trade-offs across different model constituents.
arXiv Detail & Related papers (2024-07-14T04:41:16Z)
- Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms.
We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
arXiv Detail & Related papers (2024-07-03T10:31:30Z)
- Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments [67.80453452949303]
Estimating the conditional average treatment effect (CATE) from observational data is relevant for many applications such as personalized medicine.
Here, we focus on the widespread setting where the observational data come from multiple environments.
We propose different model-agnostic learners (so-called meta-learners) to estimate the bounds that can be used in combination with arbitrary machine learning models.
arXiv Detail & Related papers (2024-06-04T16:31:43Z)
- Domain adaptation in small-scale and heterogeneous biological datasets [0.0]
We discuss the benefits and challenges of domain adaptation in biological research.
We argue for the incorporation of domain adaptation techniques to the computational biologist's toolkit.
arXiv Detail & Related papers (2024-05-29T16:01:15Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most of current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling [3.2435888122704037]
We propose a deep generative model of single-cell gene expression data for which each perturbation is treated as an intervention targeting an unknown, but sparse, subset of latent variables.
We benchmark these methods on simulated single-cell data to evaluate their performance at latent units recovery, causal target identification and out-of-domain generalization.
arXiv Detail & Related papers (2022-11-07T15:47:40Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- Encoding Domain Information with Sparse Priors for Inferring Explainable Latent Variables [2.8935588665357077]
We propose spex-LVM, a factorial latent variable model with sparse priors to encourage the inference of explainable factors.
spex-LVM utilizes existing knowledge of curated biomedical pathways to automatically assign annotated attributes to latent factors.
Evaluations on simulated and real single-cell RNA-seq datasets demonstrate that our model robustly identifies relevant structure in an inherently explainable manner.
arXiv Detail & Related papers (2021-07-08T10:19:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.