Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling
- URL: http://arxiv.org/abs/2211.03553v3
- Date: Wed, 9 Nov 2022 22:04:16 GMT
- Title: Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling
- Authors: Romain Lopez, Nataša Tagasovska, Stephen Ra, Kyunghyun Cho, Jonathan K. Pritchard, Aviv Regev
- Abstract summary: We propose a deep generative model of single-cell gene expression data for which each perturbation is treated as an intervention targeting an unknown, but sparse, subset of latent variables.
We benchmark these methods on simulated single-cell data to evaluate their performance at latent-unit recovery, causal target identification and out-of-domain generalization.
- Score: 3.2435888122704037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent variable models such as the Variational Auto-Encoder (VAE) have become
a go-to tool for analyzing biological data, especially in the field of
single-cell genomics. One remaining challenge is the interpretability of latent
variables as biological processes that define a cell's identity. Outside of
biological applications, this problem is commonly referred to as learning
disentangled representations. Although several disentanglement-promoting
variants of the VAE have been introduced and applied to single-cell genomics data,
this task has been shown to be infeasible from independent and identically
distributed measurements without additional structure. Instead, recent methods
propose to leverage non-stationary data, together with the sparse mechanism shift
assumption, in order to learn disentangled representations with causal
semantics. Here, we extend the application of these methodological advances to
the analysis of single-cell genomics data with genetic or chemical
perturbations. More precisely, we propose a deep generative model of
single-cell gene expression data for which each perturbation is treated as a
stochastic intervention targeting an unknown, but sparse, subset of latent
variables. We benchmark these methods on simulated single-cell data to evaluate
their performance at latent-unit recovery, causal target identification and
out-of-domain generalization. Finally, we apply these approaches to two
real-world large-scale gene perturbation data sets and find that models that
exploit the sparse mechanism shift hypothesis surpass contemporary methods on a
transfer learning task. We implement our new model and benchmarks using the
scvi-tools library, and release it as open-source software at
https://github.com/Genentech/sVAE.
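To make the sparse mechanism shift assumption concrete, here is a minimal toy simulator in Python/NumPy. It is a hypothetical sketch, not the sVAE implementation or the authors' benchmark simulation: the function name `simulate_sparse_shift`, the standard Gaussian prior, the mean-shift form of the stochastic intervention, the random linear-softmax decoder and the Poisson count likelihood are all assumptions chosen for brevity (scvi-tools models more commonly use a (zero-inflated) negative binomial likelihood). What it does show is the core idea: each perturbation changes the distribution of only a few latent units, and every other unit keeps its control distribution.

```python
import numpy as np


def simulate_sparse_shift(n_cells, n_latent, n_genes, n_perturbations,
                          targets_per_perturbation=2, library_size=1_000, seed=0):
    """Toy single-cell simulator under a sparse mechanism shift assumption.

    Hypothetical sketch: names, priors and decoder are illustrative
    assumptions, not the sVAE implementation.
    """
    rng = np.random.default_rng(seed)

    # Sparse targets: row t flags the few latent units perturbation t acts on.
    mask = np.zeros((n_perturbations, n_latent), dtype=bool)
    for t in range(n_perturbations):
        targets = rng.choice(n_latent, size=targets_per_perturbation, replace=False)
        mask[t, targets] = True
    # Per-perturbation mean shift (only used where mask is True).
    shift = rng.normal(loc=0.0, scale=3.0, size=(n_perturbations, n_latent))

    # Label 0 = control; labels 1..n_perturbations index the perturbations.
    labels = rng.integers(0, n_perturbations + 1, size=n_cells)
    # Base (control) mechanism: z ~ N(0, I) for every cell.
    z = rng.normal(size=(n_cells, n_latent))

    # Stochastic intervention: targeted units are shifted, so z_j ~ N(shift_tj, 1);
    # untargeted units keep their base distribution.
    perturbed = labels > 0
    idx = labels[perturbed] - 1
    z[perturbed] = np.where(mask[idx], z[perturbed] + shift[idx], z[perturbed])

    # Illustrative decoder: random linear map, softmax over genes, Poisson counts.
    weights = rng.normal(scale=0.5, size=(n_latent, n_genes))
    logits = z @ weights
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    rates = np.exp(logits)
    rates /= rates.sum(axis=1, keepdims=True)
    counts = rng.poisson(library_size * rates)
    return counts, z, labels, mask


counts, z, labels, mask = simulate_sparse_shift(
    n_cells=500, n_latent=10, n_genes=200, n_perturbations=5)
print(counts.shape, mask.sum(axis=1))  # (500, 200) [2 2 2 2 2]
```

In this toy setting, `mask` plays the role of the ground-truth causal targets: a model that only sees `counts` and `labels` would have to recover which latent units each perturbation touches, which is the flavor of the causal target identification task named in the abstract.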
Related papers
- Generating Multi-Modal and Multi-Attribute Single-Cell Counts with CFGen [76.02070962797794]
We present Cell Flow for Generation, a flow-based conditional generative model for multi-modal single-cell counts.
Our results suggest improved recovery of crucial biological data characteristics while accounting for novel generative tasks.
Date: 2024-07-16T14:05:03Z
- Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black-box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms.
We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
Date: 2024-07-03T10:31:30Z
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly become one of the predominant classes of generative models for simulating synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
Date: 2024-02-19T15:57:39Z
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
Date: 2023-11-28T09:14:55Z
- Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder [6.352775857356592]
We propose the Sparse Additive Mechanism Shift Variational Autoencoder, SAMS-VAE, to combine compositionality, disentanglement, and interpretability for perturbation models.
SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global variables of latent intervention effects.
We evaluate SAMS-VAE both quantitatively and qualitatively on a range of tasks using two popular single-cell sequencing datasets.
Date: 2023-11-05T23:37:31Z
- Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL).
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
Date: 2023-11-04T16:42:42Z
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and the associated challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
Date: 2023-10-23T13:35:24Z
- Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity [25.488181126364186]
This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability or distractors.
We apply our method to grand biological challenges, such as data integration in single-cell genomics.
Specifically, the proposed approach helps to disentangle biological signals from data biases that are unrelated to the target task or the causal explanation of interest.
Date: 2023-07-02T12:52:41Z
- A biology-driven deep generative model for cell-type annotation in cytometry [0.0]
We introduce Scyan, a Single-cell Cytometry Network that automatically annotates cell types using only prior expert knowledge.
Scyan significantly outperforms the related state-of-the-art models on multiple public datasets while being faster and interpretable.
In addition, Scyan addresses several complementary tasks, such as batch-effect removal, debarcoding, and population discovery.
Date: 2022-08-11T10:50:44Z
- Inference of cell dynamics on perturbation data using adjoint sensitivity [4.606583317143614]
Data-driven dynamic models of cell biology can be used to predict cell response to unseen perturbations.
Recent work has demonstrated the derivation of interpretable models with explicit interaction terms.
This work aims to extend the range of applicability of this model inference approach to a diversity of biological systems.
Date: 2021-04-13T19:15:56Z
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture that allows us to infuse a deep neural network with human-powered abstraction at the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
Date: 2020-10-20T08:36:51Z
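The SAMS-VAE entry in the list describes the latent state of a perturbed sample as the sum of a local, sample-specific latent variable and sparse global latent intervention effects. The following NumPy snippet sketches only that additive composition step, under stated assumptions: it is not the SAMS-VAE authors' code, the function name `compose_latent` and the array layout are hypothetical, and the masks and effects (which the summary describes as global latent variables) are given here as plain fixed arrays rather than inferred quantities.

```python
import numpy as np


def compose_latent(z_local, treatments, masks, effects):
    """Additive sparse-shift composition of a perturbed latent state.

    Hypothetical sketch of the idea summarized for SAMS-VAE, not its code.
    z_local:    (n_samples, n_latent) sample-specific latent variables
    treatments: (n_samples, n_perturbations) 0/1 indicators of applied perturbations
    masks:      (n_perturbations, n_latent) 0/1 sparse masks of affected latent units
    effects:    (n_perturbations, n_latent) latent effect of each perturbation
    """
    # Zero out each perturbation's effect outside its mask (sparsity).
    sparse_effects = masks * effects
    # Sum the effects of every perturbation a sample received (compositionality).
    return z_local + treatments @ sparse_effects


z_local = np.zeros((2, 3))
treatments = np.array([[1, 0], [1, 1]])  # second sample receives both perturbations
masks = np.array([[1, 0, 0], [0, 0, 1]])
effects = np.array([[2.0, 5.0, 5.0], [7.0, 7.0, -1.0]])
print(compose_latent(z_local, treatments, masks, effects))  # rows: [2, 0, 0] and [2, 0, -1]
```

Because effects combine by addition, a combination of perturbations never seen together during training can still be mapped to a latent state, which is one way to read the compositionality claim in that summary.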