SimCD: Simultaneous Clustering and Differential expression analysis for
single-cell transcriptomic data
- URL: http://arxiv.org/abs/2104.01512v1
- Date: Sun, 4 Apr 2021 01:06:18 GMT
- Title: SimCD: Simultaneous Clustering and Differential expression analysis for
single-cell transcriptomic data
- Authors: Seyednami Niyakan, Ehsan Hajiramezanali, Shahin Boluki, Siamak Zamani
Dadaneh, Xiaoning Qian
- Abstract summary: Single-Cell RNA sequencing (scRNA-seq) has facilitated genome-scale transcriptomic profiling of individual cells.
Several scRNA-seq analysis methods have been proposed to first identify cell sub-populations by clustering and then separately perform differential expression analysis to understand gene expression changes.
We develop a new method -- SimCD -- that explicitly models cell heterogeneity and dynamic differential changes in one unified hierarchical gamma-negative binomial model.
- Score: 22.702909270039314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-Cell RNA sequencing (scRNA-seq) measurements have facilitated
genome-scale transcriptomic profiling of individual cells, with the hope of
deconvolving cellular dynamic changes in corresponding cell sub-populations to
better understand molecular mechanisms of different development processes.
Several scRNA-seq analysis methods have been proposed to first identify cell
sub-populations by clustering and then separately perform differential
expression analysis to understand gene expression changes. Their corresponding
statistical models and inference algorithms are often designed disjointly. We
develop a new method -- SimCD -- that explicitly models cell heterogeneity and
dynamic differential changes in one unified hierarchical gamma-negative
binomial (hGNB) model, allowing simultaneous cell clustering and differential
expression analysis for scRNA-seq data. Our method naturally defines cell
heterogeneity by dynamic expression changes, which is expected to help achieve
better performances on the two tasks compared to the existing methods that
perform them separately. In addition, SimCD better models dropout (zero
inflation) in scRNA-seq data by both cell- and gene-level factors and obviates
the need for sophisticated pre-processing steps such as normalization, thanks
to the direct modeling of scRNA-seq count data by the rigorous hGNB model with
an efficient Gibbs sampling inference algorithm. Extensive comparisons with the
state-of-the-art methods on both simulated and real-world scRNA-seq count data
demonstrate the capability of SimCD to discover cell clusters and capture
dynamic expression changes. Furthermore, SimCD helps identify several known
genes affected by food deprivation in hypothalamic neuron cell subtypes as well
as some new potential markers, suggesting the capability of SimCD for
bio-marker discovery.
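The abstract does not spell out the hGNB equations, so the following is only a minimal sketch of the general idea behind a gamma-negative binomial hierarchy, not SimCD itself. It assumes each gene's negative binomial dispersion is drawn from a gamma prior, and it samples the negative binomial as a gamma-Poisson mixture. All function names, parameter names, and default values are hypothetical and chosen for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method.

    Adequate for the small rates used here; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_nb(r, p, rng):
    """Draw from NB(r, p) with mean r * p / (1 - p) via a gamma-Poisson mixture."""
    # lam ~ Gamma(shape=r, scale=p / (1 - p)), then count ~ Poisson(lam)
    lam = rng.gammavariate(r, p / (1.0 - p))
    return sample_poisson(lam, rng)


def sample_gamma_nb_matrix(n_genes, n_cells, shape=2.0, scale=1.0, p=0.5, seed=0):
    """Return a genes x cells list of counts (illustrative assumption, not SimCD).

    Each gene gets its own dispersion r_gene ~ Gamma(shape, scale); counts for
    that gene are then drawn from NB(r_gene, p).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_genes):
        r_gene = rng.gammavariate(shape, scale)
        counts.append([sample_nb(r_gene, p, rng) for _ in range(n_cells)])
    return counts


if __name__ == "__main__":
    for row in sample_gamma_nb_matrix(n_genes=3, n_cells=8, seed=1):
        print(row)
```

Under this parameterization the variance of NB(r, p) is the mean divided by (1 - p), so the counts are overdispersed relative to a Poisson and small dispersions produce many zeros. That is the kind of behavior a count model for scRNA-seq data needs to capture before any cell- or gene-level dropout factors are added.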
Related papers
- A scalable gene network model of regulatory dynamics in single cells [88.48246132084441]
We introduce a Functional Learnable model of Cell dynamicS, FLeCS, that incorporates gene network structure into coupled differential equations to model gene regulatory functions.
Given (pseudo)time-series single-cell data, FLeCS accurately infers cell dynamics at scale.
arXiv Detail & Related papers (2025-03-25T19:19:21Z)
- Stochastic gradient descent estimation of generalized matrix factorization models with application to single-cell RNA sequencing data [41.94295877935867]
Single-cell RNA sequencing allows the quantitation of gene expression at the individual cell level.
Dimensionality reduction is a common preprocessing step to simplify the visualization, clustering, and phenotypic characterization of samples.
We present a generalized matrix factorization model assuming a general exponential dispersion family distribution.
We propose a scalable adaptive descent algorithm that allows us to estimate the model efficiently.
arXiv Detail & Related papers (2024-12-29T16:02:15Z)
- Generating Multi-Modal and Multi-Attribute Single-Cell Counts with CFGen [76.02070962797794]
We present Cell Flow for Generation, a flow-based conditional generative model for multi-modal single-cell counts.
Our results suggest improved recovery of crucial biological data characteristics while accounting for novel generative tasks.
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
- Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms.
We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
arXiv Detail & Related papers (2024-07-03T10:31:30Z)
- Scalable Amortized GPLVMs for Single Cell Transcriptomics Data [9.010523724015398]
Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data.
We introduce an improved model, the amortized variational model (BGPLVM).
BGPLVM is tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs.
arXiv Detail & Related papers (2024-05-06T21:54:38Z)
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z)
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z)
- Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL).
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
arXiv Detail & Related papers (2023-11-04T16:42:42Z)
- Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE [0.0]
Correlated clustering and projection (CCP) was introduced as an effective method for preprocessing scRNA-seq data.
CCP is a data-domain approach that does not require matrix diagonalization.
By using eight publicly available datasets, we have found that CCP significantly improves UMAP and t-SNE visualization.
arXiv Detail & Related papers (2023-06-23T19:15:43Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling [3.2435888122704037]
We propose a deep generative model of single-cell gene expression data for which each perturbation is treated as an intervention targeting an unknown, but sparse, subset of latent variables.
We benchmark these methods on simulated single-cell data to evaluate their performance at latent units recovery, causal target identification and out-of-domain generalization.
arXiv Detail & Related papers (2022-11-07T15:47:40Z)
- Granger causal inference on DAGs identifies genomic loci regulating transcription [77.58911272503771]
GrID-Net is a framework based on graph neural networks with lagged message passing for Granger causal inference on DAG-structured systems.
Our application is the analysis of single-cell multimodal data to identify genomic loci that mediate the regulation of specific genes.
arXiv Detail & Related papers (2022-10-18T21:15:10Z)
- Topological Data Analysis in Time Series: Temporal Filtration and Application to Single-Cell Genomics [13.173307471333619]
We propose the single-cell topological simplicial analysis (scTSA).
Applying this approach to the single-cell gene expression profiles from local networks of cells reveals a previously unseen topology of cellular ecology.
Benchmarked on the single-cell RNA-seq data of zebrafish embryogenesis spanning 38,731 cells, 25 cell types and 12 time steps, our approach highlights the gastrulation as the most critical stage.
arXiv Detail & Related papers (2022-04-29T12:46:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.