ScRAE: Deterministic Regularized Autoencoders with Flexible Priors for
Clustering Single-cell Gene Expression Data
- URL: http://arxiv.org/abs/2107.07709v1
- Date: Fri, 16 Jul 2021 05:13:31 GMT
- Title: ScRAE: Deterministic Regularized Autoencoders with Flexible Priors for
Clustering Single-cell Gene Expression Data
- Authors: Arnab Kumar Mondal, Himanshu Asnani, Parag Singla, Prathosh AP
- Abstract summary: Clustering single-cell RNA sequence (scRNA-seq) data poses statistical and computational challenges.
Regularized Auto-Encoder (RAE) based deep neural network models have achieved remarkable success in learning robust low-dimensional representations.
We propose a modified RAE framework (called the scRAE) for effective clustering of the single-cell RNA sequencing data.
- Score: 11.511172015076532
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Clustering single-cell RNA sequence (scRNA-seq) data poses statistical and
computational challenges due to their high-dimensionality and data-sparsity,
also known as 'dropout' events. Recently, Regularized Auto-Encoder (RAE) based
deep neural network models have achieved remarkable success in learning robust
low-dimensional representations. The basic idea in RAEs is to learn a
non-linear mapping from the high-dimensional data space to a low-dimensional
latent space and vice-versa, simultaneously imposing a distributional prior on
the latent space, which brings in a regularization effect. This paper argues
that RAEs suffer from the infamous problem of bias-variance trade-off in their
naive formulation. While a simple AE without latent regularization over-fits
the data, a very strong prior leads to under-representation and thus poor
clustering. To address these issues, we propose a modified RAE
framework (called the scRAE) for effective clustering of the single-cell RNA
sequencing data. scRAE consists of a deterministic AE with a flexibly learnable
prior generator network, which is jointly trained with the AE. This enables
scRAE to trade off better between bias and variance in the latent space. We
demonstrate the efficacy of the proposed method through extensive
experimentation on several real-world single-cell gene expression datasets.
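The regularization idea above can be sketched with a toy example. This is an illustrative sketch, not the authors' implementation: it uses a Gaussian-kernel maximum mean discrepancy (MMD) as one concrete choice of distribution-matching penalty between latent codes and prior samples; in scRAE the prior samples would come from the jointly trained generator network, and the penalty would be added to the reconstruction loss with a weighting coefficient.

```python
# Toy illustration (not the authors' code): an RAE objective combines a
# reconstruction term with a penalty pulling the empirical latent
# distribution toward the prior. Here the penalty is a Gaussian-kernel MMD
# between 1-D latent codes and prior draws.
import math
import random

def gaussian_kernel(a, b, bandwidth=1.0):
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two 1-D samples."""
    k = lambda a, b: gaussian_kernel(a, b, bandwidth)
    kxx = sum(k(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(k(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

random.seed(0)
latents = [random.gauss(2.0, 1.0) for _ in range(200)]  # stand-in encoder outputs
prior   = [random.gauss(0.0, 1.0) for _ in range(200)]  # stand-in prior samples

# In an RAE this penalty is added to the reconstruction loss; a learnable
# prior generator (as in scRAE) would replace the fixed Gaussian draws.
rae_penalty = mmd2(latents, prior)
print(round(rae_penalty, 3))
```

A mismatched prior yields a large penalty, while a well-matched one drives it toward zero, which is exactly the knob that the flexible prior generator is meant to tune.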
Related papers
- ARC: A Generalist Graph Anomaly Detector with In-Context Learning [62.202323209244]
ARC is a generalist GAD approach that enables a "one-for-all" GAD model to detect anomalies across various graph datasets on-the-fly.
Equipped with in-context learning, ARC can directly extract dataset-specific patterns from the target dataset.
Extensive experiments on multiple benchmark datasets from various domains demonstrate the superior anomaly detection performance, efficiency, and generalizability of ARC.
arXiv Detail & Related papers (2024-05-27T02:42:33Z)
- sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures [0.9674145073701153]
sc-OTGM is an unsupervised model grounded in the inductive bias that scRNA-seq data can be generated from a mixture of Gaussians.
sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification.
It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.
arXiv Detail & Related papers (2024-05-06T06:46:11Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Differentially private sliced inverse regression in the federated paradigm [3.539008590223188]
We extend sliced inverse regression (SIR) to address the challenges of decentralized data, prioritizing privacy and communication efficiency.
Our approach, named federated sliced inverse regression (FSIR), facilitates collaborative estimation of the sufficient dimension reduction subspace among multiple clients.
arXiv Detail & Related papers (2023-06-10T00:32:39Z)
- Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE [51.09507030387935]
Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution.
We propose to use the contrastive learning framework that has been shown to be effective for self-supervised representation learning, as a means to resolve this problem.
We show that using the contrastive learning framework to optimize the WAE loss achieves faster convergence and more stable optimization compared with existing popular algorithms for WAE.
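As a rough illustration of the contrastive framework mentioned above (not the paper's actual method, which additionally uses momentum encoders and targets the WAE latent constraint), an InfoNCE-style loss pulls each code toward its positive view while pushing it away from the rest of the batch:

```python
# Illustrative sketch: a generic InfoNCE contrastive loss on 1-D embeddings.
# Similarity is negative squared distance scaled by a temperature;
# positives[i] is the positive for anchors[i], and every other entry of
# positives acts as a negative.
import math
import random

def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE loss over a batch of 1-D embeddings."""
    losses = []
    for i, a in enumerate(anchors):
        sims = [-((a - p) ** 2) / temperature for p in positives]
        log_denom = math.log(sum(math.exp(s) for s in sims))
        # Cross-entropy of picking the true positive among all candidates.
        losses.append(log_denom - sims[i])
    return sum(losses) / len(losses)

random.seed(0)
codes = [random.gauss(0, 1) for _ in range(8)]
views = [c + random.gauss(0, 0.05) for c in codes]  # slightly perturbed positives
print(round(info_nce(codes, views), 3))
```

The loss is small when each code sits close to its own view and far from the others, which is the batch-level signal the contrastive framework uses in place of an explicit divergence to the prior.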
arXiv Detail & Related papers (2021-10-19T22:55:47Z)
- Approximate kNN Classification for Biomedical Data [1.1852406625172218]
Single-cell RNA-seq (scRNA-seq) is an emerging RNA sequencing technology with promising capabilities but significant computational challenges.
We propose the utilization of approximate nearest neighbor search algorithms for the task of kNN classification in scRNA-seq data.
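To make the approximate-search idea concrete, here is a hypothetical sketch (all names illustrative, not the paper's algorithm): expression profiles are first reduced with a random projection, and exact kNN voting is then run in the cheaper low-dimensional space. Random projection is just one simple stand-in for the approximate nearest-neighbor methods such work evaluates.

```python
# Hypothetical sketch: approximate kNN classification by projecting
# high-dimensional expression profiles to a few random dimensions, then
# running exact kNN voting in that smaller space.
import random

def random_projection(x, proj):
    """Project vector x onto each random direction in proj."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in proj]

def knn_classify(query, points, labels, k=3):
    """Majority vote among the k nearest points (squared Euclidean)."""
    dists = sorted(
        (sum((q - p) ** 2 for q, p in zip(query, pt)), lab)
        for pt, lab in zip(points, labels)
    )
    votes = [lab for _, lab in dists[:k]]
    return max(set(votes), key=votes.count)

random.seed(1)
dim, low_dim = 50, 5
proj = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(low_dim)]

# Two well-separated synthetic "cell types" in 50-D expression space.
cells  = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]
cells += [[random.gauss(4, 1) for _ in range(dim)] for _ in range(20)]
labels = ["typeA"] * 20 + ["typeB"] * 20

low_cells = [random_projection(c, proj) for c in cells]
query = random_projection([3.9] * dim, proj)  # near the typeB centroid
print(knn_classify(query, low_cells, labels))
```

Search in the 5-D projected space costs a tenth of the distance computations per neighbor query, at the price of a small chance of returning a non-exact neighbor.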
arXiv Detail & Related papers (2020-12-03T18:30:43Z)
- Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task Lasso [102.84915019938413]
Non-invasive techniques like magnetoencephalography (MEG) or electroencephalography (EEG) offer promise for imaging brain activity.
The problem of source localization, or source imaging, poses however a high-dimensional statistical inference challenge.
We propose an ensemble of desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z)
- CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z)
- TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
- Robust Locality-Aware Regression for Labeled Data Classification [5.432221650286726]
We propose a new discriminant feature extraction framework, namely Robust Locality-Aware Regression (RLAR).
In our model, we introduce a retargeted regression to perform the marginal representation learning adaptively instead of using the general average inter-class margin.
To alleviate the disturbance of outliers and prevent overfitting, we measure the regression term and locality-aware term together with the regularization term by the L2,1 norm.
arXiv Detail & Related papers (2020-06-15T11:36:59Z)
- To Regularize or Not To Regularize? The Bias Variance Trade-off in Regularized AEs [10.611727286504994]
We study the effect of the latent prior on the generation quality of deterministic AE models.
We show that our model, called FlexAE, is the new state-of-the-art for the AE based generative models.
arXiv Detail & Related papers (2020-06-10T14:00:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.