scDiffusion: conditional generation of high-quality single-cell data
using diffusion model
- URL: http://arxiv.org/abs/2401.03968v2
- Date: Tue, 5 Mar 2024 04:45:14 GMT
- Title: scDiffusion: conditional generation of high-quality single-cell data
using diffusion model
- Authors: Erpai Luo, Minsheng Hao, Lei Wei, Xuegong Zhang
- Abstract summary: Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at the single-cell level.
It is still challenging to obtain enough high-quality scRNA-seq data.
We developed scDiffusion, a generative model combining a diffusion model and a foundation model to generate high-quality scRNA-seq data.
- Score: 1.0738561302102216
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Single-cell RNA sequencing (scRNA-seq) data are important for studying the
laws of life at the single-cell level. However, it is still challenging to obtain
enough high-quality scRNA-seq data. To mitigate the limited availability of
data, generative models have been proposed to computationally generate
synthetic scRNA-seq data. Nevertheless, the data generated with current models
are not yet very realistic, especially when the data must be generated under
controlled conditions. Meanwhile, diffusion models have shown their
power in generating data at high fidelity, providing a new opportunity for
scRNA-seq generation.
In this study, we developed scDiffusion, a generative model combining
a diffusion model and a foundation model to generate high-quality scRNA-seq data
with controlled conditions. We designed multiple classifiers to guide the
diffusion process simultaneously, enabling scDiffusion to generate data under
multiple condition combinations. We also proposed a new control strategy called
Gradient Interpolation. This strategy allows the model to generate continuous
trajectories of cell development from a given cell state.
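The guidance idea described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes linear softmax classifiers as stand-ins for the trained condition classifiers, and the function names (`log_prob_grad`, `guided_mean`, `interpolated_grad`) are hypothetical. It uses the standard classifier-guidance update, in which the mean of a reverse diffusion step is shifted by the step variance times the gradient of log p(y | x). Summing scaled gradients from several classifiers is one way to apply multiple conditions at once, and blending the gradients for two target labels with a weight `alpha` is one plausible reading of Gradient Interpolation.

```python
import numpy as np


def log_prob_grad(W, b, x, label):
    """Gradient of log p(label | x) w.r.t. x for a linear softmax classifier.

    Assumption: a linear classifier with logits W @ x + b stands in for a
    trained neural classifier, so the gradient has a closed form.
    """
    logits = W @ x + b
    logits = logits - logits.max()  # stabilise the softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    onehot = np.zeros_like(probs)
    onehot[label] = 1.0
    return W.T @ (onehot - probs)


def guided_mean(mean, variance, grads, scales):
    """Shift a reverse-step mean by a weighted sum of classifier gradients."""
    shift = sum(s * g for s, g in zip(scales, grads))
    return mean + variance * shift


def interpolated_grad(grad_a, grad_b, alpha):
    """Blend the gradients for two target conditions (alpha in [0, 1])."""
    return (1.0 - alpha) * grad_a + alpha * grad_b


# Toy setup (hypothetical sizes): a 4-dimensional latent cell state,
# a 3-class "cell type" classifier and a 2-class "condition" classifier.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W_type, b_type = rng.normal(size=(3, 4)), np.zeros(3)
W_cond, b_cond = rng.normal(size=(2, 4)), np.zeros(2)

# Multi-condition guidance: steer toward cell type 2 and condition 1 together.
grads = [log_prob_grad(W_type, b_type, x, 2),
         log_prob_grad(W_cond, b_cond, x, 1)]
mean_multi = guided_mean(mean=x, variance=0.1, grads=grads, scales=[2.0, 2.0])

# Interpolated guidance: move from cell type 0 toward cell type 2 in steps.
g_start = log_prob_grad(W_type, b_type, x, 0)
g_end = log_prob_grad(W_type, b_type, x, 2)
trajectory_means = [
    guided_mean(x, 0.1, [interpolated_grad(g_start, g_end, a)], [2.0])
    for a in np.linspace(0.0, 1.0, 5)
]
```

In a full sampler these shifts would be applied at every reverse step before noise is added; here only a single step is shown so the arithmetic stays visible.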
Experiments showed that scDiffusion can generate single-cell gene expression
data closely resembling real scRNA-seq data. scDiffusion can also
conditionally generate data for specific cell types, including rare cell types.
Furthermore, we could use the multiple-condition generation of scDiffusion to
generate cell types that were absent from the training data. Leveraging the Gradient
Interpolation strategy, we generated a continuous developmental trajectory of
mouse embryonic cells. These experiments demonstrate that scDiffusion is a
powerful tool for augmenting real scRNA-seq data and can provide insights
into cell fate research.
Related papers
- Generating Multi-Modal and Multi-Attribute Single-Cell Counts with CFGen [76.02070962797794]
We present Cell Flow for Generation, a flow-based conditional generative model for multi-modal single-cell counts.
Our results suggest improved recovery of crucial biological data characteristics while accounting for novel generative tasks.
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
- sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures [0.9674145073701153]
sc-OTGM is an unsupervised model grounded in the inductive bias that the scRNAseq data can be generated.
sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification.
It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.
arXiv Detail & Related papers (2024-05-06T06:46:11Z)
- scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling [9.013834280011293]
Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research.
Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT).
This method generates virtual scRNA-seq data by leveraging a real dataset.
arXiv Detail & Related papers (2024-04-09T09:25:16Z)
- scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification.
scBiGNN comprises two GNN modules to identify cell types.
scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z)
- scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z)
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics [62.997667081978825]
We address the challenge of using energy-based models to produce high-quality, label-specific data in structured datasets.
Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing.
We use a novel training algorithm that exploits non-equilibrium effects.
arXiv Detail & Related papers (2023-07-13T15:08:44Z)
- CloudPred: Predicting Patient Phenotypes From Single-cell RNA-seq [6.669618903574761]
Single-cell RNA sequencing (scRNA-seq) has the potential to provide powerful, high-resolution signatures to inform disease prognosis and precision medicine.
This paper develops an interpretable machine learning algorithm, CloudPred, to predict individuals' disease phenotypes from their scRNA-seq data.
arXiv Detail & Related papers (2021-10-13T22:41:30Z)
- Conditional Hybrid GAN for Sequence Generation [56.67961004064029]
We propose a novel conditional hybrid GAN (C-Hybrid-GAN) to solve this issue.
We exploit the Gumbel-Softmax technique to approximate the distribution of discrete-valued sequences.
We demonstrate that the proposed C-Hybrid-GAN outperforms the existing methods in context-conditioned discrete-valued sequence generation.
arXiv Detail & Related papers (2020-09-18T03:52:55Z)
- Cell Type Identification from Single-Cell Transcriptomic Data via Semi-supervised Learning [2.4271601178529063]
Cell type identification from single-cell transcriptomic data is a common goal of single-cell RNA sequencing (scRNAseq) data analysis.
We propose a semi-supervised learning model that uses unlabeled scRNAseq cells and a limited number of labeled scRNAseq cells to perform cell identification.
The proposed model achieves encouraging performance when trained on a very limited number of labeled scRNAseq cells.
arXiv Detail & Related papers (2020-05-06T19:15:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.