scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling
- URL: http://arxiv.org/abs/2404.06153v1
- Date: Tue, 9 Apr 2024 09:25:16 GMT
- Title: scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling
- Authors: Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei
- Abstract summary: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research.
Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT).
This method generates virtual scRNA-seq data by leveraging a real dataset.
- Score: 9.013834280011293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivation: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research, facilitating the examination of gene expression at the individual-cell level within a given tissue sample. While numerous tools have been developed for scRNA-seq data analysis, it remains challenging to capture the distinct features of such data and to replicate virtual datasets that share analogous statistical properties. Results: Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT). This method generates virtual scRNA-seq data by leveraging a real dataset. The method is a neural network built on Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs): Gaussian noise is added to the real data through iterative noise-adding steps, and the network learns to reverse this process, denoising pure noise back into scRNA-seq samples. This scheme allows the model to learn data features from actual scRNA-seq samples during training. Our experiments, conducted on two distinct scRNA-seq datasets, demonstrate superior performance. Additionally, the sampling process is accelerated by incorporating Denoising Diffusion Implicit Models (DDIMs). scRDiT presents a unified methodology empowering users to train neural network models on their own scRNA-seq datasets, enabling the generation of numerous high-quality scRNA-seq samples. Availability and implementation: https://github.com/DongShengze/scRDiT
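The forward and reverse processes the abstract describes follow the standard DDPM recipe. As a rough illustration (not code from the scRDiT repository), the sketch below shows the usual forward noising step and noise-prediction training loss in PyTorch, assuming a hypothetical network `model(x_t, t)` standing in for the Diffusion Transformer; the schedule constants are common DDPM defaults, not values reported in the paper.

```python
# Minimal sketch of the DDPM scheme described in the abstract.
# `model(x_t, t)` is a hypothetical stand-in for the Diffusion Transformer;
# schedule constants are standard DDPM defaults, not the paper's values.
import torch

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative products \bar{alpha}_t

def q_sample(x0, t, eps):
    """Forward process: add Gaussian noise to clean samples x0 at step t."""
    a_bar = alpha_bars[t].view(-1, 1)      # one scalar per sample in the batch
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

def ddpm_loss(model, x0):
    """Train the network to predict the injected noise (simple DDPM loss)."""
    t = torch.randint(0, T, (x0.shape[0],))        # random step per sample
    eps = torch.randn_like(x0)                     # the noise to recover
    x_t = q_sample(x0, t, eps)                     # noised expression profiles
    return torch.nn.functional.mse_loss(model(x_t, t), eps)
```

Training then amounts to minimizing `ddpm_loss` over minibatches of real expression profiles.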
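The DDIM acceleration mentioned above replaces the full T-step reverse chain with a short, deterministic subsequence of steps. Below is a minimal sketch of the standard deterministic DDIM update (eta = 0), reusing `T`, `alpha_bars`, and the hypothetical `model` from the previous snippet; scRDiT's actual sampler settings may differ.

```python
@torch.no_grad()
def ddim_sample(model, shape, num_steps=50):
    """Deterministic DDIM sampling (eta = 0) over a step subsequence."""
    step_ids = torch.linspace(T - 1, 0, num_steps).long()  # descending steps
    x = torch.randn(shape)                                  # start from pure noise
    for i, t in enumerate(step_ids):
        t_batch = torch.full((shape[0],), int(t))
        eps = model(x, t_batch)                             # predicted noise at step t
        a_bar = alpha_bars[t]
        x0_pred = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()  # clean-sample estimate
        if i == len(step_ids) - 1:
            x = x0_pred                                     # final step yields the sample
        else:
            a_prev = alpha_bars[step_ids[i + 1]]
            # eta = 0: deterministic DDIM update toward the previous step
            x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
    return x
```

Generating a batch of, say, 64 virtual cells over 2,000 genes (an illustrative gene count) would then be `ddim_sample(model, (64, 2000))`, trading the 1,000-step DDPM chain for 50 network evaluations.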
Related papers
- White-Box Diffusion Transformer for single-cell RNA-seq generation [9.846966401472802]
We propose a hybrid model based on a diffusion model and a White-Box transformer to generate synthetic and biologically plausible scRNA-seq data.
Our White-Box Diffusion Transformer combines the generative capabilities of the diffusion model with the mathematical interpretability of the White-Box transformer.
arXiv Detail & Related papers (2024-11-11T08:24:59Z)
- Latent Diffusion Models for Controllable RNA Sequence Generation [33.38594748558547]
RNA is a key intermediary between DNA and protein, exhibiting high sequence diversity and complex three-dimensional structures.
We develop a latent diffusion model for generating and optimizing discrete RNA sequences of variable lengths.
Empirical results confirm that RNAdiffusion generates non-coding RNAs that align with natural distributions across various biological metrics.
arXiv Detail & Related papers (2024-09-15T19:04:50Z)
- scDiffusion: conditional generation of high-quality single-cell data using diffusion model [1.0738561302102216]
Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at the single-cell level.
It is still challenging to obtain enough high-quality scRNA-seq data.
We developed scDiffusion, a generative model combining a diffusion model and a foundation model to generate high-quality scRNA-seq data.
arXiv Detail & Related papers (2024-01-08T15:44:39Z)
- scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, positional encoding via gene embeddings, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z)
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained on multiple domains and synthesize diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- Application of Deep Learning on Single-Cell RNA-sequencing Data Analysis: A Review [17.976898403296275]
Single-cell RNA-sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously.
Deep learning, a recent advance in artificial intelligence, has also emerged as a promising tool for scRNA-seq data analysis.
arXiv Detail & Related papers (2022-10-11T17:07:22Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained on the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
- Deep Transformer Networks for Time Series Classification: The NPP Safety Case [59.20947681019466]
An advanced temporal neural network referred to as the Transformer is used in a supervised learning fashion to model the time-dependent NPP simulation data.
The Transformer can learn the characteristics of the sequential data and yield promising performance with approximately 99% classification accuracy on the testing dataset.
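As a rough illustration of the technique named here (not the paper's implementation), a minimal supervised Transformer classifier for multivariate time series might look like the PyTorch sketch below; all hyperparameters are illustrative, and positional encoding is omitted for brevity.

```python
# Generic sketch of a supervised Transformer classifier for multivariate
# time series; hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, n_features, n_classes, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)   # per-timestep feature embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)     # class logits

    def forward(self, x):                  # x: (batch, time, n_features)
        h = self.encoder(self.embed(x))    # (batch, time, d_model)
        return self.head(h.mean(dim=1))    # mean-pool over time, then classify
```

Training such a model against labeled simulation traces with a standard cross-entropy loss is the usual supervised setup this entry describes.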
arXiv Detail & Related papers (2021-04-09T14:26:25Z)
- Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more natural-sounding samples.
We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.