scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling
- URL: http://arxiv.org/abs/2404.06153v1
- Date: Tue, 9 Apr 2024 09:25:16 GMT
- Title: scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling
- Authors: Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei
- Abstract summary: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research.
Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT).
This method generates virtual scRNA-seq data by leveraging a real dataset.
- Score: 9.013834280011293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivation: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research, facilitating the examination of gene expression at the individual cell level within a given tissue sample. While numerous tools have been developed for scRNA-seq data analysis, the challenge persists in capturing the distinct features of such data and replicating virtual datasets that share analogous statistical properties. Results: Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT). This method generates virtual scRNA-seq data by leveraging a real dataset. The method is a neural network constructed based on Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs). It progressively corrupts real samples with Gaussian noise through iterative noise-adding steps and learns to reverse this process, ultimately restoring noise into scRNA-seq samples. This scheme allows the model to learn data features from actual scRNA-seq samples during training. Our experiments, conducted on two distinct scRNA-seq datasets, demonstrate superior performance. Additionally, the sampling process is accelerated by incorporating Denoising Diffusion Implicit Models (DDIM). scRDiT presents a unified methodology empowering users to train neural network models with their unique scRNA-seq datasets, enabling the generation of numerous high-quality scRNA-seq samples. Availability and implementation: https://github.com/DongShengze/scRDiT
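The noise-adding and noise-reversing scheme described in the abstract can be sketched numerically. The following is a minimal NumPy illustration of the DDPM forward process and a deterministic DDIM update, assuming a linear beta schedule and 1,000 diffusion steps; the actual scRDiT denoiser (a Diffusion Transformer) and its hyperparameters may differ.

```python
import numpy as np

# Assumed linear beta schedule over T diffusion steps (a common DDPM default,
# not necessarily the schedule used by scRDiT).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, eps):
    """Forward process: noise a clean expression vector x0 to step t in closed form."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def ddim_step(x_t, t, t_prev, eps_pred):
    """One deterministic DDIM update from step t to an earlier step t_prev,
    given the noise predicted by the denoising network."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t_prev]
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(ab_t)
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps_pred

rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)   # toy stand-in for a 2000-gene expression vector
eps = rng.normal(size=2000)
x_t = q_sample(x0, T - 1, eps)          # fully noised sample
# With the true noise in place of a network prediction, one large DDIM jump
# toward t=0 recovers x0 up to the small noise retained at step 0.
x_recovered = ddim_step(x_t, T - 1, 0, eps)
```

In practice the noise `eps_pred` comes from the trained DiT, and DDIM's deterministic updates allow taking far fewer than T steps, which is the acceleration the paper exploits.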
Related papers
- Latent Diffusion Models for Controllable RNA Sequence Generation [33.38594748558547]
RNA is a key intermediary between DNA and protein, exhibiting high sequence diversity and complex three-dimensional structures.
We develop a latent diffusion model for generating and optimizing discrete RNA sequences of variable lengths.
Empirical results confirm that RNAdiffusion generates non-coding RNAs that align with natural distributions across various biological metrics.
arXiv Detail & Related papers (2024-09-15T19:04:50Z)
- scDiffusion: conditional generation of high-quality single-cell data using diffusion model [1.0738561302102216]
Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at single-cell level.
It is still challenging to obtain enough high-quality scRNA-seq data.
We developed scDiffusion, a generative model combining diffusion model and foundation model to generate high-quality scRNA-seq data.
arXiv Detail & Related papers (2024-01-08T15:44:39Z)
- scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z)
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics [62.997667081978825]
We address the challenge of using energy-based models to produce high-quality, label-specific data in structured datasets.
Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing.
We use a novel training algorithm that exploits non-equilibrium effects.
arXiv Detail & Related papers (2023-07-13T15:08:44Z)
- Application of Deep Learning on Single-Cell RNA-sequencing Data Analysis: A Review [17.976898403296275]
Single-cell RNA-sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously.
Deep learning, a recent advance of artificial intelligence, has also emerged as a promising tool for scRNA-seq data analysis.
arXiv Detail & Related papers (2022-10-11T17:07:22Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
- Deep Transformer Networks for Time Series Classification: The NPP Safety Case [59.20947681019466]
An advanced temporal neural network referred to as the Transformer is used within a supervised learning fashion to model the time-dependent NPP simulation data.
The Transformer can learn the characteristics of the sequential data and yield promising performance with approximately 99% classification accuracy on the testing dataset.
arXiv Detail & Related papers (2021-04-09T14:26:25Z)
- Classification of Long Noncoding RNA Elements Using Deep Convolutional Neural Networks and Siamese Networks [17.8181080354116]
This thesis proposes a new method employing deep convolutional neural networks (CNNs) to classify ncRNA sequences.
As a result, classifying RNA sequences is converted to an image classification problem that can be efficiently solved by CNN-based classification models.
arXiv Detail & Related papers (2021-02-10T17:26:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.