scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling
- URL: http://arxiv.org/abs/2404.06153v1
- Date: Tue, 9 Apr 2024 09:25:16 GMT
- Title: scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling
- Authors: Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei
- Abstract summary: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research.
Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT).
This method generates virtual scRNA-seq data by leveraging a real dataset.
- Score: 9.013834280011293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivation: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research, facilitating the examination of gene expression at the individual-cell level within a given tissue sample. While numerous tools have been developed for scRNA-seq data analysis, it remains challenging to capture the distinct features of such data and to replicate virtual datasets that share analogous statistical properties. Results: Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT). This method generates virtual scRNA-seq data by leveraging a real dataset. The method is a neural network built on Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs): Gaussian noise is added to the real data through iterative noise-adding steps, and the network learns to reverse this process, denoising pure noise back into scRNA-seq samples. This scheme allows the model to learn data features from actual scRNA-seq samples during training. Our experiments, conducted on two distinct scRNA-seq datasets, demonstrate superior performance. Additionally, the sampling process is accelerated by incorporating Denoising Diffusion Implicit Models (DDIMs). scRDiT presents a unified methodology empowering users to train neural network models on their own scRNA-seq datasets, enabling the generation of numerous high-quality scRNA-seq samples. Availability and implementation: https://github.com/DongShengze/scRDiT
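The forward and reverse processes the abstract describes follow the standard DDPM recipe. As a rough illustration (not code from the scRDiT repository), the sketch below shows the usual forward noising step and noise-prediction training loss in PyTorch, assuming a hypothetical network `model(x_t, t)` standing in for the Diffusion Transformer; the schedule constants are common DDPM defaults, not values reported in the paper.

```python
# Minimal sketch of the DDPM scheme described in the abstract.
# `model(x_t, t)` is a hypothetical stand-in for the Diffusion Transformer;
# schedule constants are standard DDPM defaults, not the paper's values.
import torch

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative products \bar{alpha}_t

def q_sample(x0, t, eps):
    """Forward process: add Gaussian noise to clean samples x0 at step t."""
    a_bar = alpha_bars[t].view(-1, 1)      # one scalar per sample in the batch
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

def ddpm_loss(model, x0):
    """Train the network to predict the injected noise (simple DDPM loss)."""
    t = torch.randint(0, T, (x0.shape[0],))        # random step per sample
    eps = torch.randn_like(x0)                     # the noise to recover
    x_t = q_sample(x0, t, eps)                     # noised expression profiles
    return torch.nn.functional.mse_loss(model(x_t, t), eps)
```

Training then amounts to minimizing `ddpm_loss` over minibatches of real expression profiles.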
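The DDIM acceleration mentioned above replaces the full T-step reverse chain with a short, deterministic subsequence of steps. Below is a minimal sketch of the standard deterministic DDIM update (eta = 0), reusing `T`, `alpha_bars`, and the hypothetical `model` from the previous snippet; scRDiT's actual sampler settings may differ.

```python
@torch.no_grad()
def ddim_sample(model, shape, num_steps=50):
    """Deterministic DDIM sampling (eta = 0) over a step subsequence."""
    step_ids = torch.linspace(T - 1, 0, num_steps).long()  # descending steps
    x = torch.randn(shape)                                  # start from pure noise
    for i, t in enumerate(step_ids):
        t_batch = torch.full((shape[0],), int(t))
        eps = model(x, t_batch)                             # predicted noise at step t
        a_bar = alpha_bars[t]
        x0_pred = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()  # clean-sample estimate
        if i == len(step_ids) - 1:
            x = x0_pred                                     # final step yields the sample
        else:
            a_prev = alpha_bars[step_ids[i + 1]]
            # eta = 0: deterministic DDIM update toward the previous step
            x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
    return x
```

Generating a batch of, say, 64 virtual cells over 2,000 genes (an illustrative gene count) would then be `ddim_sample(model, (64, 2000))`, trading the 1,000-step DDPM chain for 50 network evaluations.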
Related papers
- White-Box Diffusion Transformer for single-cell RNA-seq generation [9.846966401472802]
We propose a hybrid model based on a diffusion model and a White-Box transformer to generate synthetic and biologically plausible scRNA-seq data.
Our White-Box Diffusion Transformer combines the generative capabilities of the diffusion model with the mathematical interpretability of the White-Box transformer.
arXiv Detail & Related papers (2024-11-11T08:24:59Z)
- Latent Diffusion Models for Controllable RNA Sequence Generation [33.38594748558547]
RNA is a key intermediary between DNA and protein, exhibiting high sequence diversity and complex three-dimensional structures.
We develop a latent diffusion model for generating and optimizing discrete RNA sequences of variable lengths.
Empirical results confirm that RNAdiffusion generates non-coding RNAs that align with natural distributions across various biological metrics.
arXiv Detail & Related papers (2024-09-15T19:04:50Z)
- scDiffusion: conditional generation of high-quality single-cell data using diffusion model [1.0738561302102216]
Single-cell RNA sequencing (scRNA-seq) data are important for studying the laws of life at the single-cell level.
It is still challenging to obtain enough high-quality scRNA-seq data.
We developed scDiffusion, a generative model combining a diffusion model and a foundation model to generate high-quality scRNA-seq data.
arXiv Detail & Related papers (2024-01-08T15:44:39Z)
- scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, positional encoding via gene embeddings, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z)
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained on multiple domains and synthesize diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- Application of Deep Learning on Single-Cell RNA-sequencing Data Analysis: A Review [17.976898403296275]
Single-cell RNA-sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously.
Deep learning, a recent advance in artificial intelligence, has also emerged as a promising tool for scRNA-seq data analysis.
arXiv Detail & Related papers (2022-10-11T17:07:22Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to logistic regression, may be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained on the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
- Deep Transformer Networks for Time Series Classification: The NPP Safety Case [59.20947681019466]
An advanced temporal neural network referred to as the Transformer is used in a supervised learning fashion to model the time-dependent NPP simulation data.
The Transformer can learn the characteristics of the sequential data and yield promising performance with approximately 99% classification accuracy on the testing dataset.
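As a rough illustration of the technique named here (not the paper's implementation), a minimal supervised Transformer classifier for multivariate time series might look like the PyTorch sketch below; all hyperparameters are illustrative, and positional encoding is omitted for brevity.

```python
# Generic sketch of a supervised Transformer classifier for multivariate
# time series; hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, n_features, n_classes, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)   # per-timestep feature embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)     # class logits

    def forward(self, x):                  # x: (batch, time, n_features)
        h = self.encoder(self.embed(x))    # (batch, time, d_model)
        return self.head(h.mean(dim=1))    # mean-pool over time, then classify
```

Training such a model against labeled simulation traces with a standard cross-entropy loss is the usual supervised setup this entry describes.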
arXiv Detail & Related papers (2021-04-09T14:26:25Z)
- Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more natural-sounding samples.
We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.