Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges
- URL: http://arxiv.org/abs/2506.21107v2
- Date: Wed, 13 Aug 2025 07:57:09 GMT
- Title: Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges
- Authors: Changxi Chi, Jun Xia, Yufei Huang, Jingbo Zhou, Siyuan Li, Yunfan Liu, Chang Yu, Stan Z. Li,
- Abstract summary: We propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions.<n>We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way.<n>We also incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles.
- Score: 68.98973318553983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating single-cell responses across various perturbations facilitates the identification of key genes and enhances drug screening, significantly boosting experimental efficiency. However, single-cell sequencing is a destructive process, making it impossible to capture the same cell's phenotype before and after perturbation. Consequently, data collected under perturbed and unperturbed conditions are inherently unpaired. Existing methods either attempt to forcibly pair unpaired data using random sampling, or neglect the inherent relationship between unperturbed and perturbed cells during the modeling. In this work, we propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions, effectively addressing the challenge of unpaired data. We further interpret this framework as a form of data augmentation. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way, and further incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles. Moreover, gene expression under the same perturbation often varies significantly across cells, frequently exhibiting a bimodal distribution that reflects intrinsic heterogeneity. To capture this, we introduce a more suitable evaluation metric. We propose Unlasting, dual conditional diffusion models that overcome the problem of unpaired single-cell perturbation data and strengthen the model's insight into perturbations under the guidance of the GRN, with a dedicated mask model designed to improve generation quality by predicting silent genes. In addition, we introduce a biologically grounded evaluation metric that better reflects the inherent heterogeneity in single-cell responses.
Related papers
- scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction [12.48933770510505]
We present scDFM, a generative framework based on conditional flow matching.<n> scDFM aligns perturbed and control populations beyond cell-level correspondences.
arXiv Detail & Related papers (2026-02-06T17:00:21Z) - ScDiVa: Masked Discrete Diffusion for Joint Modeling of Single-Cell Identity and Expression [24.508523704467695]
Single-cell RNA-seq profiles are high-dimensional, sparse, and unordered, causing autoregressive generation to impose an artificial ordering bias.<n>We propose scDiVa, a masked discrete diffusion foundation model that aligns generation with the dropout-like corruption process.
arXiv Detail & Related papers (2026-02-03T12:50:29Z) - Latent Causal Diffusions for Single-Cell Perturbation Modeling [83.47931153555321]
We present a generative model that frames single-cell gene expression as a stationary diffusion process observed under measurement noise.<n> LCD outperforms established approaches in predicting the distributional shifts of unseen perturbation combinations in single-cell RNA-sequencing screens.<n>We develop an approach we call causal linearization via perturbation responses (CLIPR), which yields an approximation of the direct causal effects between all genes.
arXiv Detail & Related papers (2026-01-20T16:15:38Z) - Departures: Distributional Transport for Single-Cell Perturbation Prediction with Neural Schrödinger Bridges [51.83259180910313]
A major bottleneck in gene function analysis is the unpaired nature of single-cell data.<n>We approximate Schrdinger Bridge (SB) to tackle unpaired single-cell perturbation data.<n>Our model effectively captures heterogeneous single-cell responses and achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-11-17T08:27:13Z) - Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models [11.343106383645441]
We introduce a scalable latent diffusion model for single-cell gene expression data, which we refer to as scLDM.<n>We show its superior performance in a variety of experiments for both observational and perturbational single-cell data, as well as downstream tasks like cell-level classification.
arXiv Detail & Related papers (2025-11-04T20:44:12Z) - Modeling variable guide efficiency in pooled CRISPR screens with ContrastiveVI+ [1.2289361708127877]
ContrastiveVI+ is a generative modeling framework that disentangles perturbation-induced from non-perturbation-related variations.
It is applied to three large-scale Perturb-seq datasets.
arXiv Detail & Related papers (2024-11-11T19:16:34Z) - CSGDN: Contrastive Signed Graph Diffusion Network for Predicting Crop Gene-phenotype Associations [6.5678927417916455]
We propose a Contrastive Signed Graph Diffusion Network, CSGDN, to learn robust node representations with fewer training samples to achieve higher link prediction accuracy.
We conduct experiments to validate the performance of CSGDN on three crop datasets: Gossypium hirsutum, Brassica napus, and Triticum turgidum.
arXiv Detail & Related papers (2024-10-10T01:01:10Z) - Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data.<n>CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z) - sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures [0.9674145073701153]
sc-OTGM is an unsupervised model grounded in the inductive bias that the scRNAseq data can be generated.
sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification.
It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.
arXiv Detail & Related papers (2024-05-06T06:46:11Z) - AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model [59.08735812631131]
Anomaly inspection plays an important role in industrial manufacture.
Existing anomaly inspection methods are limited in their performance due to insufficient anomaly data.
We propose AnomalyDiffusion, a novel diffusion-based few-shot anomaly generation model.
arXiv Detail & Related papers (2023-12-10T05:13:40Z) - Single-Cell Deep Clustering Method Assisted by Exogenous Gene
Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z) - Conditionally Invariant Representation Learning for Disentangling
Cellular Heterogeneity [25.488181126364186]
This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability or distractors.
We apply our method to grand biological challenges, such as data integration in single-cell genomics.
Specifically, the proposed approach helps to disentangle biological signals from data biases that are unrelated to the target task or the causal explanation of interest.
arXiv Detail & Related papers (2023-07-02T12:52:41Z) - DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with
GFlowNets [81.75973217676986]
Gene regulatory networks (GRN) describe interactions between genes and their products that control gene expression and cellular function.
Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2) learning complex Bayesian posteriors over DAGs, but not both.
In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges.
arXiv Detail & Related papers (2023-02-08T16:36:40Z) - Learning Causal Representations of Single Cells via Sparse Mechanism
Shift Modeling [3.2435888122704037]
We propose a deep generative model of single-cell gene expression data for which each perturbation is treated as an intervention targeting an unknown, but sparse, subset of latent variables.
We benchmark these methods on simulated single-cell data to evaluate their performance at latent units recovery, causal target identification and out-of-domain generalization.
arXiv Detail & Related papers (2022-11-07T15:47:40Z) - CausalBench: A Large-scale Benchmark for Network Inference from
Single-cell Perturbation Data [61.088705993848606]
We introduce CausalBench, a benchmark suite for evaluating causal inference methods on real-world interventional data.
CaulBench incorporates biologically-motivated performance metrics, including new distribution-based interventional metrics.
arXiv Detail & Related papers (2022-10-31T13:04:07Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian
Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.