PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling
- URL: http://arxiv.org/abs/2602.19685v1
- Date: Mon, 23 Feb 2026 10:28:56 GMT
- Title: PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling
- Authors: Xinyu Yuan, Xixian Liu, Ya Shi Zhang, Zuobai Zhang, Hongyu Guo, Jian Tang,
- Abstract summary: We introduce PerturbDiff, which shifts modeling from individual cells to entire distributions.<n>PerturbDiff achieves state-of-the-art performance in single-cell response prediction and generalizes substantially better to unseen perturbations.
- Score: 32.464109652332915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building Virtual Cells that can accurately simulate cellular responses to perturbations is a long-standing goal in systems biology. A fundamental challenge is that high-throughput single-cell sequencing is destructive: the same cell cannot be observed both before and after a perturbation. Thus, perturbation prediction requires mapping unpaired control and perturbed populations. Existing models address this by learning maps between distributions, but typically assume a single fixed response distribution when conditioned on observed cellular context (e.g., cell type) and the perturbation type. In reality, responses vary systematically due to unobservable latent factors such as microenvironmental fluctuations and complex batch effects, forming a manifold of possible distributions for the same observed conditions. To account for this variability, we introduce PerturbDiff, which shifts modeling from individual cells to entire distributions. By embedding distributions as points in a Hilbert space, we define a diffusion-based generative process operating directly over probability distributions. This allows PerturbDiff to capture population-level response shifts across hidden factors. Benchmarks on established datasets show that PerturbDiff achieves state-of-the-art performance in single-cell response prediction and generalizes substantially better to unseen perturbations. See our project page (https://katarinayuan.github.io/PerturbDiff-ProjectPage/), where code and data will be made publicly available (https://github.com/DeepGraphLearning/PerturbDiff).
Related papers
- scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction [12.48933770510505]
We present scDFM, a generative framework based on conditional flow matching.<n> scDFM aligns perturbed and control populations beyond cell-level correspondences.
arXiv Detail & Related papers (2026-02-06T17:00:21Z) - Departures: Distributional Transport for Single-Cell Perturbation Prediction with Neural Schrödinger Bridges [51.83259180910313]
A major bottleneck in gene function analysis is the unpaired nature of single-cell data.<n>We approximate Schrdinger Bridge (SB) to tackle unpaired single-cell perturbation data.<n>Our model effectively captures heterogeneous single-cell responses and achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-11-17T08:27:13Z) - Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges [68.98973318553983]
We propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions.<n>We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way.<n>We also incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles.
arXiv Detail & Related papers (2025-06-26T09:05:38Z) - Large-Scale Targeted Cause Discovery via Learning from Simulated Data [66.51307552703685]
We propose a novel machine learning approach for inferring causal variables of a target variable from observations.<n>We train a neural network using supervised learning on simulated data to infer causality.<n> Empirical results demonstrate superior performance in identifying causal relationships within large-scale gene regulatory networks.
arXiv Detail & Related papers (2024-08-29T02:21:11Z) - Generative inpainting of incomplete Euclidean distance matrices of trajectories generated by a fractional Brownian motion [46.1232919707345]
Fractional Brownian motion (fBm) features both randomness and strong scale-free correlations.
Here we examine a zoo of diffusion-based inpainting methods on a specific dataset of corrupted images.
We find that the conditional diffusion generation readily reproduces the built-in correlations of fBm paths in different memory regimes.
arXiv Detail & Related papers (2024-04-10T14:22:16Z) - Score-based Causal Representation Learning with Interventions [54.735484409244386]
This paper studies the causal representation learning problem when latent causal variables are observed indirectly.
The objectives are: (i) recovering the unknown linear transformation (up to scaling) and (ii) determining the directed acyclic graph (DAG) underlying the latent variables.
arXiv Detail & Related papers (2023-01-19T18:39:48Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained via simple matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Learning Causal Representations of Single Cells via Sparse Mechanism
Shift Modeling [3.2435888122704037]
We propose a deep generative model of single-cell gene expression data for which each perturbation is treated as an intervention targeting an unknown, but sparse, subset of latent variables.
We benchmark these methods on simulated single-cell data to evaluate their performance at latent units recovery, causal target identification and out-of-domain generalization.
arXiv Detail & Related papers (2022-11-07T15:47:40Z) - On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z) - Causal Discovery in Heterogeneous Environments Under the Sparse
Mechanism Shift Hypothesis [7.895866278697778]
Machine learning approaches commonly rely on the assumption of independent and identically distributed (i.i.d.) data.
In reality, this assumption is almost always violated due to distribution shifts between environments.
We propose the Mechanism Shift Score (MSS), a score-based approach amenable to various empirical estimators.
arXiv Detail & Related papers (2022-06-04T15:39:30Z) - Flexible Amortized Variational Inference in qBOLD MRI [56.4324135502282]
Oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data.
Existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV.
This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV.
arXiv Detail & Related papers (2022-03-11T10:47:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.