scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction
- URL: http://arxiv.org/abs/2602.07103v1
- Date: Fri, 06 Feb 2026 17:00:21 GMT
- Title: scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction
- Authors: Chenglei Yu, Chuanrui Wang, Bangyan Liao, Tailin Wu,
- Abstract summary: We present scDFM, a generative framework based on conditional flow matching.<n> scDFM aligns perturbed and control populations beyond cell-level correspondences.
- Score: 12.48933770510505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A central goal in systems biology and drug discovery is to predict the transcriptional response of cells to perturbations. This task is challenging due to the noisy and sparse nature of single-cell measurements, as well as the fact that perturbations often induce population-level shifts rather than changes in individual cells. Existing deep learning methods typically assume cell-level correspondences, limiting their ability to capture such global effects. We present scDFM, a generative framework based on conditional flow matching that models the full distribution of perturbed cells conditioned on control states. By incorporating a maximum mean discrepancy (MMD) objective, our method aligns perturbed and control populations beyond cell-level correspondences. To further improve robustness to sparsity and noise, we introduce the Perturbation-Aware Differential Transformer (PAD-Transformer), a backbone architecture that leverages gene interaction graphs and differential attention to capture context-specific expression changes. Across multiple genetic and drug perturbation benchmarks, scDFM consistently outperforms prior methods, demonstrating strong generalization in both unseen and combinatorial settings. In the combinatorial setting, it reduces mean squared error by 19.6% relative to the strongest baseline. These results highlight the importance of distribution-level generative modeling for robust in silico perturbation prediction. The code is available at https://github.com/AI4Science-WestlakeU/scDFM
Related papers
- ScDiVa: Masked Discrete Diffusion for Joint Modeling of Single-Cell Identity and Expression [24.508523704467695]
Single-cell RNA-seq profiles are high-dimensional, sparse, and unordered, causing autoregressive generation to impose an artificial ordering bias.<n>We propose scDiVa, a masked discrete diffusion foundation model that aligns generation with the dropout-like corruption process.
arXiv Detail & Related papers (2026-02-03T12:50:29Z) - Robust Machine Learning for Regulatory Sequence Modeling under Biological and Technical Distribution Shifts [0.3948325938742681]
We introduce a robustness framework to quantify performance degradation, calibration failures, and uncertainty based reliability.<n>In simulation, motif driven regulatory outputs are generated with cell type specific programs, perturbations, GC bias, depth variation, batch effects, and heteroscedastic noise.<n>Models remain accurate but show higher error, severe variance miscalibration, and coverage collapse under motif effect rewiring and noise dominated regimes.
arXiv Detail & Related papers (2026-01-21T13:15:27Z) - Latent Causal Diffusions for Single-Cell Perturbation Modeling [83.47931153555321]
We present a generative model that frames single-cell gene expression as a stationary diffusion process observed under measurement noise.<n> LCD outperforms established approaches in predicting the distributional shifts of unseen perturbation combinations in single-cell RNA-sequencing screens.<n>We develop an approach we call causal linearization via perturbation responses (CLIPR), which yields an approximation of the direct causal effects between all genes.
arXiv Detail & Related papers (2026-01-20T16:15:38Z) - Departures: Distributional Transport for Single-Cell Perturbation Prediction with Neural Schrödinger Bridges [51.83259180910313]
A major bottleneck in gene function analysis is the unpaired nature of single-cell data.<n>We approximate Schrdinger Bridge (SB) to tackle unpaired single-cell perturbation data.<n>Our model effectively captures heterogeneous single-cell responses and achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-11-17T08:27:13Z) - Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models [11.343106383645441]
We introduce a scalable latent diffusion model for single-cell gene expression data, which we refer to as scLDM.<n>We show its superior performance in a variety of experiments for both observational and perturbational single-cell data, as well as downstream tasks like cell-level classification.
arXiv Detail & Related papers (2025-11-04T20:44:12Z) - Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges [68.98973318553983]
We propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions.<n>We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way.<n>We also incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles.
arXiv Detail & Related papers (2025-06-26T09:05:38Z) - Large-Scale Targeted Cause Discovery via Learning from Simulated Data [66.51307552703685]
We propose a novel machine learning approach for inferring causal variables of a target variable from observations.<n>We train a neural network using supervised learning on simulated data to infer causality.<n> Empirical results demonstrate superior performance in identifying causal relationships within large-scale gene regulatory networks.
arXiv Detail & Related papers (2024-08-29T02:21:11Z) - Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data.<n>CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures [0.9674145073701153]
sc-OTGM is an unsupervised model grounded in the inductive bias that the scRNAseq data can be generated.
sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification.
It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.
arXiv Detail & Related papers (2024-05-06T06:46:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.