CFM-GP: Unified Conditional Flow Matching to Learn Gene Perturbation Across Cell Types
- URL: http://arxiv.org/abs/2508.08312v2
- Date: Tue, 28 Oct 2025 02:55:43 GMT
- Title: CFM-GP: Unified Conditional Flow Matching to Learn Gene Perturbation Across Cell Types
- Authors: Abrar Rahman Abir, Sajib Acharjee Dip, Liqing Zhang,
- Abstract summary: We present CFM-GP, a method for cell type-agnostic gene perturbation prediction.<n>CFM-GP learns a continuous, time-dependent transformation between unperturbed and perturbed gene expression distributions.<n>We evaluate on five datasets: SARS-CoV-2 infection, IFN-beta stimulated PBMCs, treated with glioblastoma Panobinostat, lupus under IFN-beta stimulation, and Statefate progenitor fate mapping.
- Score: 7.197454335008732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding gene perturbation effects across diverse cellular contexts is a central challenge in functional genomics, with important implications for therapeutic discovery and precision medicine. Single-cell technologies enable high-resolution measurement of transcriptional responses, but collecting such data is costly and time-consuming, especially when repeated for each cell type. Existing computational methods often require separate models per cell type, limiting scalability and generalization. We present CFM-GP, a method for cell type-agnostic gene perturbation prediction. CFM-GP learns a continuous, time-dependent transformation between unperturbed and perturbed gene expression distributions, conditioned on cell type, allowing a single model to predict across all cell types. Unlike prior approaches that use discrete modeling, CFM-GP employs a flow matching objective to capture perturbation dynamics in a scalable manner. We evaluate on five datasets: SARS-CoV-2 infection, IFN-beta stimulated PBMCs, glioblastoma treated with Panobinostat, lupus under IFN-beta stimulation, and Statefate progenitor fate mapping. CFM-GP consistently outperforms state-of-the-art baselines in R-squared and Spearman correlation, and pathway enrichment analysis confirms recovery of key biological pathways. These results demonstrate the robustness and biological fidelity of CFM-GP as a scalable solution for cross-cell type gene perturbation prediction.
Related papers
- Graph Attention Based Prioritization of Disease Responsible Genes from Multimodal Alzheimer's Network [20.37811669228711]
Prioritizing disease-associated genes is central to understanding complex disorders such as Alzheimer's disease.<n>We propose NETRA, a multimodal graph transformer framework that replaces centrality metrics with attention-driven relevance scoring.<n>A graph transformer assigns NETRA scores that quantify gene relevance in a disease-specific and context-aware manner.
arXiv Detail & Related papers (2026-03-01T06:46:18Z) - scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction [12.48933770510505]
We present scDFM, a generative framework based on conditional flow matching.<n> scDFM aligns perturbed and control populations beyond cell-level correspondences.
arXiv Detail & Related papers (2026-02-06T17:00:21Z) - Modeling Dabrafenib Response Using Multi-Omics Modality Fusion and Protein Network Embeddings Based on Graph Convolutional Networks [0.0]
Cancer cell response to targeted therapy arises from complex molecular interactions, making single omics insufficient for accurate prediction.<n>This study develops a model to predict Dabrafenib sensitivity by integrating multiple omics layers (genomics, transcriptomics, epigenomics, metabolomics) with protein network embeddings generated using Graph Convolutional Networks (GCN)<n>Results show that attention guided multi omics fusion combined with GCN improves drug response prediction and reveals complementary molecular determinants of Dabrafenib sensitivity.
arXiv Detail & Related papers (2025-12-13T02:00:56Z) - AdaFusion: Prompt-Guided Inference with Adaptive Fusion of Pathology Foundation Models [49.550545038402184]
We propose AdaFusion, a novel prompt-guided inference framework.<n>Our method compresses and aligns tile-level features from diverse models.<n>AdaFusion consistently surpasses individual PFMs across both classification and regression tasks.
arXiv Detail & Related papers (2025-08-07T07:09:31Z) - Interpretable Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification using Multi-Omics Data [36.92842246372894]
Multi-Omics Graph Kolmogorov-Arnold Network (MOGKAN) is a deep learning framework that utilizes messenger-RNA, micro-RNA sequences, and DNA methylation samples.<n>By integrating multi-omics data with graph-based deep learning, our proposed approach demonstrates robust predictive performance and interpretability.
arXiv Detail & Related papers (2025-03-29T02:14:05Z) - A scalable gene network model of regulatory dynamics in single cells [88.48246132084441]
We introduce a Functional Learnable model of Cell dynamicS, FLeCS, that incorporates gene network structure into coupled differential equations to model gene regulatory functions.<n>Given (pseudo)time-series single-cell data, FLeCS accurately infers cell dynamics at scale.
arXiv Detail & Related papers (2025-03-25T19:19:21Z) - scGSDR: Harnessing Gene Semantics for Single-Cell Pharmacological Profiling [5.831554646284266]
scGSDR is a model that integrates two computational pipelines grounded in the knowledge of cellular states and gene signaling pathways.<n> scGSDR enhances predictive performance by incorporating gene semantics and employs an interpretability module.<n>The model's application has extended from single-drug predictions to scenarios involving drug combinations.
arXiv Detail & Related papers (2025-02-02T15:43:20Z) - Cell-ontology guided transcriptome foundation model [18.51941953027685]
We pre-trained scCello on 22 million cells from CellxGene database leveraging their cell-type labels mapped to the cell ontology graph from Open Biological and Biomedical Ontology Foundry.<n>Our TFM demonstrates competitive generalization and transferability performance over the existing TFMs on biologically important tasks.
arXiv Detail & Related papers (2024-08-22T13:15:49Z) - Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data.<n>CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z) - MMIL: A novel algorithm for disease associated cell type discovery [58.044870442206914]
Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
We introduce Mixture Modeling for Multiple Learning Instance (MMIL), an expectation method that enables the training and calibration of cell-level classifiers.
arXiv Detail & Related papers (2024-06-12T15:22:56Z) - Path-GPTOmic: A Balanced Multi-modal Learning Framework for Survival Outcome Prediction [14.204637932937082]
We introduce a new multi-modal Path-GPTOmic" framework for cancer survival outcome prediction.
We regulate the embedding space of a foundation model, scGPT, initially trained on single-cell RNA-seq data.
We propose a gradient modulation mechanism tailored to the Cox partial likelihood loss for survival prediction.
arXiv Detail & Related papers (2024-03-18T00:02:48Z) - Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL)
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
arXiv Detail & Related papers (2023-11-04T16:42:42Z) - CausalBench: A Large-scale Benchmark for Network Inference from
Single-cell Perturbation Data [61.088705993848606]
We introduce CausalBench, a benchmark suite for evaluating causal inference methods on real-world interventional data.
CaulBench incorporates biologically-motivated performance metrics, including new distribution-based interventional metrics.
arXiv Detail & Related papers (2022-10-31T13:04:07Z) - Granger causal inference on DAGs identifies genomic loci regulating
transcription [77.58911272503771]
GrID-Net is a framework based on graph neural networks with lagged message passing for Granger causal inference on DAG-structured systems.
Our application is the analysis of single-cell multimodal data to identify genomic loci that mediate the regulation of specific genes.
arXiv Detail & Related papers (2022-10-18T21:15:10Z) - Multi-modal Self-supervised Pre-training for Regulatory Genome Across
Cell Types [75.65676405302105]
We propose a simple yet effective approach for pre-training genome data in a multi-modal and self-supervised manner, which we call GeneBERT.
We pre-train our model on the ATAC-seq dataset with 17 million genome sequences.
arXiv Detail & Related papers (2021-10-11T12:48:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.