Latent Gene Diffusion for Spatial Transcriptomics Completion
- URL: http://arxiv.org/abs/2509.01864v1
- Date: Tue, 02 Sep 2025 01:14:11 GMT
- Title: Latent Gene Diffusion for Spatial Transcriptomics Completion
- Authors: Paula Cárdenas, Leonardo Manrique, Daniela Vega, Daniela Ruiz, Pablo Arbeláez
- Abstract summary: LGDiST is the first reference-free latent gene diffusion model for data dropout. We show that LGDiST outperforms the previous state-of-the-art in gene expression completion. A key innovation of LGDiST is using context genes to build a rich and biologically meaningful genetic latent space.
- Score: 2.8967421319667728
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Computer Vision has proven to be a powerful tool for analyzing Spatial Transcriptomics (ST) data. However, current models that predict spatially resolved gene expression from histopathology images suffer from significant limitations due to data dropout. Most existing approaches rely on single-cell RNA sequencing references, making them dependent on alignment quality and external datasets while also risking batch effects and inherited dropout. In this paper, we address these limitations by introducing LGDiST, the first reference-free latent gene diffusion model for ST data dropout. We show that LGDiST outperforms the previous state-of-the-art in gene expression completion, with an average Mean Squared Error that is 18% lower across 26 datasets. Furthermore, we demonstrate that completing ST data with LGDiST improves gene expression prediction performance on six state-of-the-art methods up to 10% in MSE. A key innovation of LGDiST is using context genes previously considered uninformative to build a rich and biologically meaningful genetic latent space. Our experiments show that removing key components of LGDiST, such as the context genes, the ST latent space, and the neighbor conditioning, leads to considerable drops in performance. These findings underscore that the full architecture of LGDiST achieves substantially better performance than any of its isolated components.
Related papers
- GenAR: Next-Scale Autoregressive Generation for Spatial Gene Expression Prediction [15.143858141542532]
GenAR is a multi-scale autoregressive framework that refines predictions from coarse to fine. GenAR achieves state-of-the-art performance, offering potential implications for precision medicine and cost-effective molecular profiling.
arXiv Detail & Related papers (2025-10-05T18:28:21Z)
- Unlasting: Unpaired Single-Cell Multi-Perturbation Estimation by Dual Conditional Diffusion Implicit Bridges [68.98973318553983]
We propose a framework based on Dual Diffusion Implicit Bridges (DDIB) to learn the mapping between different data distributions. We integrate gene regulatory network (GRN) information to propagate perturbation signals in a biologically meaningful way. We also incorporate a masking mechanism to predict silent genes, improving the quality of generated profiles.
arXiv Detail & Related papers (2025-06-26T09:05:38Z)
- GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype [51.58774936662233]
Building gene regulatory networks (GRNs) is essential to understand and predict the effects of genetic perturbations. In this work, we leverage a pre-trained large language model and a DNA sequence model to extract features from gene descriptions and DNA sequence data. We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv Detail & Related papers (2025-05-06T03:35:24Z)
- Completing Spatial Transcriptomics Data for Gene Expression Prediction Benchmarking [1.177642303362119]
We introduce SpaRED, a database comprising 26 public datasets, and SpaCKLE, a state-of-the-art transformer-based gene expression completion model. Our contributions constitute the most comprehensive benchmark of gene expression prediction from histology images to date.
arXiv Detail & Related papers (2025-05-05T19:17:29Z)
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
- Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images [11.64540208294516]
We present Stem (SpaTially resolved gene Expression inference with diffusion). Stem generates high-fidelity gene expression predictions that share similar gene variation levels as ground truth data. Our proposed pipeline opens up the possibility of analyzing existing, easily accessible H&E stained histology images from a genomics point of view.
arXiv Detail & Related papers (2025-01-26T16:52:27Z)
- SpaDiT: Diffusion Transformer for Spatial Gene Expression Prediction using scRNA-seq [9.624390863643109]
SpaDiT is a deep learning method that integrates scRNA-seq and ST data for the prediction of undetected genes.
We have demonstrated the effectiveness of SpaDiT through extensive experiments on both seq-based and image-based ST data.
arXiv Detail & Related papers (2024-07-18T05:40:50Z)
- SpaRED benchmark: Enhancing Gene Expression Prediction from Histology Images with Spatial Transcriptomics Completion [2.032350440475489]
We present a systematically curated and processed database collected from 26 public sources.
We also propose a state-of-the-art transformer based completion technique for inferring missing gene expression.
Our contributions constitute the most comprehensive benchmark of gene expression prediction from histology images to date.
arXiv Detail & Related papers (2024-07-17T21:28:20Z)
- Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data. CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
- sc-OTGM: Single-Cell Perturbation Modeling by Solving Optimal Mass Transport on the Manifold of Gaussian Mixtures [0.9674145073701153]
sc-OTGM is an unsupervised model grounded in the inductive bias that the scRNAseq data can be generated.
sc-OTGM is effective in cell state classification, aids in the analysis of differential gene expression, and ranks genes for target identification.
It also predicts the effects of single-gene perturbations on downstream gene regulation and generates synthetic scRNA-seq data conditioned on specific cell states.
arXiv Detail & Related papers (2024-05-06T06:46:11Z)
- Efficient and Scalable Fine-Tune of Language Models for Genome Understanding [49.606093223945734]
We present Lingo: Language prefix fIne-tuning for GenOmes.
Unlike DNA foundation models, Lingo strategically leverages natural language foundation models' contextual cues.
Lingo further accommodates numerous downstream fine-tune tasks by an adaptive rank sampling method.
arXiv Detail & Related papers (2024-02-12T21:40:45Z)
- DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models [53.67562579184457]
This paper focuses on probabilistic STG forecasting, which is challenging due to the difficulty in modeling uncertainties and complex dependencies.
We present the first attempt to generalize the popular denoising diffusion models to STGs, leading to a novel non-autoregressive framework called DiffSTG.
Our approach combines the intrinsic-temporal learning capabilities of STNNs with the uncertainty measurements of diffusion models.
arXiv Detail & Related papers (2023-01-31T13:42:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.