C3-Diff: Super-resolving Spatial Transcriptomics via Cross-modal Cross-content Contrastive Diffusion Modelling
- URL: http://arxiv.org/abs/2511.05571v1
- Date: Tue, 04 Nov 2025 13:12:25 GMT
- Title: C3-Diff: Super-resolving Spatial Transcriptomics via Cross-modal Cross-content Contrastive Diffusion Modelling
- Authors: Xiaofei Wang, Stephen Price, Chao Li
- Abstract summary: This study presents a cross-modal cross-content contrastive diffusion framework, called C3-Diff, for ST enhancement with histology images as guidance. To overcome the problem of low sequencing sensitivity in ST maps, we perform noising-based information augmentation on the surface of the feature unit hypersphere. We propose a dynamic cross-modal imputation-based training strategy to mitigate ST data scarcity.
- Score: 5.986183062217602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of spatial transcriptomics (ST), i.e., spatially resolved gene expression, has made it possible to measure gene expression within the original tissue, enabling the discovery of molecular mechanisms. However, current ST platforms frequently suffer from low resolution, limiting in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with the gene expressions of profiled tissue spots. However, modelling the interactions between histology images and gene expressions for effective ST enhancement remains a challenge. This study presents a cross-modal cross-content contrastive diffusion framework, called C3-Diff, for ST enhancement with histology images as guidance. In C3-Diff, we first analyze the deficiencies of the traditional contrastive learning paradigm, which we then refine to extract both modal-invariant and content-invariant features of ST maps and histology images. Further, to overcome the low sequencing sensitivity of ST maps, we perform noising-based information augmentation on the surface of the feature unit hypersphere. Finally, we propose a dynamic cross-modal imputation-based training strategy to mitigate ST data scarcity. We benchmark C3-Diff on four public datasets, where it achieves significant improvements over competing methods. Moreover, we evaluate C3-Diff on the downstream tasks of cell type localization, gene expression correlation and single-cell-level gene expression prediction, promoting AI-enhanced biotechnology for biomedical research and clinical applications. Code is available at https://github.com/XiaofeiWang2018/C3-Diff.
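To make the abstract's two core mechanisms concrete, below is a minimal PyTorch sketch (not the authors' released code) of (i) noising-based augmentation on the feature unit hypersphere and (ii) a symmetric cross-modal InfoNCE-style contrastive loss between ST and histology features; all names, shapes, and hyperparameters are illustrative assumptions.

```python
# Sketch (not the authors' implementation) of two ideas from the abstract:
# (1) noising-based augmentation on the feature unit hypersphere,
# (2) a symmetric cross-modal contrastive (InfoNCE-style) loss between
#     ST-map and histology features. Shapes/hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def hypersphere_noise(z: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Perturb L2-normalized features with Gaussian noise, then re-project
    back onto the unit hypersphere so augmented features stay on-manifold."""
    z = F.normalize(z, dim=-1)                 # ensure inputs lie on the sphere
    z_noisy = z + sigma * torch.randn_like(z)  # isotropic Gaussian perturbation
    return F.normalize(z_noisy, dim=-1)        # re-project onto the sphere

def cross_modal_infonce(z_st, z_hist, temperature: float = 0.07):
    """Symmetric InfoNCE: matched (ST, histology) pairs are positives,
    all other pairings in the batch serve as negatives."""
    z_st, z_hist = F.normalize(z_st, dim=-1), F.normalize(z_hist, dim=-1)
    logits = z_st @ z_hist.t() / temperature   # (B, B) cosine-similarity logits
    targets = torch.arange(z_st.size(0), device=z_st.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example: augment ST features before computing the contrastive loss.
z_st, z_hist = torch.randn(32, 256), torch.randn(32, 256)
loss = cross_modal_infonce(hypersphere_noise(z_st), z_hist)
```

Re-projecting the noised features onto the sphere keeps the augmentation on the same manifold the contrastive loss operates on, which is what makes sphere-surface noising different from plain feature-space jitter.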
Related papers
- A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
- Dual-Path Knowledge-Augmented Contrastive Alignment Network for Spatially Resolved Transcriptomics [5.455957568203595]
High cost has driven efforts to predict spatial gene expression from whole slide images. Current methods face significant limitations, such as under-exploitation of high-level biological context. We propose DKAN, a novel Dual-path Knowledge-Augmented contrastive alignment Network.
arXiv Detail & Related papers (2025-11-21T10:58:04Z)
- HiFusion: Hierarchical Intra-Spot Alignment and Regional Context Fusion for Spatial Gene Expression Prediction from Histopathology [7.982889842329205]
HiFusion is a novel deep learning framework that integrates two complementary components. We show that HiFusion achieves state-of-the-art performance across both 2D slide-wise cross-validation and more challenging 3D sample-specific scenarios. These results underscore HiFusion's potential as a robust, accurate, and scalable solution for ST inference from routine histopathology.
arXiv Detail & Related papers (2025-11-17T04:47:39Z)
- HaDM-ST: Histology-Assisted Differential Modeling for Spatial Transcriptomics Generation [3.0708458805558347]
We propose HaDM-ST (Histology-assisted Differential Modeling for ST Generation), a high-resolution ST generation framework conditioned on H&E images and low-resolution ST. Experiments on 200 genes across diverse tissues and species show HaDM-ST consistently outperforms prior methods, enhancing spatial fidelity and gene-level coherence in high-resolution ST predictions.
arXiv Detail & Related papers (2025-08-10T08:09:06Z)
- Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling [90.23688195918432]
3D molecule generation is crucial for drug discovery and material science. Existing approaches typically maintain separate latent spaces for invariant and equivariant modalities. We propose UAE-3D, a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space.
arXiv Detail & Related papers (2025-03-19T08:56:13Z)
- MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [57.044719143401664]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention. Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z)
- Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View [49.03501451546763]
We identify the importance of implicit correspondences across biological contexts for exploiting domain-invariant pathological composition.
We propose self-adaptive dynamic distillation to secure instance-aware trade-offs across different model constituents.
arXiv Detail & Related papers (2024-07-14T04:41:16Z)
- Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics [5.904688354944791]
Spatial transcriptomics allows spatial gene expression within tissue to be characterized for discovery research. Super-resolution approaches promise to enhance ST maps by integrating histology images with gene expressions of profiled tissue spots. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps with the guidance of histology images.
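For readers unfamiliar with the conditional diffusion machinery this entry (and C3-Diff itself) builds on, here is a sketch of one textbook DDPM reverse step with a histology embedding as the condition; this is the standard formulation, not the paper's specific model, and `eps_model` plus the schedule tensors are assumed inputs.

```python
# Generic sketch of one reverse (denoising) step of a conditional DDPM,
# where a histology-image embedding conditions the noise predictor.
# Textbook DDPM update, not the paper's exact architecture.
import torch

@torch.no_grad()
def reverse_step(eps_model, x_t, t, hist_cond, alphas, alphas_bar, betas):
    """Sample x_{t-1} given x_t and the histology condition."""
    eps = eps_model(x_t, t, hist_cond)                  # predicted noise
    coef = betas[t] / torch.sqrt(1.0 - alphas_bar[t])
    mean = (x_t - coef * eps) / torch.sqrt(alphas[t])   # posterior mean
    if t == 0:
        return mean                                     # last step: no noise
    noise = torch.randn_like(x_t)
    return mean + torch.sqrt(betas[t]) * noise          # add sampling noise
```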
arXiv Detail & Related papers (2024-04-19T16:01:00Z)
- Accurate Spatial Gene Expression Prediction by integrating Multi-resolution features [0.0]
TRIPLEX is a novel deep learning framework designed to predict spatial gene expression from Whole Slide Images (WSIs). Our benchmark study, conducted on three public ST datasets, demonstrates that TRIPLEX outperforms current state-of-the-art models in Mean Squared Error (MSE), Mean Absolute Error (MAE), and Pearson Correlation Coefficient (PCC).
The model's predictions align closely with ground truth gene expression profiles and tumor annotations, underscoring TRIPLEX's potential in advancing cancer diagnosis and treatment.
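As a reference for the three reported metrics, the following NumPy sketch shows how MSE, MAE, and per-gene PCC are commonly computed for ST prediction; the exact evaluation protocol in TRIPLEX may differ.

```python
# Common way to score predicted vs. ground-truth ST expression matrices;
# an illustrative sketch, not TRIPLEX's exact evaluation code.
import numpy as np

def evaluate(pred: np.ndarray, truth: np.ndarray):
    """pred, truth: (n_spots, n_genes) expression matrices."""
    mse = np.mean((pred - truth) ** 2)
    mae = np.mean(np.abs(pred - truth))
    # Per-gene Pearson correlation, averaged over genes (NaN-safe for
    # genes with constant expression).
    pcc = np.nanmean([np.corrcoef(pred[:, g], truth[:, g])[0, 1]
                      for g in range(truth.shape[1])])
    return {"MSE": mse, "MAE": mae, "PCC": pcc}
```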
arXiv Detail & Related papers (2024-03-12T12:25:38Z)
- Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies [50.11449968854487]
Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits.
Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS.
We introduce a trans-modal learning framework Genetic InfoMax (GIM) to address the specific challenges of GWAS.
arXiv Detail & Related papers (2023-09-26T03:59:21Z)
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
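As an illustration of the building block this summary names, below is a minimal conditional affine coupling layer in PyTorch; dimensions, conditioning interface, and network widths are illustrative assumptions rather than GSMFlow's actual implementation.

```python
# Minimal sketch of a conditional affine coupling layer, the flow building
# block the GSMFlow summary mentions; details here are illustrative.
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    def __init__(self, dim: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.half = dim // 2
        # Maps (x_a, condition) -> per-dimension log-scale and shift for x_b.
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x, cond):
        x_a, x_b = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x_a, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                      # bound log-scale for stability
        y_b = x_b * torch.exp(s) + t           # invertible affine transform
        log_det = s.sum(dim=1)                 # log |det J|, needed for the flow loss
        return torch.cat([x_a, y_b], dim=1), log_det
```

Because only half the dimensions are transformed (conditioned on the untouched half plus the semantic condition), the layer stays exactly invertible and its log-determinant is cheap to compute, which is what makes affine coupling attractive for flow-based generation.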
arXiv Detail & Related papers (2022-07-05T04:04:37Z)