MMAP: A Multi-Magnification and Prototype-Aware Architecture for Predicting Spatial Gene Expression
- URL: http://arxiv.org/abs/2510.11344v1
- Date: Mon, 13 Oct 2025 12:41:09 GMT
- Title: MMAP: A Multi-Magnification and Prototype-Aware Architecture for Predicting Spatial Gene Expression
- Authors: Hai Dang Nguyen, Nguyen Dang Huy Pham, The Minh Duc Nguyen, Dac Thai Nguyen, Hang Thi Nguyen, Duong M. Nguyen,
- Abstract summary: Spatial Transcriptomics (ST) enables the measurement of gene expression while preserving spatial information. Recent developments have explored the use of hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) to predict transcriptome-wide gene expression profiles through deep neural networks. However, predicting spatial gene expression from histological images remains a challenging problem due to the significant modality gap between visual features and molecular signals. In this work, we propose a novel framework, MMAP (Multi-MAgnification and Prototype-enhanced architecture), that addresses both challenges simultaneously.
- Score: 1.083137038945176
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Spatial Transcriptomics (ST) enables the measurement of gene expression while preserving spatial information, offering critical insights into tissue architecture and disease pathology. Recent developments have explored the use of hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) to predict transcriptome-wide gene expression profiles through deep neural networks. This task is commonly framed as a regression problem, where each input corresponds to a localized image patch extracted from the WSI. However, predicting spatial gene expression from histological images remains a challenging problem due to the significant modality gap between visual features and molecular signals. Recent studies have attempted to incorporate both local and global information into predictive models. Nevertheless, existing methods still suffer from two key limitations: (1) insufficient granularity in local feature extraction, and (2) inadequate coverage of global spatial context. In this work, we propose a novel framework, MMAP (Multi-MAgnification and Prototype-enhanced architecture), that addresses both challenges simultaneously. To enhance local feature granularity, MMAP leverages multi-magnification patch representations that capture fine-grained histological details. To improve global contextual understanding, it learns a set of latent prototype embeddings that serve as compact representations of slide-level information. Extensive experimental results demonstrate that MMAP consistently outperforms all existing state-of-the-art methods across multiple evaluation metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and Pearson Correlation Coefficient (PCC).
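The abstract evaluates predictions with MAE, MSE, and PCC over spot-level expression profiles. As an illustration only (the paper's exact evaluation protocol, e.g. which genes are kept and how PCC is aggregated, is not specified here), a minimal sketch of these three metrics for a spots-by-genes matrix might look like:

```python
import numpy as np

def evaluate_expression_predictions(y_true, y_pred):
    """Compute MAE, MSE, and mean per-gene Pearson correlation (PCC).

    y_true, y_pred: arrays of shape (n_spots, n_genes) holding measured
    and predicted expression values at each spatial spot.
    """
    mae = np.mean(np.abs(y_true - y_pred))
    mse = np.mean((y_true - y_pred) ** 2)

    # Per-gene PCC: correlate prediction with ground truth across spots,
    # then average over genes (zero-variance genes are skipped).
    pccs = []
    for g in range(y_true.shape[1]):
        t, p = y_true[:, g], y_pred[:, g]
        if t.std() > 0 and p.std() > 0:
            pccs.append(np.corrcoef(t, p)[0, 1])
    return mae, mse, float(np.mean(pccs))
```

Averaging PCC per gene (rather than per spot) is one common convention in ST benchmarks; the papers listed here may aggregate differently.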
Related papers
- Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology [46.83014413674925]
STAMP is a spatial transcriptomics-augmented multimodal pathology representation learning framework. Our study shows that self-supervised, gene-guided training provides a robust and task-agnostic signal for learning pathology image representations. We validate STAMP across six datasets and four downstream tasks, where it consistently achieves strong performance.
arXiv Detail & Related papers (2026-02-15T00:59:13Z) - A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z) - HiFusion: Hierarchical Intra-Spot Alignment and Regional Context Fusion for Spatial Gene Expression Prediction from Histopathology [7.982889842329205]
HiFusion is a novel deep learning framework that integrates two complementary components. We show that HiFusion achieves state-of-the-art performance across both 2D slide-wise cross-validation and more challenging 3D sample-specific scenarios. These results underscore HiFusion's potential as a robust, accurate, and scalable solution for ST inference from routine histopathology.
arXiv Detail & Related papers (2025-11-17T04:47:39Z) - Neovascularization Segmentation via a Multilateral Interaction-Enhanced Graph Convolutional Network [48.788798029027085]
This paper proposes a novel multilateral graph convolutional interaction-enhanced CNV segmentation network (MTG-Net). MTG-Net consists of a multi-task framework and two graph-based cross-task modules: Multilateral Interaction Graph Reasoning (MIGR) and Multilateral Reinforcement Graph Reasoning (MRGR). Experimental results demonstrate that MTG-Net outperforms existing methods, achieving a Dice score of 87.21% for region segmentation and 88.12% for vessel segmentation.
arXiv Detail & Related papers (2025-08-05T08:10:19Z) - Spatially Gene Expression Prediction using Dual-Scale Contrastive Learning [12.35331063443348]
NH2ST integrates spatial context and both pathology and gene modalities for gene expression prediction. Our model consistently outperforms existing methods, achieving improvements of over 20% in PCC metrics.
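NH2ST's dual-scale contrastive objective is not specified in this summary; as a hedged illustration of the general technique of contrastive alignment between pathology and gene embeddings, a generic symmetric InfoNCE loss (not necessarily the one NH2ST uses) can be sketched as:

```python
import numpy as np

def symmetric_info_nce(img_emb, gene_emb, temperature=0.07):
    """Generic symmetric contrastive (InfoNCE) loss between L2-normalized
    image-patch and gene-expression embeddings; matched rows are positives."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    gen = gene_emb / np.linalg.norm(gene_emb, axis=1, keepdims=True)
    logits = img @ gen.T / temperature          # (n, n) similarity matrix

    def ce(l):
        # Cross-entropy with targets on the diagonal.
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Average the image-to-gene and gene-to-image directions.
    return 0.5 * (ce(logits) + ce(logits.T))
```

The loss is low when matched pairs dominate their rows and columns of the similarity matrix, which is the behavior any contrastive pathology-gene alignment aims for.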
arXiv Detail & Related papers (2025-06-30T13:18:39Z) - CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [54.85000884785013]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data. We propose CLIPfusion, a method that leverages both discriminative and generative foundation models. We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z) - PH2ST:ST-Prompt Guided Histological Hypergraph Learning for Spatial Gene Expression Prediction [9.420121324844066]
We propose PH2ST, a prompt-guided hypergraph learning framework, to guide multi-scale histological representation learning for spatial gene expression prediction. PH2ST not only outperforms existing state-of-the-art methods, but also shows strong potential for practical applications such as imputing missing spots, ST super-resolution, and local-to-global prediction.
arXiv Detail & Related papers (2025-03-21T03:10:43Z) - MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [57.044719143401664]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention. Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z) - Multi-modal Spatial Clustering for Spatial Transcriptomics Utilizing High-resolution Histology Images [1.3124513975412255]
Spatial transcriptomics (ST) enables transcriptome-wide gene expression profiling while preserving spatial context.
Current spatial clustering methods fail to fully integrate high-resolution histology image features with gene expression data.
We propose a novel contrastive learning-based deep learning approach that integrates gene expression data with histology image features.
arXiv Detail & Related papers (2024-10-31T00:32:24Z) - MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia. Although existing approaches mainly capture face forgery patterns using the image modality, other modalities like fine-grained noises and texts are not fully explored. We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
arXiv Detail & Related papers (2024-09-15T13:08:59Z) - SEPAL: Spatial Gene Expression Prediction from Local Graphs [1.4523812806185954]
We present SEPAL, a new model for predicting genetic profiles from visual tissue appearance.
Our method exploits the biological biases of the problem by directly supervising relative differences with respect to mean expression.
We propose a novel benchmark that aims to better define the task by following current best practices in transcriptomics.
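SEPAL's summary says it supervises relative differences with respect to mean expression rather than raw values. A minimal sketch of that target transformation (the paper's exact preprocessing, e.g. normalization and log-scaling, may differ) could be:

```python
import numpy as np

def mean_relative_targets(expr):
    """Convert raw expression (n_spots, n_genes) into per-gene deviations
    from the dataset mean, the kind of relative target SEPAL's summary
    describes supervising directly."""
    gene_mean = expr.mean(axis=0, keepdims=True)   # (1, n_genes)
    return expr - gene_mean, gene_mean

def reconstruct(deltas, gene_mean):
    """Recover absolute expression from predicted deviations."""
    return deltas + gene_mean
```

Training on deviations removes the dominant per-gene baseline, so the model's capacity is spent on spatial variation rather than on re-learning each gene's average level.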
arXiv Detail & Related papers (2023-09-02T23:24:02Z) - Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representation of giga-pixel level whole slide pathology images (WSI) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.