Prototype-driven fusion of pathology and spatial transcriptomics for interpretable survival prediction
- URL: http://arxiv.org/abs/2602.12441v1
- Date: Thu, 12 Feb 2026 21:59:07 GMT
- Title: Prototype-driven fusion of pathology and spatial transcriptomics for interpretable survival prediction
- Authors: Lihe Liu, Xiaoxi Pan, Yinyin Yuan, Lulu Shang
- Abstract summary: Whole slide images (WSIs) enable weakly supervised prognostic modeling via multiple instance learning (MIL). Spatial transcriptomics (ST) preserves in situ gene expression, providing a spatial molecular context that complements morphology. We introduce PathoSpatial, an interpretable end-to-end framework integrating WSIs and ST to learn spatially informed prognostic representations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whole slide images (WSIs) enable weakly supervised prognostic modeling via multiple instance learning (MIL). Spatial transcriptomics (ST) preserves in situ gene expression, providing a spatial molecular context that complements morphology. As paired WSI-ST cohorts scale to population level, leveraging their complementary spatial signals for prognosis becomes crucial; however, principled cross-modal fusion strategies remain limited for this paradigm. To this end, we introduce PathoSpatial, an interpretable end-to-end framework integrating co-registered WSIs and ST to learn spatially informed prognostic representations. PathoSpatial uses task-guided prototype learning within a multi-level experts architecture, adaptively orchestrating unsupervised within-modality discovery with supervised cross-modal aggregation. By design, PathoSpatial substantially strengthens interpretability while maintaining discriminative ability. We evaluate PathoSpatial on a triple-negative breast cancer cohort with paired ST and WSIs. PathoSpatial delivers strong and consistent performance across five survival endpoints, achieving superior or comparable performance to leading unimodal and multimodal methods. PathoSpatial inherently enables post-hoc prototype interpretation and molecular risk decomposition, providing quantitative, biologically grounded explanations, highlighting candidate prognostic factors. We present PathoSpatial as a proof-of-concept for scalable and interpretable multimodal learning for spatial omics-pathology fusion.
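The abstract's core idea, pooling instance-level features (WSI patches, ST spots) through shared prototypes and fusing the per-prototype summaries into a risk score, can be sketched in a minimal, hypothetical form. Everything below is an illustrative assumption: the array shapes, the softmax soft-assignment pooling, the fixed random prototypes, and the linear late-fusion risk head are not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def prototype_pool(features, prototypes, temperature=1.0):
    """Soft-assign instance features to prototypes and pool them.

    features:   (n_instances, d) patch/spot embeddings for one slide
    prototypes: (k, d) prototype vectors (fixed here for illustration;
                learned end-to-end in a real task-guided setup)
    returns:    (k, d) prototype-conditioned slide representation
    """
    sim = features @ prototypes.T / temperature           # (n, k) similarities
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)                  # per-instance soft assignment
    # weighted mean of instances assigned to each prototype
    return w.T @ features / (w.sum(axis=0)[:, None] + 1e-8)

# toy co-registered modalities: 50 WSI patch embeddings and 50 ST spot embeddings
wsi = rng.normal(size=(50, 8))
st = rng.normal(size=(50, 8))
protos = rng.normal(size=(4, 8))

slide_wsi = prototype_pool(wsi, protos)                   # (4, 8)
slide_st = prototype_pool(st, protos)                     # (4, 8)

# simple late fusion: concatenate per-prototype summaries, project to one risk score
fused = np.concatenate([slide_wsi, slide_st], axis=1).ravel()   # (4 * 16,) = (64,)
w_risk = rng.normal(size=fused.shape[0]) * 0.1
risk = float(fused @ w_risk)
```

Because each modality is summarized per prototype before fusion, a risk score in this style can be decomposed back onto individual prototypes, which is the kind of post-hoc interpretation the abstract describes.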
Related papers
- Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology [46.83014413674925]
STAMP is a spatial transcriptomics-augmented multimodal pathology representation learning framework. Our study shows that self-supervised, gene-guided training provides a robust and task-agnostic signal for learning pathology image representations. We validate STAMP across six datasets and four downstream tasks, where it consistently achieves strong performance.
arXiv Detail & Related papers (2026-02-15T00:59:13Z)
- MultiST: A Cross-Attention-Based Multimodal Model for Spatial Transcriptomic [21.236918431473466]
We propose MultiST, a unified framework that models spatial topology, gene expression, and tissue morphology through cross-attention-based fusion. We evaluated the proposed method on 13 diverse ST datasets spanning two organs, including human brain cortex and breast cancer tissue.
arXiv Detail & Related papers (2026-01-19T19:11:03Z)
- R-GenIMA: Integrating Neuroimaging and Genetics with Interpretable Multimodal AI for Alzheimer's Disease Progression [63.97617759805451]
Early detection of Alzheimer's disease requires models capable of integrating macro-scale neuroanatomical alterations with micro-scale genetic susceptibility. We introduce R-GenIMA, an interpretable multimodal large language model that couples a novel ROI-wise vision transformer with genetic prompting. R-GenIMA achieves state-of-the-art performance in four-way classification across normal cognition, subjective memory concerns, mild cognitive impairment, and AD.
arXiv Detail & Related papers (2025-12-22T02:54:10Z)
- Supervise Less, See More: Training-free Nuclear Instance Segmentation with Prototype-Guided Prompting [53.799446807827714]
We introduce SPROUT, a fully training- and annotation-free prompting framework for nuclear instance segmentation. SPROUT leverages histology-informed priors to construct slide-specific reference prototypes. The resulting foreground and background features are transformed into positive and negative point prompts, enabling the Segment Anything Model (SAM) to produce precise nuclear delineations.
arXiv Detail & Related papers (2025-11-25T05:58:33Z)
- SurvAgent: Hierarchical CoT-Enhanced Case Banking and Dichotomy-Based Multi-Agent System for Multimodal Survival Prediction [49.355973075150075]
We introduce SurvAgent, the first hierarchical chain-of-thought (CoT)-enhanced multi-agent system for multimodal survival prediction. SurvAgent consists of two stages: WSI-Gene CoT-Enhanced Case Bank Construction employs hierarchical analysis through Low-Magnification Screening, Cross-Modal Similarity-Aware Patch Mining, and Confidence-Aware Patch Mining for pathology images. Dichotomy-Based Multi-Expert Agent Inference retrieves similar cases via RAG and integrates multimodal reports with expert predictions through progressive interval refinement.
arXiv Detail & Related papers (2025-11-20T18:41:44Z)
- HiFusion: Hierarchical Intra-Spot Alignment and Regional Context Fusion for Spatial Gene Expression Prediction from Histopathology [7.982889842329205]
HiFusion is a novel deep learning framework that integrates two complementary components. We show that HiFusion achieves state-of-the-art performance across both 2D slide-wise cross-validation and more challenging 3D sample-specific scenarios. These results underscore HiFusion's potential as a robust, accurate, and scalable solution for ST inference from routine histopathology.
arXiv Detail & Related papers (2025-11-17T04:47:39Z)
- Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization [30.456635152695483]
Histopathology remains the gold standard for cancer diagnosis and prognosis. Multi-modal learning combining transcriptomics with histology offers more comprehensive information. Existing multi-modal approaches are challenged by intrinsic multi-modal heterogeneity, insufficient multi-scale integration, and reliance on paired data.
arXiv Detail & Related papers (2025-08-22T15:51:33Z)
- MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [57.044719143401664]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention. Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z)
- S3M: Scalable Statistical Shape Modeling through Unsupervised Correspondences [91.48841778012782]
We propose an unsupervised method to simultaneously learn local and global shape structures across population anatomies.
Our pipeline significantly improves unsupervised correspondence estimation for SSMs compared to baseline methods.
Our method is robust enough to learn from noisy neural network predictions, potentially enabling scaling SSMs to larger patient populations.
arXiv Detail & Related papers (2023-04-15T09:39:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.