Towards Spatial Transcriptomics-driven Pathology Foundation Models
- URL: http://arxiv.org/abs/2602.14177v1
- Date: Sun, 15 Feb 2026 15:06:45 GMT
- Title: Towards Spatial Transcriptomics-driven Pathology Foundation Models
- Authors: Konstantin Hemker, Andrew H. Song, Cristina Almagro-Pérez, Guillaume Jaume, Sophia J. Wagner, Anurag Vaidya, Nikola Simidjievski, Mateja Jamnik, Faisal Mahmood
- Abstract summary: We introduce a vision-omics self-supervised learning framework that infuses localized molecular information into pathology vision encoders. We instantiate SEAL by training on over 700,000 paired gene expression spot-tissue region examples spanning tumor and normal samples from 14 organs. SEAL encoders exhibit robust domain generalization on out-of-distribution evaluations and enable new cross-modal capabilities such as gene-to-image retrieval.
- Score: 32.70436266943553
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Spatial transcriptomics (ST) provides spatially resolved measurements of gene expression, enabling characterization of the molecular landscape of human tissue beyond histological assessment as well as localized readouts that can be aligned with morphology. Concurrently, the success of multimodal foundation models that integrate vision with complementary modalities suggests that morphomolecular coupling between local expression and morphology can be systematically used to improve histological representations themselves. We introduce Spatial Expression-Aligned Learning (SEAL), a vision-omics self-supervised learning framework that infuses localized molecular information into pathology vision encoders. Rather than training new encoders from scratch, SEAL is designed as a parameter-efficient vision-omics finetuning method that can be flexibly applied to widely used pathology foundation models. We instantiate SEAL by training on over 700,000 paired gene expression spot-tissue region examples spanning tumor and normal samples from 14 organs. Tested across 38 slide-level and 15 patch-level downstream tasks, SEAL provides a drop-in replacement for pathology foundation models that consistently improves performance over widely used vision-only and ST prediction baselines on slide-level molecular status, pathway activity, and treatment response prediction, as well as patch-level gene expression prediction tasks. Additionally, SEAL encoders exhibit robust domain generalization on out-of-distribution evaluations and enable new cross-modal capabilities such as gene-to-image retrieval. Our work proposes a general framework for ST-guided finetuning of pathology foundation models, showing that augmenting existing models with localized molecular supervision is an effective and practical step for improving visual representations and expanding their cross-modal utility.
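The abstract does not specify SEAL's exact training objective, but morphomolecular coupling between paired spots and patches is commonly exploited with a symmetric contrastive (CLIP-style) loss over a shared embedding space. The sketch below is purely illustrative, not SEAL's implementation: the linear adapters `W_img` and `W_gene` and all dimensions are invented stand-ins for a parameter-efficient projection on top of a frozen pathology backbone.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def symmetric_info_nce(img_emb, gene_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired patch/spot embeddings.

    Matched pairs sit on the diagonal of the similarity matrix; the loss
    pulls them together in both the image->gene and gene->image directions.
    """
    logits = img_emb @ gene_emb.T / temperature  # (B, B) similarities

    def xent_diag(mat):
        # Row-wise log-softmax, then negative log-likelihood of the diagonal.
        mat = mat - mat.max(axis=1, keepdims=True)
        logp = mat - np.log(np.exp(mat).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

# Toy data: frozen patch features (d_img) and expression vectors (d_gene),
# each projected by a small trainable linear adapter into a shared d-dim space.
B, d_img, d_gene, d = 8, 16, 32, 8
W_img = rng.normal(size=(d_img, d)) * 0.1   # hypothetical image-side adapter
W_gene = rng.normal(size=(d_gene, d)) * 0.1  # hypothetical omics-side adapter
patches = rng.normal(size=(B, d_img))
spots = rng.normal(size=(B, d_gene))
loss = symmetric_info_nce(l2_normalize(patches @ W_img),
                          l2_normalize(spots @ W_gene))
```

In a realistic setup only the adapters (and perhaps a few backbone layers) would receive gradients, which is what makes such vision-omics finetuning parameter-efficient.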
Related papers
- CARE: A Molecular-Guided Foundation Model with Adaptive Region Modeling for Whole Slide Image Analysis [23.45449218534003]
We present Cross-modal Adaptive Region (CARE), a foundation model for pathology that automatically partitions whole-slide images into morphologically relevant regions. Based on only one-tenth of the pretraining data typically used by mainstream foundation models, CARE achieves superior average performance across 33 downstream benchmarks.
arXiv Detail & Related papers (2026-02-25T07:01:54Z)
- Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology [46.83014413674925]
STAMP is a spatial transcriptomics-augmented multimodal pathology representation learning framework. Our study shows that self-supervised, gene-guided training provides a robust and task-agnostic signal for learning pathology image representations. We validate STAMP across six datasets and four downstream tasks, where it consistently achieves strong performance.
arXiv Detail & Related papers (2026-02-15T00:59:13Z)
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSIs) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
- Do Pathology Foundation Models Encode Disease Progression? A Pseudotime Analysis of Visual Representations [0.0]
We show that vision foundation models can implicitly learn to represent continuous processes from independent static observations. This framework could be applied to other domains where continuous processes are observed through static snapshots.
arXiv Detail & Related papers (2026-01-29T06:50:43Z)
- SAGE-FM: A lightweight and interpretable spatial transcriptomics foundation model [2.060288975192133]
We introduce SAGE-FM, a lightweight spatial transcriptomics foundation model based on graph convolutional networks (GCNs). Trained on 416 human Visium samples spanning 15 organs, SAGE-FM learns spatially coherent embeddings that robustly recover masked genes. Results demonstrate that simple, parameter-efficient GCNs can serve as biologically interpretable and spatially aware foundation models for large-scale spatial transcriptomics.
arXiv Detail & Related papers (2026-01-21T22:22:38Z)
- A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
- Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images. We propose Self-Supervised Anatomical Consistency Learning (SS-ACL) to align generated reports with corresponding anatomical regions. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
arXiv Detail & Related papers (2025-09-30T08:59:06Z)
- PathoHR: Hierarchical Reasoning for Vision-Language Models in Pathology [3.459714932882085]
Current vision-language (VL) models often struggle to capture the complex reasoning required for interpreting structured pathological reports. We propose PathoHR-Bench, a novel benchmark designed to evaluate VL models' abilities in hierarchical semantic understanding and compositional reasoning within the pathology domain. We further introduce a pathology-specific VL training scheme that generates enhanced and perturbed samples for multimodal contrastive learning.
arXiv Detail & Related papers (2025-09-07T15:42:38Z)
- AdaFusion: Prompt-Guided Inference with Adaptive Fusion of Pathology Foundation Models [49.550545038402184]
We propose AdaFusion, a novel prompt-guided inference framework. Our method compresses and aligns tile-level features from diverse models. AdaFusion consistently surpasses individual PFMs across both classification and regression tasks.
arXiv Detail & Related papers (2025-08-07T07:09:31Z)
- Integrating Pathology Foundation Models and Spatial Transcriptomics for Cellular Decomposition from Histology Images [0.0]
We propose a lightweight and training-efficient approach to predict cellular composition directly from histology images. By training a lightweight multi-layer perceptron (MLP) regressor on cell-type abundances derived via cell2location, our method efficiently distills knowledge from pathology foundation models.
arXiv Detail & Related papers (2025-07-09T16:43:04Z)
- Teaching pathology foundation models to accurately predict gene expression with parameter efficient knowledge transfer [1.5416321520529301]
Parameter-Efficient Knowledge Adaptation (PEKA) is a novel framework that integrates knowledge distillation and structure alignment losses for cross-modal knowledge transfer. We evaluated PEKA for gene expression prediction using multiple spatial transcriptomics datasets.
arXiv Detail & Related papers (2025-04-09T17:24:41Z)
- MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [57.044719143401664]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention. Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z)
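Several of the papers above distill frozen foundation-model patch features into small task heads, e.g. the cellular-decomposition work, which fits a lightweight MLP regressor on cell2location-derived cell-type abundances. As an illustrative sketch only (all names, sizes, and weights here are invented, not taken from any of these papers), such a head mapping patch features to a cell-type composition could look like:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    z = x - x.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_composition(features, W1, b1, W2, b2):
    """Predict per-spot cell-type proportions from frozen patch features.

    A softmax output keeps the predicted composition non-negative and
    summing to 1, matching the simplex structure of cell-type abundances.
    """
    h = np.maximum(features @ W1 + b1, 0.0)  # ReLU hidden layer
    return softmax(h @ W2 + b2)

# Hypothetical sizes: 4 spots, 32-dim patch features, 6 cell types.
B, d_feat, d_hid, n_types = 4, 32, 16, 6
W1 = rng.normal(size=(d_feat, d_hid)) * 0.1
b1 = np.zeros(d_hid)
W2 = rng.normal(size=(d_hid, n_types)) * 0.1
b2 = np.zeros(n_types)

feats = rng.normal(size=(B, d_feat))       # stand-in for frozen features
props = mlp_composition(feats, W1, b1, W2, b2)
```

Training such a head (e.g. with a cross-entropy or KL loss against reference abundances) leaves the pathology backbone untouched, which is what makes these distillation approaches lightweight.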
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.