Comparing Computational Pathology Foundation Models using Representational Similarity Analysis
- URL: http://arxiv.org/abs/2509.15482v2
- Date: Wed, 05 Nov 2025 20:38:54 GMT
- Title: Comparing Computational Pathology Foundation Models using Representational Similarity Analysis
- Authors: Vaibhav Mishra, William Lotter
- Abstract summary: We systematically analyze the representational spaces of six CPath foundation models using techniques from computational neuroscience. We find that UNI2 and Virchow2 have the most distinct representational structures, whereas Prov-GigaPath has the highest average similarity across models.
- Score: 0.6660458629649825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models are increasingly developed in computational pathology (CPath) given their promise in facilitating many downstream tasks. While recent studies have evaluated task performance across models, less is known about the structure and variability of their learned representations. Here, we systematically analyze the representational spaces of six CPath foundation models using techniques popularized in computational neuroscience. The models analyzed span vision-language contrastive learning (CONCH, PLIP, KEEP) and self-distillation (UNI (v2), Virchow (v2), Prov-GigaPath) approaches. Through representational similarity analysis using H&E image patches from TCGA, we find that UNI2 and Virchow2 have the most distinct representational structures, whereas Prov-GigaPath has the highest average similarity across models. Having the same training paradigm (vision-only vs. vision-language) did not guarantee higher representational similarity. The representations of all models showed a high slide-dependence, but relatively low disease-dependence. Stain normalization decreased slide-dependence for all models, by amounts ranging from 5.5% (CONCH) to 20.5% (PLIP). In terms of intrinsic dimensionality, vision-language models demonstrated relatively compact representations, compared to the more distributed representations of vision-only models. These findings highlight opportunities to improve robustness to slide-specific features, inform model ensembling strategies, and provide insights into how training paradigms shape model representations. Our framework is extendable across medical imaging domains, where probing the internal representations of foundation models can support their effective development and deployment.
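For readers unfamiliar with the method, the sketch below illustrates the generic representational similarity analysis (RSA) recipe referenced in the abstract: embed a shared set of image patches with each model, build a representational dissimilarity matrix (RDM) per model, and correlate the RDMs. The correlation-distance and Spearman choices are common RSA defaults, and the model names, embedding dimensions, and random inputs are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal RSA sketch under the assumptions stated above; random data
# stands in for real patch embeddings from two foundation models.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(embeddings: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix in condensed form:
    pairwise correlation distances (1 - Pearson r) between patch embeddings."""
    return pdist(embeddings, metric="correlation")

def rsa_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Spearman correlation between two models' RDMs, computed over
    the same ordered set of image patches."""
    rho, _ = spearmanr(rdm(emb_a), rdm(emb_b))
    return float(rho)

# Toy usage: 200 patches embedded by two hypothetical models; the
# feature dimensions are placeholders, not the actual models' sizes.
rng = np.random.default_rng(0)
emb_model_a = rng.normal(size=(200, 1024))  # e.g., a vision-only model
emb_model_b = rng.normal(size=(200, 512))   # e.g., a vision-language model
print(f"RSA similarity: {rsa_similarity(emb_model_a, emb_model_b):.3f}")
```

Because RSA compares the geometry of pairwise dissimilarities rather than raw coordinates, the two embedding spaces may have different dimensionalities, which is what makes cross-model comparisons of this kind possible.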
Related papers
- Benchmarking Computational Pathology Foundation Models For Semantic Segmentation [0.0]
Concatenating features from CONCH, PathDino, and CellViT outperformed individual models across all datasets by 7.95%. The vision-language foundation model CONCH performed best across datasets when compared to vision-only foundation models.
arXiv Detail & Related papers (2026-02-21T08:00:46Z)
- Learning Diffusion Models with Flexible Representation Guidance [37.301580601018365]
We present a systematic framework for incorporating representation guidance into diffusion models and introduce two new strategies for enhancing representation alignment. Experiments across image, protein sequence, and molecule generation tasks demonstrate superior performance as well as accelerated training.
arXiv Detail & Related papers (2025-07-11T19:29:02Z)
- Linking Robustness and Generalization: A k* Distribution Analysis of Concept Clustering in Latent Space for Vision Models [56.89974470863207]
This article uses the k* Distribution, a local neighborhood analysis method, to examine the learned latent space at the level of individual concepts.
We introduce skewness-based true and approximate metrics for interpreting individual concepts to assess the overall quality of vision models' latent space.
arXiv Detail & Related papers (2024-08-17T01:43:51Z)
- DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple but effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder. DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z)
- The Importance of Downstream Networks in Digital Pathology Foundation Models [1.689369173057502]
We evaluate seven feature extractor models across three different datasets with 162 different aggregation model configurations.
We find that the performance of many current feature extractor models is notably similar.
arXiv Detail & Related papers (2023-11-29T16:54:25Z)
- UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
- On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
We look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.
We evaluate four model families (OPT, BLOOM, CodeGen, and Codex) on three semantic parsing datasets.
arXiv Detail & Related papers (2022-11-15T19:56:37Z)
- IMACS: Image Model Attribution Comparison Summaries [16.80986701058596]
We introduce IMACS, a method that combines gradient-based model attributions with aggregation and visualization techniques.
IMACS extracts salient input features from an evaluation dataset, clusters them based on similarity, then visualizes differences in model attributions for similar input features.
We show how our technique can uncover behavioral differences caused by domain shift between two models trained on satellite images.
arXiv Detail & Related papers (2022-01-26T21:35:14Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.