Multi-Modal Foundation Models for Computational Pathology: A Survey
- URL: http://arxiv.org/abs/2503.09091v2
- Date: Thu, 20 Mar 2025 16:43:54 GMT
- Title: Multi-Modal Foundation Models for Computational Pathology: A Survey
- Authors: Dong Li, Guihong Wan, Xintao Wu, Xinyu Wu, Xiaohui Chen, Yi He, Christine G. Lian, Peter K. Sorger, Yevgeniy R. Semenov, Chen Zhao
- Abstract summary: Foundation models have emerged as a powerful paradigm in computational pathology (CPath). We categorize 32 state-of-the-art multi-modal foundation models into three major paradigms: vision-language, vision-knowledge graph, and vision-gene expression. We analyze 28 available multi-modal datasets tailored for pathology, grouped into image-text pairs, instruction datasets, and image-other modality pairs.
- Score: 32.25958653387204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models have emerged as a powerful paradigm in computational pathology (CPath), enabling scalable and generalizable analysis of histopathological images. While early developments centered on uni-modal models trained solely on visual data, recent advances have highlighted the promise of multi-modal foundation models that integrate heterogeneous data sources such as textual reports, structured domain knowledge, and molecular profiles. In this survey, we provide a comprehensive and up-to-date review of multi-modal foundation models in CPath, with a particular focus on models built upon hematoxylin and eosin (H&E) stained whole slide images (WSIs) and tile-level representations. We categorize 32 state-of-the-art multi-modal foundation models into three major paradigms: vision-language, vision-knowledge graph, and vision-gene expression. We further divide vision-language models into non-LLM-based and LLM-based approaches. Additionally, we analyze 28 available multi-modal datasets tailored for pathology, grouped into image-text pairs, instruction datasets, and image-other modality pairs. Our survey also presents a taxonomy of downstream tasks, highlights training and evaluation strategies, and identifies key challenges and future directions. We aim for this survey to serve as a valuable resource for researchers and practitioners working at the intersection of pathology and AI.
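To make the vision-language paradigm discussed in the abstract more concrete, the sketch below shows a minimal CLIP-style contrastive alignment between tile-level image features and report-text features. It is an illustrative assumption, not code from the survey or from any of the reviewed models; the class name, feature dimensions, and PyTorch setup are all hypothetical.

```python
# Minimal sketch of CLIP-style image-text contrastive alignment (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TileTextAligner(nn.Module):
    """Projects tile (image) and report-text features into a shared embedding space
    and scores matched pairs with a temperature-scaled similarity matrix."""
    def __init__(self, img_dim=768, txt_dim=512, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)   # e.g. on top of a frozen tile encoder
        self.txt_proj = nn.Linear(txt_dim, embed_dim)   # e.g. on top of a frozen text encoder
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # log(1/0.07), as in CLIP

    def forward(self, img_feats, txt_feats):
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        logits = self.logit_scale.exp() * img @ txt.t()   # (batch, batch) similarity scores
        targets = torch.arange(img.size(0))               # matched pairs lie on the diagonal
        # Symmetric InfoNCE loss over image->text and text->image directions.
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage with random "tile" and "report" features:
model = TileTextAligner()
loss = model(torch.randn(8, 768), torch.randn(8, 512))
loss.backward()
```

In the models the survey actually covers, the two branches are typically pathology-specific vision transformers and domain-adapted text or LLM encoders rather than the plain linear projections used here; the sketch only conveys the alignment objective.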
Related papers
- Biomedical Foundation Model: A Survey [84.26268124754792]
Foundation models are large-scale pre-trained models that learn from extensive unlabeled datasets. These models can be adapted to various applications such as question answering and visual understanding. This survey explores the potential of foundation models across diverse domains within biomedical fields.
arXiv Detail & Related papers (2025-03-03T22:42:00Z)
- A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models [74.48084001058672]
The rise of foundation models has transformed machine learning research. Multimodal foundation models (MMFMs) pose unique interpretability challenges beyond unimodal frameworks. This survey explores two key aspects: (1) the adaptation of LLM interpretability methods to multimodal models and (2) understanding the mechanistic differences between unimodal language models and cross-modal systems.
arXiv Detail & Related papers (2025-02-22T20:55:26Z)
- A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks [22.806228975730008]
Computational pathology foundation models (CPathFMs) have emerged as a powerful approach for analyzing histological data. These models have demonstrated promise in automating complex pathology tasks such as segmentation, classification, and biomarker discovery. However, the development of CPathFMs presents significant challenges, such as limited data accessibility, high variability across datasets, and lack of standardized evaluation benchmarks.
arXiv Detail & Related papers (2025-01-27T01:27:59Z)
- CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology [17.781388341968967]
CPath-Omni is the first LMM designed to unify both patch- and WSI-level image analysis. CPath-Omni achieves state-of-the-art (SOTA) performance across seven diverse tasks on 39 out of 42 datasets. CPath-CLIP, for the first time, integrates different vision models and incorporates a large language model as a text encoder to build a more powerful CLIP model.
arXiv Detail & Related papers (2024-12-16T18:46:58Z)
- Autoregressive Models in Vision: A Survey [119.23742136065307]
This survey comprehensively examines the literature on autoregressive models applied to vision.
We divide visual autoregressive models into three general sub-categories, including pixel-based, token-based, and scale-based models.
We present a multi-faceted categorization of autoregressive models in computer vision, including image generation, video generation, 3D generation, and multi-modal generation.
arXiv Detail & Related papers (2024-11-08T17:15:12Z)
- How Good Are We? Evaluating Cell AI Foundation Models in Kidney Pathology with Human-in-the-Loop Enrichment [11.60167559546617]
Training AI foundation models has emerged as a promising large-scale learning approach for addressing real-world healthcare challenges.
While many of these models have been developed for tasks like disease diagnosis and tissue quantification, their readiness for deployment on some of the arguably simplest tasks, such as nuclei segmentation within a single organ, remains uncertain.
This paper seeks to answer this key question, "How good are we?" by thoroughly evaluating the performance of recent cell foundation models on a curated dataset.
arXiv Detail & Related papers (2024-10-31T17:00:33Z)
- A Survey for Foundation Models in Autonomous Driving [10.315409708116865]
Large language models contribute to planning and simulation in autonomous driving.
Vision foundation models are increasingly adapted for critical tasks such as 3D object detection and tracking.
Multi-modal foundation models, integrating diverse inputs, exhibit exceptional visual understanding and spatial reasoning.
arXiv Detail & Related papers (2024-02-02T02:44:59Z)
- Recognizing Identities From Human Skeletons: A Survey on 3D Skeleton Based Person Re-Identification [60.939250172443586]
Person re-identification via 3D skeletons is an important emerging research area that attracts increasing attention within the pattern recognition community.
We provide a comprehensive review and analysis of recent advances in skeleton-based person re-identification (SRID).
A thorough evaluation of state-of-the-art SRID methods is conducted over various types of benchmarks and protocols to compare their effectiveness and efficiency.
arXiv Detail & Related papers (2024-01-27T04:52:24Z)
- Foundational Models in Medical Imaging: A Comprehensive Survey and Future Vision [6.2847894163744105]
Foundation models are large-scale, pre-trained deep-learning models adapted to a wide range of downstream tasks.
These models facilitate contextual reasoning, generalization, and prompt capabilities at test time.
Capitalizing on the advances in computer vision, the medical imaging community has also shown growing interest in these models.
arXiv Detail & Related papers (2023-10-28T12:08:12Z)
- Graph Foundation Models: Concepts, Opportunities and Challenges [66.37994863159861]
Foundation models have emerged as critical components in a variety of artificial intelligence applications. The capabilities of foundation models in generalization and adaptation motivate graph machine learning researchers to discuss the potential of developing a new graph learning paradigm. This article introduces the concept of Graph Foundation Models (GFMs), and offers an exhaustive explanation of their key characteristics and underlying technologies.
arXiv Detail & Related papers (2023-10-18T09:31:21Z)
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants [187.72038587829223]
The research landscape encompasses five core topics, categorized into two classes.
The target audiences of the paper are researchers, graduate students, and professionals in computer vision and vision-language multimodal communities.
arXiv Detail & Related papers (2023-09-18T17:56:28Z)
- Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates.
Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z)