Boosting Pathology Foundation Models via Few-shot Prompt-tuning for Rare Cancer Subtyping
- URL: http://arxiv.org/abs/2508.15904v1
- Date: Thu, 21 Aug 2025 18:04:41 GMT
- Title: Boosting Pathology Foundation Models via Few-shot Prompt-tuning for Rare Cancer Subtyping
- Authors: Dexuan He, Xiao Zhou, Wenbin Guan, Liyuan Zhang, Xiaoman Zhang, Sinuo Xu, Ge Wang, Lifeng Wang, Xiaojun Yuan, Xin Sun, Yanfeng Wang, Kun Sun, Ya Zhang, Weidi Xie
- Abstract summary: We propose PathPT, a novel framework that exploits the potential of vision-language pathology foundation models. PathPT converts WSI-level supervision into fine-grained tile-level guidance by leveraging the zero-shot capabilities of VL models. Results show that PathPT consistently delivers superior performance, achieving substantial gains in subtyping accuracy and cancerous region grounding ability.
- Score: 80.92960114162746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rare cancers comprise 20-25% of all malignancies but face major diagnostic challenges due to limited expert availability, especially in pediatric oncology, where they represent over 70% of cases. While pathology vision-language (VL) foundation models show promising zero-shot capabilities for common cancer subtyping, their clinical performance for rare cancers remains limited. Existing multi-instance learning (MIL) methods rely only on visual features, overlooking cross-modal knowledge and compromising the interpretability critical for rare cancer diagnosis. To address this limitation, we propose PathPT, a novel framework that fully exploits the potential of vision-language pathology foundation models through spatially-aware visual aggregation and task-specific prompt tuning. Unlike conventional MIL, PathPT converts WSI-level supervision into fine-grained tile-level guidance by leveraging the zero-shot capabilities of VL models, thereby preserving localization on cancerous regions and enabling cross-modal reasoning through prompts aligned with histopathological semantics. We benchmark PathPT on eight rare cancer datasets (four adult and four pediatric) spanning 56 subtypes and 2,910 WSIs, as well as three common cancer datasets, evaluating four state-of-the-art VL models and four MIL frameworks under three few-shot settings. Results show that PathPT consistently delivers superior performance, achieving substantial gains in subtyping accuracy and cancerous region grounding ability. This work advances AI-assisted diagnosis for rare cancers, offering a scalable solution for improving subtyping accuracy in settings with limited access to specialized expertise.
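The abstract's core mechanism, converting a single slide-level label into tile-level guidance via a VL model's zero-shot tile-prompt similarities, can be illustrated with a minimal sketch. This is not the paper's actual implementation; the function name, the softmax temperature, and the median-confidence thresholding are all assumptions introduced for illustration.

```python
import numpy as np

def tile_guidance(tile_embs, prompt_embs, slide_label, tau=0.07):
    """Score each tile against per-subtype text-prompt embeddings and
    derive pseudo tile-level targets from one slide-level label.
    tile_embs: (n_tiles, d), prompt_embs: (n_subtypes, d), L2-normalized.
    tau and the median threshold are illustrative choices, not the paper's.
    """
    # Cosine similarity between every tile and every subtype prompt.
    sims = tile_embs @ prompt_embs.T                    # (n_tiles, n_subtypes)
    probs = np.exp(sims / tau)
    probs /= probs.sum(axis=1, keepdims=True)           # softmax over subtypes
    # Tiles most confident in the slide's known subtype act as positives:
    # the WSI-level label becomes fine-grained tile-level guidance.
    confidence = probs[:, slide_label]
    pseudo_pos = confidence > np.median(confidence)
    return probs, pseudo_pos
```

In a prompt-tuning setting, `prompt_embs` would come from learnable text prompts passed through the VL model's text encoder and updated on the few-shot slides, while the tile encoder stays frozen.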
Related papers
- From Generic to Specialized: A Subspecialty Diagnostic System Powered by Self-Supervised Learning for Cervical Histopathology [29.378512559906977]
We introduce the Cervical Sub-Path (CerS-Path) diagnostic system. It applies self-supervised learning on 190 million tissue patches from 140,000 slides to build a cervical-specific feature extractor, followed by multimodal enhancement with 2.5 million image-text pairs and integration with multiple downstream diagnostic functions.
arXiv Detail & Related papers (2025-10-11T12:22:35Z) - Beyond Diagnosis: Evaluating Multimodal LLMs for Pathology Localization in Chest Radiographs [33.80781505782195]
We evaluate two general-purpose large language models (LLMs) and a domain-specific model (MedGemma) in their ability to localize pathologies on chest radiographs. GPT-5 exhibited a localization accuracy of 49.7%, followed by GPT-4 (39.1%) and MedGemma (17.7%), all lower than a task-specific CNN baseline (59.9%) and a radiologist benchmark (80.1%). GPT-4 performed well on pathologies with fixed anatomical locations, but struggled with spatially variable findings and exhibited implausible predictions more frequently.
arXiv Detail & Related papers (2025-09-22T16:54:23Z) - CXR-LT 2024: A MICCAI challenge on long-tailed, multi-label, and zero-shot disease classification from chest X-ray [64.2434525370243]
The CXR-LT series is a community-driven initiative designed to enhance lung disease classification using chest X-rays. CXR-LT 2024 expands the dataset to 377,110 chest X-rays (CXRs) and 45 disease labels, including 19 new rare disease findings. This paper provides an overview of CXR-LT 2024, detailing the data curation process and consolidating state-of-the-art solutions.
arXiv Detail & Related papers (2025-06-09T17:53:31Z) - GRAPHITE: Graph-Based Interpretable Tissue Examination for Enhanced Explainability in Breast Cancer Histopathology [14.812589661794592]
GRAPHITE is a post-hoc explainable framework designed for breast cancer tissue microarray (TMA) analysis. We trained the model on 140 tumour TMA cores and four benign whole slide images from which 140 benign samples were created, and tested it on 53 pathologist-annotated TMA samples. It achieved a mean average precision (mAP) of 0.56, an area under the receiver operating characteristic curve (AUROC) of 0.94, and a threshold robustness (ThR) of 0.70.
arXiv Detail & Related papers (2025-01-08T00:54:43Z) - A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis [58.85247337449624]
We propose a knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups. KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks.
arXiv Detail & Related papers (2024-12-17T17:45:21Z) - Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study exhaustively evaluated the Gemini and GPT-4 models, along with four other popular large models, across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z) - Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports [68.39938936308023]
We propose a novel text-guided learning method to achieve highly accurate cancer detection results.
Our approach can leverage clinical knowledge by large-scale pre-trained VLM to enhance generalization ability.
arXiv Detail & Related papers (2024-05-23T07:03:38Z) - Beyond attention: deriving biologically interpretable insights from weakly-supervised multiple-instance learning models [2.639541396835675]
We introduce prediction-attention-weighted (PAW) maps by combining tile-level attention and prediction scores produced by a refined encoder.
We also introduce a biological feature instantiation technique by integrating PAW maps with nuclei segmentation masks.
Our approach reveals that regions that are predictive of adverse prognosis do not tend to co-locate with the tumour regions.
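The PAW-map idea above, combining tile-level attention with tile-level prediction scores, can be sketched in a few lines. The exact combination rule is not given in this summary, so the normalized elementwise product below is an illustrative assumption, and the function name is hypothetical.

```python
import numpy as np

def paw_map(attention, predictions):
    """Prediction-attention-weighted (PAW) score per tile: normalize the
    attention weights to sum to 1, then weight them by each tile's
    prediction score. High values mark tiles that are both attended to
    and predictive, serving as a saliency proxy over the slide."""
    attn = attention / attention.sum()   # normalized attention, sums to 1
    return attn * predictions            # (n_tiles,) PAW scores
```

Overlaying these per-tile scores on the slide (or intersecting them with nuclei segmentation masks, as the summary describes) is what turns the MIL model's internals into spatial, biologically interpretable maps.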
arXiv Detail & Related papers (2023-09-07T09:44:35Z) - WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma [51.50991881342181]
This challenge includes 10,091 patch-level annotations and over 130 million labeled pixels.
The first-place team achieved an mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919).
arXiv Detail & Related papers (2022-04-13T15:27:05Z) - Whole Slide Images are 2D Point Clouds: Context-Aware Survival Prediction using Patch-based Graph Convolutional Networks [6.427108174481534]
We present Patch-GCN, a context-aware, spatially-resolved patch-based graph convolutional network that hierarchically aggregates instance-level histology features.
We demonstrate that Patch-GCN outperforms all prior weakly-supervised approaches by 3.58-9.46%.
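The hierarchical aggregation described above rests on the graph-convolution update, where each patch averages features over its spatial neighbours before a learned transform. A simplified single layer, not Patch-GCN's actual architecture, might look like this; the function name and the mean-aggregation-plus-ReLU form are assumptions for illustration.

```python
import numpy as np

def gcn_layer(feats, adj, weight):
    """One graph-convolution step over a patch graph: average each
    patch's features with its spatial neighbours (self-loops included),
    then apply a linear transform and ReLU. feats: (n, d_in),
    adj: (n, n) binary adjacency, weight: (d_in, d_out)."""
    adj_hat = adj + np.eye(adj.shape[0])   # add self-loops
    deg = adj_hat.sum(axis=1, keepdims=True)
    agg = (adj_hat @ feats) / deg          # mean over each neighbourhood
    return np.maximum(agg @ weight, 0.0)   # linear transform + ReLU
```

Stacking several such layers lets instance-level histology features propagate across increasingly large tissue regions, which is what makes the slide-level prediction context-aware rather than a bag of independent patches.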
arXiv Detail & Related papers (2021-07-27T19:17:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.