Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment
- URL: http://arxiv.org/abs/2510.23224v1
- Date: Mon, 27 Oct 2025 11:22:28 GMT
- Title: Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment
- Authors: Hongyi Wang, Zhengjie Zhu, Jiabo Ma, Fang Wang, Yue Shi, Bo Luo, Jili Wang, Qiuyu Cai, Xiuming Zhang, Yen-Wei Chen, Lanfen Lin, Hao Chen
- Abstract summary: We present PathSearch, a retrieval framework that unifies fine-grained attentive mosaic representations with global-wise slide embeddings aligned through vision-language contrastive learning. Trained on a corpus of 6,926 slide-report pairs, PathSearch captures both fine-grained morphological cues and high-level semantic patterns to enable accurate and flexible retrieval. PathSearch was rigorously evaluated on four public pathology datasets and three in-house cohorts, covering tasks including anatomical site retrieval, tumor subtyping, tumor vs. non-tumor discrimination, and grading across diverse organs such as breast, lung, kidney, liver, and stomach.
- Score: 25.320017572772553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid digitization of histopathology slides has opened up new possibilities for computational tools in clinical and research workflows. Among these, content-based slide retrieval stands out, enabling pathologists to identify morphologically and semantically similar cases, thereby supporting precise diagnoses, enhancing consistency across observers, and assisting example-based education. However, effective retrieval of whole slide images (WSIs) remains challenging due to their gigapixel scale and the difficulty of capturing subtle semantic differences amid abundant irrelevant content. To overcome these challenges, we present PathSearch, a retrieval framework that unifies fine-grained attentive mosaic representations with global-wise slide embeddings aligned through vision-language contrastive learning. Trained on a corpus of 6,926 slide-report pairs, PathSearch captures both fine-grained morphological cues and high-level semantic patterns to enable accurate and flexible retrieval. The framework supports two key functionalities: (1) mosaic-based image-to-image retrieval, ensuring accurate and efficient slide search; and (2) multi-modal retrieval, where text queries can directly retrieve relevant slides. PathSearch was rigorously evaluated on four public pathology datasets and three in-house cohorts, covering tasks including anatomical site retrieval, tumor subtyping, tumor vs. non-tumor discrimination, and grading across diverse organs such as breast, lung, kidney, liver, and stomach. External results show that PathSearch outperforms traditional image-to-image retrieval frameworks. A multi-center reader study further demonstrates that PathSearch improves diagnostic accuracy, boosts confidence, and enhances inter-observer agreement among pathologists in real clinical scenarios. These results establish PathSearch as a scalable and generalizable retrieval solution for digital pathology.
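The abstract describes aligning slide embeddings with report embeddings through vision-language contrastive learning. The paper's actual objective is not given here, but the standard symmetric contrastive (CLIP-style InfoNCE) loss such frameworks typically build on can be sketched as follows; the function names, the temperature value, and the pure-Python embedding representation are illustrative assumptions, not details from the paper.

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def symmetric_contrastive_loss(slide_embs, report_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    Pair i (slide embedding, report embedding) is the positive;
    every other pairing in the batch serves as an in-batch negative.
    Both retrieval directions (image->text and text->image) are averaged.
    """
    n = len(slide_embs)
    # temperature-scaled similarity matrix: rows = slides, cols = reports
    sims = [[cosine(img, txt) / temperature for txt in report_embs]
            for img in slide_embs]
    # slide -> report direction: each row should peak at its own index
    loss_i2t = sum(-math.log(softmax(sims[i])[i]) for i in range(n))
    # report -> slide direction: each column should peak at its own index
    loss_t2i = sum(-math.log(softmax([sims[i][j] for i in range(n)])[j])
                   for j in range(n))
    return (loss_i2t + loss_t2i) / (2 * n)
```

With toy 2-D embeddings, correctly paired batches yield a near-zero loss while mispaired ones are penalized, e.g. `symmetric_contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])` is far smaller than the loss for the same slides paired with swapped reports.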
Related papers
- Act Like a Pathologist: Tissue-Aware Whole Slide Image Reasoning [21.809404751735503]
We propose a question-guided, tissue-aware, and coarse-to-fine retrieval framework, HistoSelect. Our approach outperforms existing methods and produces answers grounded in interpretable, pathologist-consistent regions. Our results suggest that bringing human-like search and attention patterns into WSI reasoning is a promising direction for building practical and reliable pathology VLMs.
arXiv Detail & Related papers (2026-02-28T14:22:53Z) - A Graph-Based Framework for Interpretable Whole Slide Image Analysis [86.37618055724441]
We develop a framework that transforms whole-slide images into biologically-informed graph representations. Our approach builds graph nodes from tissue regions that respect natural structures, not arbitrary grids. We demonstrate strong performance on challenging cancer staging and survival prediction tasks.
arXiv Detail & Related papers (2025-03-14T20:15:04Z) - RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining [64.66825253356869]
We propose a novel methodology that leverages dense radiology reports to define image-wise similarity ordering at multiple granularities. We construct two comprehensive medical imaging retrieval datasets: MIMIC-IR for Chest X-rays and CTRATE-IR for CT scans. We develop two retrieval systems, RadIR-CXR and model-ChestCT, which demonstrate superior performance in traditional image-image and image-report retrieval tasks.
arXiv Detail & Related papers (2025-03-06T17:43:03Z) - On Image Search in Histopathology [0.0]
We review the latest developments in image search technologies for histopathology.
We offer a concise overview tailored for computational pathology researchers seeking effective, fast and efficient image search methods in their work.
arXiv Detail & Related papers (2024-01-14T12:38:49Z) - Towards a Visual-Language Foundation Model for Computational Pathology [5.72536252929528]
We introduce CONtrastive learning from Captions for Histopathology (CONCH)
CONCH is a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and task-agnostic pretraining.
It is evaluated on a suite of 13 diverse benchmarks, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval.
arXiv Detail & Related papers (2023-07-24T16:13:43Z) - Region-based Contrastive Pretraining for Medical Image Retrieval with Anatomic Query [56.54255735943497]
We introduce a novel Region-based contrastive pretraining for Medical Image Retrieval (RegionMIR)
arXiv Detail & Related papers (2023-05-09T16:46:33Z) - Multimorbidity Content-Based Medical Image Retrieval Using Proxies [37.47987844057842]
We propose a novel multi-label metric learning method that can be used for both classification and content-based image retrieval.
Our model is able to support diagnosis by predicting the presence of diseases and provide evidence for these predictions.
We demonstrate the efficacy of our approach to both classification and content-based image retrieval on two multimorbidity radiology datasets.
arXiv Detail & Related papers (2022-11-22T11:23:53Z) - A Deep Reinforcement Learning Framework for Rapid Diagnosis of Whole Slide Pathological Images [4.501311544043762]
We propose a weakly supervised deep reinforcement learning framework, which can greatly reduce the time required for network inference.
We use neural networks to construct the search model and the decision model of the reinforcement learning agent, respectively.
Experimental results show that our proposed method can achieve fast inference and accurate prediction of whole slide images without any pixel-level annotations.
arXiv Detail & Related papers (2022-05-05T14:20:29Z) - Semantic segmentation of multispectral photoacoustic images using deep learning [53.65837038435433]
Photoacoustic imaging has the potential to revolutionise healthcare.
Clinical translation of the technology requires conversion of the high-dimensional acquired data into clinically relevant and interpretable information.
We present a deep learning-based approach to semantic segmentation of multispectral photoacoustic images.
arXiv Detail & Related papers (2021-05-20T09:33:55Z) - Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z) - Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z) - VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images [121.31355003451152]
The Large Scale Vertebrae Challenge (VerSe) was organised in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2019 and 2020.
We present the results of this evaluation and further investigate the performance variation at vertebra level, scan level, and at different fields of view.
arXiv Detail & Related papers (2020-01-24T21:09:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.