Act Like a Pathologist: Tissue-Aware Whole Slide Image Reasoning
- URL: http://arxiv.org/abs/2603.00667v1
- Date: Sat, 28 Feb 2026 14:22:53 GMT
- Title: Act Like a Pathologist: Tissue-Aware Whole Slide Image Reasoning
- Authors: Wentao Huang, Weimin Lyu, Peiliang Lou, Qingqiao Hu, Xiaoling Hu, Shahira Abousamra, Wenchao Han, Ruifeng Guo, Jiawei Zhou, Chao Chen, Chen Wang
- Abstract summary: We propose a question-guided, tissue-aware, and coarse-to-fine retrieval framework, HistoSelect. Our approach outperforms existing methods and produces answers grounded in interpretable, pathologist-consistent regions. Our results suggest that bringing human-like search and attention patterns into WSI reasoning is a promising direction for building practical and reliable pathology VLMs.
- Score: 21.809404751735503
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Computational pathology has advanced rapidly in recent years, driven by domain-specific image encoders and growing interest in using vision-language models to answer natural-language questions about diseases. Yet the core problem behind pathology question answering remains unsolved: a gigapixel slide contains far more information than any given question requires. Pathologists naturally navigate tissue and morphological complexity by scanning broadly and zooming in selectively according to the clinical question. Current models, in contrast, rely on uniform patch sampling or broad attention maps, often attending equally to irrelevant regions while overlooking key visual evidence. In this work, we try to bring models closer to how humans actually examine slides. We propose a question-guided, tissue-aware, and coarse-to-fine retrieval framework, HistoSelect, that consists of two key components: a group sampler that identifies question-relevant tissue regions, followed by a patch selector that retrieves the most informative patches within those regions. By selecting only the most informative patches, our method becomes significantly more efficient, reducing visual token usage by 70% on average while improving accuracy across three pathology QA tasks. Evaluated on 356,000 question-answer pairs, our approach outperforms existing methods and produces answers grounded in interpretable, pathologist-consistent regions. Our results suggest that bringing human-like search and attention patterns into WSI reasoning is a promising direction for building practical and reliable pathology VLMs.
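The abstract describes a two-stage, coarse-to-fine selection: a group sampler picks question-relevant tissue regions, then a patch selector keeps the most informative patches inside them. The sketch here illustrates only that control flow and is not HistoSelect's implementation. It assumes (hypothetically) that every patch already carries a tissue-group label and an embedding in the same space as the question embedding, and it scores relevance with plain cosine similarity. The names `Patch`, `group_sampler`, `patch_selector`, `select`, `top_groups`, and `budget` are illustrative inventions.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Patch:
    patch_id: int
    group: str               # tissue-region label (hypothetical; e.g. from a tissue segmenter)
    embedding: list[float]   # patch feature from some image encoder (hypothetical)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_sampler(patches: list[Patch], question: list[float], top_groups: int) -> set[str]:
    """Coarse stage: rank tissue groups by mean patch-question similarity, keep the top ones."""
    sims = defaultdict(list)
    for p in patches:
        sims[p.group].append(cosine(p.embedding, question))
    ranked = sorted(sims, key=lambda g: sum(sims[g]) / len(sims[g]), reverse=True)
    return set(ranked[:top_groups])


def patch_selector(patches, question, groups, budget):
    """Fine stage: within the chosen groups, keep the `budget` most question-similar patches."""
    candidates = [p for p in patches if p.group in groups]
    candidates.sort(key=lambda p: cosine(p.embedding, question), reverse=True)
    return candidates[:budget]


def select(patches, question, top_groups=1, budget=3):
    """Run both stages and report the fraction of visual tokens (patches) dropped."""
    groups = group_sampler(patches, question, top_groups)
    chosen = patch_selector(patches, question, groups, budget)
    token_reduction = 1 - len(chosen) / len(patches)
    return chosen, token_reduction


# Toy slide: 10 patches with 2-D embeddings and a 2-D "question" embedding.
patches = [
    Patch(0, "tumor", [1.0, 0.1]), Patch(1, "tumor", [0.9, 0.2]),
    Patch(2, "tumor", [0.8, 0.3]), Patch(3, "tumor", [0.6, 0.8]),
    Patch(4, "stroma", [0.1, 1.0]), Patch(5, "stroma", [0.2, 0.9]),
    Patch(6, "stroma", [0.3, 0.9]), Patch(7, "fat", [0.0, 1.0]),
    Patch(8, "fat", [0.1, 0.9]), Patch(9, "fat", [0.0, 0.8]),
]
question = [1.0, 0.0]

chosen, reduction = select(patches, question, top_groups=1, budget=3)
print([p.patch_id for p in chosen], round(reduction, 2))  # [0, 1, 2] 0.7
```

Keeping 3 of 10 patches drops 70% of the visual tokens on this toy slide. That figure matches the abstract's reported average only by construction; real savings depend on the slide, the grouping, and the chosen budget.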
Related papers
- A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv (2025-12-15T10:22:43Z)
- PathReasoning: A multimodal reasoning agent for query-based ROI navigation on whole-slide images [12.145046046646215]
We propose "PathReasoning", a multi-modal reasoning agent that iteratively navigates across Whole Slide Images (WSIs). PathReasoning builds a reasoning chain that gradually directs attention to diagnostically relevant areas. It can substantially outperform strong ROI-selection approaches, by 6.7% and 3.1% AUROC on subtyping and longitudinal analysis tasks, respectively.
arXiv (2025-11-26T20:44:17Z)
- Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment [25.320017572772553]
We present PathSearch, a retrieval framework that unifies fine-grained attentive mosaic representations with global slide-level embeddings aligned through vision-language contrastive learning. Trained on a corpus of 6,926 slide-report pairs, PathSearch captures both fine-grained morphological cues and high-level semantic patterns to enable accurate and flexible retrieval. PathSearch was rigorously evaluated on four public pathology datasets and three in-house cohorts, covering tasks including anatomical site retrieval, tumor subtyping, tumor vs. non-tumor discrimination, and grading across diverse organs such as breast, lung, kidney, liver, and stomach.
arXiv (2025-10-27T11:22:28Z)
- Patho-AgenticRAG: Towards Multimodal Agentic Retrieval-Augmented Generation for Pathology VLMs via Reinforcement Learning [9.075284970935341]
Patho-AgenticRAG is a database built on page-level embeddings from authoritative pathology textbooks. It supports joint text-image search, enabling direct retrieval of textbook pages that contain both the queried text and relevant visual cues. Patho-AgenticRAG significantly outperforms existing multimodal models in complex pathology tasks like multiple-choice diagnosis and visual question answering.
arXiv (2025-08-04T10:03:08Z)
- A Graph-Based Framework for Interpretable Whole Slide Image Analysis [86.37618055724441]
We develop a framework that transforms whole-slide images into biologically-informed graph representations. Our approach builds graph nodes from tissue regions that respect natural structures, not arbitrary grids. We demonstrate strong performance on challenging cancer staging and survival prediction tasks.
arXiv (2025-03-14T20:15:04Z)
- Pathological Prior-Guided Multiple Instance Learning For Mitigating Catastrophic Forgetting in Breast Cancer Whole Slide Image Classification [50.899861205016265]
We propose a new framework PaGMIL to mitigate catastrophic forgetting in breast cancer WSI classification. Our framework introduces two key components into the common MIL model architecture. We evaluate the continual learning performance of PaGMIL across several public breast cancer datasets.
arXiv (2025-03-08T04:51:58Z)
- HistoGym: A Reinforcement Learning Environment for Histopathological Image Analysis [9.615399811006034]
HistoGym aims to foster whole slide image diagnosis by mimicking the real-life processes of doctors.
We offer various scenarios for different organs and cancers, including both WSI-based and selected region-based scenarios.
arXiv (2024-08-16T17:19:07Z)
- FMDNN: A Fuzzy-guided Multi-granular Deep Neural Network for Histopathological Image Classification [40.94024666952439]
We propose the Fuzzy-guided Multi-granularity Deep Neural Network (FMDNN).
Inspired by the multi-granular diagnostic approach of pathologists, we perform feature extraction on cell structures at coarse, medium, and fine granularity.
A fuzzy-guided cross-attention module guides universal fuzzy features toward multi-granular features.
arXiv (2024-07-22T00:46:15Z)
- Knowledge-enhanced Visual-Language Pretraining for Computational Pathology [68.6831438330526]
We consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources.
We curate a pathology knowledge tree that consists of 50,470 informative attributes for 4,718 diseases requiring pathology diagnosis from 32 human tissues.
arXiv (2024-04-15T17:11:25Z)
- RudolfV: A Foundation Model by Pathologists for Pathologists [13.17203220753175]
We present a novel approach to designing foundation models for computational pathology.
Our model "RudolfV" surpasses existing state-of-the-art foundation models across different benchmarks.
arXiv (2024-01-08T18:31:38Z)
- Active Learning Enhances Classification of Histopathology Whole Slide Images with Attention-based Multiple Instance Learning [48.02011627390706]
We train an attention-based MIL model and calculate a confidence metric for every image in the dataset to select the most uncertain WSIs for expert annotation.
Combined with a novel attention-guiding loss, this yields more accurate models with only a few regions annotated per class.
In the future, it may help train MIL models for the clinically relevant task of cancer classification in histopathology.
arXiv (2023-03-02T15:18:58Z)
- Unsupervised deep learning techniques for powdery mildew recognition based on multispectral imaging [63.62764375279861]
This paper presents a deep learning approach to automatically recognize powdery mildew on cucumber leaves.
We focus on unsupervised deep learning techniques applied to multispectral imaging data.
We propose the use of autoencoder architectures to investigate two strategies for disease detection.
arXiv (2021-12-20T13:29:13Z)
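The active-learning entry in the list above combines attention-based MIL with a confidence metric to pick the most uncertain WSIs for expert annotation. A minimal sketch of that selection loop follows. It assumes the standard attention pooling of Ilse et al. (2018), where each patch feature h_k gets weight softmax(w · tanh(V h_k)), and it uses binary predictive entropy as the uncertainty score; the paper's actual confidence metric may differ. Training and the attention-guiding loss are omitted, and the weights are hand-picked toy values, not learned ones. Function names are illustrative.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil(instances, V, w, clf_w, clf_b):
    """Attention-based MIL pooling over one bag (a WSI's patch features).

    Attention logit per instance h_k: w . tanh(V h_k); weights are a softmax over
    the bag; the bag embedding is the attention-weighted sum of instances, fed to a
    logistic classifier. Returns (positive-class probability, attention weights).
    """
    logits = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        logits.append(sum(wi * hi for wi, hi in zip(w, hidden)))
    attn = softmax(logits)
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attn, instances)) for d in range(dim)]
    z = sum(c * b for c, b in zip(clf_w, bag)) + clf_b
    return 1.0 / (1.0 + math.exp(-z)), attn


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Predictive entropy in nats; highest (ln 2) at p = 0.5."""
    return -(p * math.log(p + eps) + (1 - p) * math.log(1 - p + eps))


def most_uncertain(bags, k, **params):
    """Return the k slide IDs whose predictions have the highest entropy."""
    probs = {sid: attention_mil(inst, **params)[0] for sid, inst in bags.items()}
    return sorted(probs, key=lambda sid: binary_entropy(probs[sid]), reverse=True)[:k]


# Hand-picked toy weights (hypothetical; a real model would learn these).
params = dict(V=[[1.0, 0.0], [0.0, 1.0]], w=[1.0, 0.0], clf_w=[2.0, -2.0], clf_b=0.0)
bags = {
    "slide_a": [[3.0, 0.0], [2.0, 0.0]],  # confidently positive
    "slide_b": [[0.0, 3.0], [0.0, 2.0]],  # confidently negative
    "slide_c": [[1.0, 1.0], [1.0, 1.0]],  # logit 0, probability 0.5
}
print(most_uncertain(bags, 1, **params))  # ['slide_c']
```

Entropy is only one common uncertainty measure; a margin score such as |p - 0.5| could replace `binary_entropy` without changing the rest of the loop.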