Related papers: CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic

CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic

URL: http://arxiv.org/abs/2505.20510v2
Date: Tue, 28 Oct 2025 09:39:55 GMT
Title: CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic
Authors: Yuxuan Sun, Yixuan Si, Chenglu Zhu, Kai Zhang, Zhongyi Shui, Bowen Ding, Tao Lin, Lin Yang,
Abstract summary: We introduce CPathAgent, an innovative agent-based approach that mimics pathologists' diagnostic workflow.<n>We develop a multi-stage training strategy that unifies patch-level, region-level, and WSI-level capabilities within a single model.<n>We construct PathMMU-HR2, the first expert-validated benchmark for large region analysis.
Score: 23.488576700623966
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in computational pathology have led to the emergence of numerous foundation models. These models typically rely on general-purpose encoders with multi-instance learning for whole slide image (WSI) classification or apply multimodal approaches to generate reports directly from images. However, these models cannot emulate the diagnostic approach of pathologists, who systematically examine slides at low magnification to obtain an overview before progressively zooming in on suspicious regions to formulate comprehensive diagnoses. Instead, existing models directly output final diagnoses without revealing the underlying reasoning process. To address this gap, we introduce CPathAgent, an innovative agent-based approach that mimics pathologists' diagnostic workflow by autonomously navigating across WSI based on observed visual features, thereby generating substantially more transparent and interpretable diagnostic summaries. To achieve this, we develop a multi-stage training strategy that unifies patch-level, region-level, and WSI-level capabilities within a single model, which is essential for replicating how pathologists understand and reason across diverse image scales. Additionally, we construct PathMMU-HR2, the first expert-validated benchmark for large region analysis. This represents a critical intermediate scale between patches and whole slides, reflecting a key clinical reality where pathologists typically examine several key large regions rather than entire slides at once. Extensive experiments demonstrate that CPathAgent consistently outperforms existing approaches across benchmarks at three different image scales, validating the effectiveness of our agent-based diagnostic approach and highlighting a promising direction for computational pathology.

Related papers

MedAD-R1: Eliciting Consistent Reasoning in Interpretible Medical Anomaly Detection via Consistency-Reinforced Policy Optimization [46.65200216642429]
We introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs.<n>Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%.
arXiv Detail & Related papers (2026-02-01T07:56:10Z)
PathFound: An Agentic Multimodal Model Activating Evidence-seeking Pathological Diagnosis [13.503111478218434]
PathFound is an agentic multimodal model designed to support evidence-seeking inference in pathological diagnosis.<n> PathFound achieves state-of-the-art diagnostic performance across diverse clinical scenarios.
arXiv Detail & Related papers (2025-12-29T15:34:27Z)
PathAgent: Toward Interpretable Analysis of Whole-slide Pathology Images via Large Language Model-based Agentic Reasoning [17.067199015601954]
We present PathAgent, a training-free, large language model (LLM)-based agent framework that emulates the reflective, stepwise analytical approach of human experts.<n>The entire sequence of observations and decisions forms an explicit chain-of-thought, yielding fully interpretable predictions.
arXiv Detail & Related papers (2025-11-21T08:50:14Z)
CLAPS: A CLIP-Unified Auto-Prompt Segmentation for Multi-Modal Retinal Imaging [47.04292769940597]
We propose CLIP-unified Auto-Prompt (CLAPS), a novel method for unified segmentation across diverse tasks and modalities in retinal imaging.<n>Our approach begins by pre-training a CLIP-based image encoder on a large, multi-modal retinal dataset.<n>To unify tasks and resolve ambiguity, we use text prompts enhanced with a unique "modality signature" for each imaging modality.
arXiv Detail & Related papers (2025-09-10T14:14:49Z)
PathMR: Multimodal Visual Reasoning for Interpretable Pathology Diagnosis [9.728322291979564]
We propose PathMR, a cell-level Multimodal visual Reasoning framework for Pathological image analysis.<n>We show that PathMR consistently outperforms state-of-the-art visual reasoning methods in text generation quality, segmentation accuracy, and cross-modal alignment.
arXiv Detail & Related papers (2025-08-28T14:46:24Z)
Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation.<n>MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z)
GRASPing Anatomy to Improve Pathology Segmentation [67.98147643529309]
We introduce GRASP, a modular plug-and-play framework that enhances pathology segmentation models.<n>We evaluate GRASP on two PET/CT datasets, conduct systematic ablation studies, and investigate the framework's inner workings.
arXiv Detail & Related papers (2025-08-05T12:26:36Z)
Evidence-based diagnostic reasoning with multi-agent copilot for human pathology [7.976907866539546]
Current multimodal large language models (MLLMs) in computational pathology face limitations.<n>We introduce PathChat+, a new MLLM specifically designed for human pathology, trained on over 1 million diverse, pathology-specific instruction samples.<n>We also present SlideSeek, a reasoning-enabled multi-agent AI system leveraging PathChat+ to autonomously evaluate gigapixel whole-slide images.
arXiv Detail & Related papers (2025-06-26T03:02:16Z)
PathSegDiff: Pathology Segmentation using Diffusion model representations [63.20694440934692]
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained featured extractors.<n>Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images.<n>Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
arXiv Detail & Related papers (2025-04-09T14:58:21Z)
A Survey of Pathology Foundation Model: Progress and Future Directions [3.009351592961681]
Computational pathology involves analyzing whole slide images for automated cancer diagnosis.<n>Recent Pathology Foundation Models (PFMs), pretrained on large-scale histopathology data, have significantly enhanced both the extractor and aggregator.<n>We present a hierarchical taxonomy organizing PFMs through a top-down philosophy applicable to foundation model analysis in any domain.
arXiv Detail & Related papers (2025-04-05T03:44:09Z)
Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis [37.11302829771659]
Large vision-language models (LVLMs) are limited by input resolution constraints, hindering their efficiency and accuracy in pathology image analysis.<n>We propose two innovative strategies: the mixed task-guided feature enhancement, and the prompt-guided detail feature completion.<n>We trained the pathology-specialized LVLM, OmniPath, which significantly outperforms existing methods in diagnostic accuracy and efficiency.
arXiv Detail & Related papers (2024-12-12T18:07:23Z)
Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective [32.93871326428446]
Recent advances in artificial intelligence (AI) are revolutionizing medical imaging and computational pathology.<n>A constant challenge in the analysis of digital Whole Slide Images (WSIs) is the problem of aggregating tens of thousands of tile-level image embeddings to a slide-level representation.<n>This study conducts a benchmarking analysis of ten slide-level aggregation techniques across nine clinically relevant tasks.
arXiv Detail & Related papers (2024-07-10T17:00:57Z)
Anatomy-guided Pathology Segmentation [56.883822515800205]
We develop a generalist segmentation model that combines anatomical and pathological information, aiming to enhance the segmentation accuracy of pathological features. Our Anatomy-Pathology Exchange (APEx) training utilizes a query-based segmentation transformer which decodes a joint feature space into query-representations for human anatomy. In doing so, we are able to report the best results across the board on FDG-PET-CT and Chest X-Ray pathology segmentation tasks with a margin of up to 3.3% as compared to strong baseline methods.
arXiv Detail & Related papers (2024-07-08T11:44:15Z)
PathoWAve: A Deep Learning-based Weight Averaging Method for Improving Domain Generalization in Histopathology Images [13.362177469092963]
We introduce Pathology Weight Averaging (PathoWAve) to tackle domain shift phenomenon in histopathology image analysis. Our results on Camelyon17 WILDS dataset demonstrate PathoWAve's superiority over previous proposed methods.
arXiv Detail & Related papers (2024-06-21T23:25:44Z)
Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions. We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
PathAsst: A Generative Foundation AI Assistant Towards Artificial General Intelligence of Pathology [15.419350834457136]
We present PathAsst, a multimodal generative foundation AI assistant to revolutionize diagnostic and predictive analytics in pathology. The development of PathAsst involves three pivotal steps: data acquisition, CLIP model adaptation, and the training of PathAsst's multimodal generative capabilities. The experimental results of PathAsst show the potential of harnessing AI-powered generative foundation model to improve pathology diagnosis and treatment processes.
arXiv Detail & Related papers (2023-05-24T11:55:50Z)
Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making. We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation. We construct our few-shot image segmentor using a deep convolutional network trained episodically. We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
Shape-aware Meta-learning for Generalizing Prostate MRI Segmentation to Unseen Domains [68.73614619875814]
We present a novel shape-aware meta-learning scheme to improve the model generalization in prostate MRI segmentation. Experimental results show that our approach outperforms many state-of-the-art generalization methods consistently across all six settings of unseen domains.
arXiv Detail & Related papers (2020-07-04T07:56:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.