PathFound: An Agentic Multimodal Model Activating Evidence-seeking Pathological Diagnosis
- URL: http://arxiv.org/abs/2512.23545v1
- Date: Mon, 29 Dec 2025 15:34:27 GMT
- Title: PathFound: An Agentic Multimodal Model Activating Evidence-seeking Pathological Diagnosis
- Authors: Shengyi Hua, Jianfeng Wu, Tianle Shen, Kangzhe Hu, Zhongzhen Huang, Shujuan Ni, Zhihong Zhang, Yuan Li, Zhe Wang, Xiaofan Zhang
- Abstract summary: PathFound is an agentic multimodal model designed to support evidence-seeking inference in pathological diagnosis. PathFound achieves state-of-the-art diagnostic performance across diverse clinical scenarios.
- Score: 13.503111478218434
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent pathological foundation models have substantially advanced visual representation learning and multimodal interaction. However, most models still rely on a static inference paradigm in which whole-slide images are processed once to produce predictions, without reassessment or targeted evidence acquisition under ambiguous diagnoses. This contrasts with clinical diagnostic workflows that refine hypotheses through repeated slide observations and further examination requests. We propose PathFound, an agentic multimodal model designed to support evidence-seeking inference in pathological diagnosis. PathFound integrates the power of pathological visual foundation models, vision-language models, and reasoning models trained with reinforcement learning to perform proactive information acquisition and diagnosis refinement by progressing through the initial diagnosis, evidence-seeking, and final decision stages. Across several large multimodal models, adopting this strategy consistently improves diagnostic accuracy, indicating the effectiveness of evidence-seeking workflows in computational pathology. Among these models, PathFound achieves state-of-the-art diagnostic performance across diverse clinical scenarios and demonstrates strong potential to discover subtle details, such as nuclear features and local invasions.
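Below is a minimal sketch of the three-stage evidence-seeking loop the abstract describes (initial diagnosis, evidence-seeking, final decision). All names here (DiagnosisState, initial_diagnosis, seek_evidence, refine_diagnosis, the confidence threshold and update rule) are illustrative assumptions, not PathFound's actual interface.
```python
# Hypothetical sketch of an evidence-seeking diagnostic loop; placeholders only.
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiagnosisState:
    hypothesis: str                                      # current working diagnosis
    confidence: float                                    # self-estimated confidence in [0, 1]
    evidence: List[str] = field(default_factory=list)    # findings collected so far


def initial_diagnosis(slide) -> DiagnosisState:
    """Stage 1: a single pass over the whole-slide image to form a hypothesis."""
    # Placeholder: a real system would query the vision-language model here.
    return DiagnosisState(hypothesis="suspicious lesion, subtype unclear", confidence=0.55)


def seek_evidence(slide, state: DiagnosisState) -> str:
    """Stage 2: request targeted evidence (e.g., a high-power crop of an ambiguous region)."""
    # Placeholder: a real system would select and re-examine a region of the slide.
    return f"high-power view supporting '{state.hypothesis}'"


def refine_diagnosis(state: DiagnosisState, finding: str) -> DiagnosisState:
    """Fold the new finding back into the hypothesis and update confidence."""
    # Placeholder update rule; the paper trains models for this, not a fixed increment.
    return DiagnosisState(state.hypothesis, min(1.0, state.confidence + 0.2),
                          state.evidence + [finding])


def diagnose(slide, threshold: float = 0.9, max_rounds: int = 5) -> DiagnosisState:
    """Stage 3: iterate evidence-seeking until confident enough, then commit to a decision."""
    state = initial_diagnosis(slide)
    for _ in range(max_rounds):
        if state.confidence >= threshold:
            break
        finding = seek_evidence(slide, state)
        state = refine_diagnosis(state, finding)
    return state


if __name__ == "__main__":
    final = diagnose(slide=None)  # no real slide; the placeholders drive the loop
    print(final.hypothesis, final.confidence, final.evidence)
```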
Related papers
- AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning [73.50200033931148]
We introduce AgentsEval, a multi-agent stream reasoning framework that emulates the collaborative diagnostic workflow of radiologists. By dividing the evaluation process into interpretable steps including criteria definition, evidence extraction, alignment, and consistency scoring, AgentsEval provides explicit reasoning traces and structured clinical feedback. Experimental results demonstrate that AgentsEval delivers clinically aligned, semantically faithful, and interpretable evaluations that remain robust under paraphrastic, semantic, and stylistic perturbations.
arXiv Detail & Related papers (2026-01-23T11:59:13Z)
- Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs). Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate a viva voce (oral) examination in medical training. Our analysis identified several failure modes that mirror common cognitive errors in clinical practice.
arXiv Detail & Related papers (2025-10-11T16:24:35Z)
- RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks. RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z)
- PathoHR: Hierarchical Reasoning for Vision-Language Models in Pathology [3.459714932882085]
Current vision-language (VL) models often struggle to capture the complex reasoning required for interpreting structured pathological reports. We propose PathoHR-Bench, a novel benchmark designed to evaluate VL models' abilities in hierarchical semantic understanding and compositional reasoning within the pathology domain. We further introduce a pathology-specific VL training scheme that generates enhanced and perturbed samples for multimodal contrastive learning.
arXiv Detail & Related papers (2025-09-07T15:42:38Z)
- DiagR1: A Vision-Language Model Trained via Reinforcement Learning for Digestive Pathology Diagnosis [7.5173141954286775]
We construct a large-scale gastrointestinal pathology dataset containing both microscopic descriptions and diagnostic conclusions. This design guides the model to better capture image-specific features and maintain semantic consistency in generation. Our solution outperforms state-of-the-art models with 18.7% higher clinical relevance, 32.4% improved structural completeness, and 41.2% fewer diagnostic errors.
arXiv Detail & Related papers (2025-07-24T14:12:20Z)
- CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic [23.488576700623966]
We introduce CPathAgent, an innovative agent-based approach that mimics pathologists' diagnostic workflow. We develop a multi-stage training strategy that unifies patch-level, region-level, and WSI-level capabilities within a single model. We construct PathMMU-HR2, the first expert-validated benchmark for large region analysis.
arXiv Detail & Related papers (2025-05-26T20:22:19Z)
- Empowering Medical Multi-Agents with Clinical Consultation Flow for Dynamic Diagnosis [20.59719567178192]
We propose a multi-agent framework inspired by consultation flow and reinforcement learning (RL) to simulate the entire consultation process. Our approach incorporates a hierarchical action set, structured from clinical consultation flow and medical textbooks, to effectively guide the decision-making process. This strategy improves agent interactions, enabling them to adapt and optimize actions based on the dynamic state.
arXiv Detail & Related papers (2025-03-19T08:47:18Z)
- Self-Explaining Hypergraph Neural Networks for Diagnosis Prediction [45.89562183034469]
Existing deep learning diagnosis prediction models with intrinsic interpretability often assign attention weights to every past diagnosis or hospital visit. We introduce SHy, a self-explaining hypergraph neural network model, designed to offer personalized, concise and faithful explanations. SHy captures higher-order disease interactions and extracts distinct temporal phenotypes as personalized explanations.
arXiv Detail & Related papers (2025-02-15T06:33:02Z)
- Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
- Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis [37.11302829771659]
Large vision-language models (LVLMs) are limited by input resolution constraints, hindering their efficiency and accuracy in pathology image analysis. We propose two innovative strategies: the mixed task-guided feature enhancement, and the prompt-guided detail feature completion. We trained the pathology-specialized LVLM, OmniPath, which significantly outperforms existing methods in diagnostic accuracy and efficiency.
arXiv Detail & Related papers (2024-12-12T18:07:23Z)
- A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z)