EchoAgent: Guideline-Centric Reasoning Agent for Echocardiography Measurement and Interpretation
- URL: http://arxiv.org/abs/2511.13948v1
- Date: Mon, 17 Nov 2025 22:06:12 GMT
- Title: EchoAgent: Guideline-Centric Reasoning Agent for Echocardiography Measurement and Interpretation
- Authors: Matin Daghyani, Lyuyang Wang, Nima Hashemi, Bassant Medhat, Baraa Abdelsamad, Eros Rojas Velez, XiaoXiao Li, Michael Y. C. Tsang, Christina Luong, Teresa S. M. Tsang, Purang Abolmaesumi
- Abstract summary: EchoAgent is a framework that enables structured, interpretable automation for echocardiographic video analysis. It orchestrates specialized vision tools under Large Language Model (LLM) control to perform temporal localization, spatial measurement, and clinical interpretation. It achieves accurate, interpretable results despite the added complexity of temporal video analysis.
- Score: 23.197431495208672
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Purpose: Echocardiographic interpretation requires video-level reasoning and guideline-based measurement analysis, which current deep learning models for cardiac ultrasound do not support. We present EchoAgent, a framework that enables structured, interpretable automation for this domain. Methods: EchoAgent orchestrates specialized vision tools under Large Language Model (LLM) control to perform temporal localization, spatial measurement, and clinical interpretation. A key contribution is a measurement-feasibility prediction model that determines whether anatomical structures are reliably measurable in each frame, enabling autonomous tool selection. We curated a benchmark of diverse, clinically validated video-query pairs for evaluation. Results: EchoAgent achieves accurate, interpretable results despite the added complexity of spatiotemporal video analysis. Outputs are grounded in visual evidence and clinical guidelines, supporting transparency and traceability. Conclusion: This work demonstrates the feasibility of agentic, guideline-aligned reasoning for echocardiographic video analysis, enabled by task-specific tools and full video-level automation. EchoAgent sets a new direction for trustworthy AI in cardiac ultrasound.
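The abstract describes an agent that gates its measurement tools on a per-frame feasibility prediction. A minimal sketch of that control flow is below; all class and function names are illustrative assumptions, not the authors' actual API, and the feasibility scores and measurement function stand in for the paper's learned models.

```python
# Sketch of a feasibility-gated measurement loop, as described in the
# abstract: a feasibility model scores each frame, and the agent only
# invokes the measurement tool on frames deemed reliably measurable.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    index: int
    feasibility: float  # predicted probability the structure is measurable


def select_measurable_frames(frames: List[Frame], threshold: float = 0.5) -> List[Frame]:
    """Keep only frames whose feasibility score clears the threshold."""
    return [f for f in frames if f.feasibility >= threshold]


def run_agent(frames: List[Frame], measure: Callable[[Frame], float]) -> Optional[float]:
    """Measure the structure on feasible frames and report the mean value."""
    usable = select_measurable_frames(frames)
    if not usable:
        return None  # the agent would fall back or flag the query as unanswerable
    values = [measure(f) for f in usable]
    return sum(values) / len(values)


frames = [Frame(0, 0.9), Frame(1, 0.2), Frame(2, 0.8)]
# Placeholder measurement tool: any per-frame spatial measurement model.
print(run_agent(frames, measure=lambda f: 2.0 + 0.1 * f.index))
```

The gating step is what lets the LLM controller choose tools autonomously: infeasible frames never reach the measurement model, so downstream interpretation is grounded only in frames the feasibility model trusts.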
Related papers
- MedScope: Incentivizing "Think with Videos" for Clinical Reasoning via Coarse-to-Fine Tool Calling [51.31633278218137]
MedScope is a tool-using clinical video reasoning model that performs coarse-to-fine evidence seeking over long-form procedures. We build ClinVideoSuite, an evidence-centric, fine-grained clinical video suite. On full and fine-grained video understanding benchmarks, MedScope achieves state-of-the-art performance.
arXiv Detail & Related papers (2026-02-11T09:47:02Z) - Echo-CoPilot: A Multi-View, Multi-Task Agent for Echocardiography Interpretation and Reporting [8.162197738994479]
We introduce Echo-CoPilot, a multi-view, multi-task agent that uses a large language model to orchestrate specialized echocardiography tools. Within a ReAct-style loop, the agent decomposes clinician queries, invokes tools for view recognition, cardiac structure segmentation, measurement and disease prediction, and report synthesis. We evaluate Echo-CoPilot on the public MIMIC-EchoQA benchmark, where it achieves an accuracy of 50.8%, outperforming both general-purpose and biomedical video vision-language models.
arXiv Detail & Related papers (2025-12-06T23:27:54Z) - PathAgent: Toward Interpretable Analysis of Whole-slide Pathology Images via Large Language Model-based Agentic Reasoning [17.067199015601954]
We present PathAgent, a training-free, large language model (LLM)-based agent framework that emulates the reflective, stepwise analytical approach of human experts. The entire sequence of observations and decisions forms an explicit chain-of-thought, yielding fully interpretable predictions.
arXiv Detail & Related papers (2025-11-21T08:50:14Z) - MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging [67.74482877175797]
MIRNet is a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning. We introduce TongueAtlas-4K, a benchmark comprising 4,000 images annotated with 22 diagnostic labels.
arXiv Detail & Related papers (2025-11-13T06:30:41Z) - Intelligent Healthcare Imaging Platform: A VLM-Based Framework for Automated Medical Image Analysis and Clinical Report Generation [0.0]
This work presents an intelligent multimodal framework for medical image analysis that leverages Vision-Language Models (VLMs). The framework integrates Google Gemini 2.5 Flash for automated tumor detection and clinical report generation across multiple imaging modalities including CT, MRI, X-ray, and Ultrasound.
arXiv Detail & Related papers (2025-09-16T23:15:44Z) - Automated Clinical Problem Detection from SOAP Notes using a Collaborative Multi-Agent LLM Architecture [8.072932739333309]
We introduce a collaborative multi-agent system (MAS) that models a clinical consultation team to address this gap. The system is tasked with identifying clinical problems by analyzing only the Subjective (S) and Objective (O) sections of SOAP notes. A Manager agent orchestrates a dynamically assigned team of specialist agents who engage in a hierarchical, iterative debate to reach a consensus.
arXiv Detail & Related papers (2025-08-29T17:31:24Z) - AURA: A Multi-Modal Medical Agent for Understanding, Reasoning & Annotation [0.8397730500554048]
AURA is the first visual linguistic explainability agent designed specifically for comprehensive analysis, explanation, and evaluation of medical images. AURA represents a significant advancement toward more transparent, adaptable, and clinically aligned AI systems.
arXiv Detail & Related papers (2025-07-22T18:24:18Z) - RadFabric: Agentic AI System with Reasoning Capability for Radiology [61.25593938175618]
RadFabric is a multi-agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation. The system employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent to map visual findings to precise anatomical structures, and a Reasoning Agent powered by large multimodal reasoning models to synthesize visual, anatomical, and clinical data into transparent and evidence-based diagnoses.
arXiv Detail & Related papers (2025-06-17T03:10:33Z) - Acoustic to Articulatory Inversion of Speech; Data Driven Approaches, Challenges, Applications, and Future Scope [0.0]
This review is focused on the data-driven approaches applied in different applications of Acoustic-to-Articulatory Inversion (AAI) of speech.
arXiv Detail & Related papers (2025-04-17T19:38:50Z) - EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance [79.66329903007869]
We present EchoWorld, a motion-aware world modeling framework for probe guidance. It encodes anatomical knowledge and motion-induced visual dynamics. It is trained on more than one million ultrasound images from over 200 routine scans.
arXiv Detail & Related papers (2025-04-17T16:19:05Z) - Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z) - Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.