Echo-CoPilot: A Multi-View, Multi-Task Agent for Echocardiography Interpretation and Reporting
- URL: http://arxiv.org/abs/2512.09944v1
- Date: Sat, 06 Dec 2025 23:27:54 GMT
- Title: Echo-CoPilot: A Multi-View, Multi-Task Agent for Echocardiography Interpretation and Reporting
- Authors: Moein Heidari, Mohammad Amin Roohi, Armin Khosravi, Ilker Hacihaliloglu,
- Abstract summary: We introduce Echo-CoPilot, a multi-view, multi-task agent that uses a large language model to orchestrate specialized echocardiography tools. Within a ReAct-style loop, the agent decomposes clinician queries, invokes tools for view recognition, cardiac structure segmentation, measurement and disease prediction, and report synthesis. We evaluate Echo-CoPilot on the public MIMIC-EchoQA benchmark, where it achieves an accuracy of 50.8%, outperforming both general-purpose and biomedical video vision-language models.
- Score: 8.162197738994479
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Echocardiography is central to contemporary cardiovascular care, but full-study interpretation remains a cognitively demanding, multi-view task that is still performed manually. While recent foundation models for echocardiography can achieve strong performance on individual perceptual subtasks such as view classification, segmentation, or disease prediction, they typically operate in isolation and do not provide a unified, clinically coherent assessment. In this work, we introduce Echo-CoPilot, a multi-view, multi-task agent that uses a large language model to orchestrate a suite of specialized echocardiography tools. Within a ReAct-style loop, the agent decomposes clinician queries, invokes tools for view recognition, cardiac structure segmentation, measurement and disease prediction, and report synthesis, and integrates their outputs into guideline-aware answers and narrative summaries. We evaluate Echo-CoPilot on the public MIMIC-EchoQA benchmark, where it achieves an accuracy of 50.8%, outperforming both general-purpose and biomedical video vision-language models. Qualitative analyses further show that the agent leverages quantitative measurements and physiologic context to resolve challenging cases near clinical decision thresholds, such as borderline left ventricular hypertrophy or pericardial effusion severity. The code will be released upon acceptance of the paper.
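The ReAct-style loop the abstract describes can be sketched as a minimal tool-dispatch cycle. Everything below is an illustrative stand-in, not the paper's implementation: the tool names, the scripted `plan`, and the returned values are hypothetical, and the real system uses an LLM to choose each action rather than a fixed list.

```python
# Minimal sketch of a ReAct-style orchestration loop: decompose a query into
# tool calls, collect observations, and synthesize an answer. All tool
# outputs here are hard-coded placeholders for illustration only.

def classify_view(study):
    # Stand-in for the view-recognition tool.
    return {"view": "PLAX"}

def segment_structures(study):
    # Stand-in for the cardiac-structure segmentation / measurement tool.
    return {"lv_wall_thickness_mm": 12.4}

def synthesize_report(observations):
    # Stand-in for the report-synthesis tool: flatten observations into text.
    return "; ".join(f"{k}={v}" for obs in observations for k, v in obs.items())

TOOLS = {
    "view_recognition": classify_view,
    "segmentation": segment_structures,
}

def react_loop(query, study, plan):
    """Run a scripted Thought -> Action -> Observation cycle.

    In a real agent, an LLM would inspect `query` and the observations so
    far to decide the next tool; here `plan` fixes that decision sequence.
    """
    observations = []
    for tool_name in plan:
        obs = TOOLS[tool_name](study)   # Action: invoke the chosen tool
        observations.append(obs)        # Observation: feed result back
    return synthesize_report(observations)

report = react_loop(
    query="Is there borderline LV hypertrophy?",
    study="echo_study.dcm",
    plan=["view_recognition", "segmentation"],
)
print(report)  # view=PLAX; lv_wall_thickness_mm=12.4
```

The point of the loop structure is that quantitative tool outputs (here, a wall-thickness measurement) re-enter the context before the final answer is produced, which is what lets the agent resolve borderline cases rather than guessing from pixels alone.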
Related papers
- AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning [73.50200033931148]
We introduce AgentsEval, a multi-agent stream reasoning framework that emulates the collaborative diagnostic workflow of radiologists. By dividing the evaluation process into interpretable steps including criteria definition, evidence extraction, alignment, and consistency scoring, AgentsEval provides explicit reasoning traces and structured clinical feedback. Experimental results demonstrate that AgentsEval delivers clinically aligned, semantically faithful, and interpretable evaluations that remain robust under paraphrastic, semantic, and stylistic perturbations.
arXiv Detail & Related papers (2026-01-23T11:59:13Z) - EchoVLM: Measurement-Grounded Multimodal Learning for Echocardiography [19.10644729648278]
Vision-language models (VLMs) have achieved broad success in natural images and certain medical domains. We introduce EchoGround-MIMIC, the first measurement-grounded multimodal echocardiography dataset. We propose EchoVLM, a vision-language model that incorporates two novel pretraining objectives.
arXiv Detail & Related papers (2025-12-13T00:48:31Z) - EchoAgent: Guideline-Centric Reasoning Agent for Echocardiography Measurement and Interpretation [23.197431495208672]
EchoAgent is a framework that enables structured, interpretable automation for echocardiographic video analysis. It orchestrates specialized vision tools under Large Language Model (LLM) control to perform temporal localization, spatial measurement, and clinical interpretation. It achieves accurate, interpretable results despite the added complexity of temporal video analysis.
arXiv Detail & Related papers (2025-11-17T22:06:12Z) - CardioBench: Do Echocardiography Foundation Models Generalize Beyond the Lab? [4.445909648625997]
Foundation models (FMs) are reshaping medical imaging, yet their application in echocardiography remains limited. We introduce CardioBench, a comprehensive benchmark for echocardiography FMs. CardioBench unifies eight publicly available datasets into a standardized suite spanning four regression and five classification tasks.
arXiv Detail & Related papers (2025-10-01T05:09:48Z) - Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling [50.58126509704037]
Heartcare Suite is a framework for fine-grained electrocardiogram (ECG) understanding. Heartcare-220K is a high-quality, structured, and comprehensive multimodal ECG dataset. Heartcare-Bench is a benchmark to guide the optimization of Medical Multimodal Large Language Models (Med-MLLMs) in ECG scenarios.
arXiv Detail & Related papers (2025-06-06T07:56:41Z) - EchoApex: A General-Purpose Vision Foundation Model for Echocardiography [9.202542805578432]
We introduce EchoApex, the first general-purpose vision foundation model for echocardiography, with applications across a variety of clinical practice.
Leveraging self-supervised learning, EchoApex is pretrained on over 20 million echo images from 11 clinical centres.
Compared to state-of-the-art task-specific models, EchoApex attains improved performance with a unified image encoding architecture.
arXiv Detail & Related papers (2024-10-14T21:10:56Z) - EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation [1.0840985826142429]
We introduce EchoPrime, a multi-view, view-informed, video-based vision-language foundation model trained on over 12 million video-report pairs.
With retrieval-augmented interpretation, EchoPrime integrates information from all echocardiogram videos in a comprehensive study.
In datasets from two independent healthcare systems, EchoPrime achieves state-of-the-art performance on 23 diverse benchmarks of cardiac form and function.
arXiv Detail & Related papers (2024-10-13T03:04:22Z) - Improving Out-of-Distribution Detection in Echocardiographic View
Classification through Enhancing Semantic Features [1.9892308483583199]
We introduce a novel use of label smoothing to enhance semantic feature representation in echocardiographic images.
By combining label smoothing with MD-based OOD detection, we establish a new benchmark for accuracy in echocardiographic OOD detection.
arXiv Detail & Related papers (2023-08-31T06:44:42Z) - Factored Attention and Embedding for Unstructured-view Topic-related
Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for the unstructured-view topic-related ultrasound report generation.
The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristics across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z) - MyoPS: A Benchmark of Myocardial Pathology Segmentation Combining
Three-Sequence Cardiac Magnetic Resonance Images [84.02849948202116]
This work defines a new task of medical image analysis, i.e., to perform myocardial pathology segmentation (MyoPS).
MyoPS combines three-sequence cardiac magnetic resonance (CMR) images and was first proposed in the MyoPS challenge, held in conjunction with MICCAI 2020.
The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation.
arXiv Detail & Related papers (2022-01-10T06:37:23Z) - Generalized Organ Segmentation by Imitating One-shot Reasoning using
Anatomical Correlation [55.1248480381153]
We propose OrganNet, which learns a generalized organ concept from a set of annotated organ classes and then transfers this concept to unseen classes.
We show that OrganNet can effectively resist the wide variations in organ morphology and produce state-of-the-art results in the one-shot segmentation task.
arXiv Detail & Related papers (2021-03-30T13:41:12Z) - Few-shot Medical Image Segmentation using a Global Correlation Network
with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance the discriminability of the deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.