Auto-US: An Ultrasound Video Diagnosis Agent Using Video Classification Framework and LLMs
- URL: http://arxiv.org/abs/2511.07748v1
- Date: Wed, 12 Nov 2025 01:14:40 GMT
- Title: Auto-US: An Ultrasound Video Diagnosis Agent Using Video Classification Framework and LLMs
- Authors: Yuezhe Yang, Yiyue Guo, Wenjie Cai, Qingqing Ruan, Siying Wang, Xingbo Dong, Zhe Jin, Yong Dai,
- Abstract summary: We propose textbfAuto-US, an intelligent diagnosis agent that integrates ultrasound video data with clinical diagnostic text.<n>We developed textbfCTU-Net, which achieves state-of-the-art performance in ultrasound video classification, reaching an accuracy of 86.73%.<n>These results demonstrate the effectiveness and clinical potential of Auto-US in real-world ultrasound applications.
- Score: 13.37674307639552
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI-assisted ultrasound video diagnosis presents new opportunities to enhance the efficiency and accuracy of medical imaging analysis. However, existing research remains limited in terms of dataset diversity, diagnostic performance, and clinical applicability. In this study, we propose \textbf{Auto-US}, an intelligent diagnosis agent that integrates ultrasound video data with clinical diagnostic text. To support this, we constructed \textbf{CUV Dataset} of 495 ultrasound videos spanning five categories and three organs, aggregated from multiple open-access sources. We developed \textbf{CTU-Net}, which achieves state-of-the-art performance in ultrasound video classification, reaching an accuracy of 86.73\% Furthermore, by incorporating large language models, Auto-US is capable of generating clinically meaningful diagnostic suggestions. The final diagnostic scores for each case exceeded 3 out of 5 and were validated by professional clinicians. These results demonstrate the effectiveness and clinical potential of Auto-US in real-world ultrasound applications. Code and data are available at: https://github.com/Bean-Young/Auto-US.
Related papers
- FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound [2.8097961263689406]
The demand for prenatal ultrasound imaging has intensified a global shortage of trained sonographers.<n>Deep learning has the potential to enhance sonographers' efficiency and support the training of new practitioners.<n>We present Fetal-Gauge, the first and largest visual question answering benchmark specifically designed to evaluate Vision-Language Models (VLMs)<n>Our benchmark comprises over 42,000 images and 93,000 question-answer pairs, spanning anatomical plane identification, visual grounding of anatomical structures, fetal orientation assessment, clinical view conformity, and clinical diagnosis.
arXiv Detail & Related papers (2025-12-25T04:54:37Z) - Evolving Diagnostic Agents in a Virtual Clinical Environment [75.59389103511559]
We present a framework for training large language models (LLMs) as diagnostic agents with reinforcement learning.<n>Our method acquires diagnostic strategies through interactive exploration and outcome-based feedback.<n>DiagAgent significantly outperforms 10 state-of-the-art LLMs, including DeepSeek-v3 and GPT-4o.
arXiv Detail & Related papers (2025-10-28T17:19:47Z) - Timely Clinical Diagnosis through Active Test Selection [49.091903570068155]
We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design) to better emulate real-world diagnostic reasoning.<n>LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data.<n>We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use.
arXiv Detail & Related papers (2025-10-21T18:10:45Z) - EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence [9.731550105507457]
We propose EchoVLM, a vision-language model specifically designed for ultrasound medical imaging.<n>The model employs a Mixture of Experts (MoE) architecture trained on data spanning seven anatomical regions.<n>EchoVLM achieved significant improvements of 10.15 and 4.77 points in BLEU-1 scores and ROUGE-1 scores respectively.
arXiv Detail & Related papers (2025-09-18T14:07:53Z) - A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications [77.3888788549565]
We present EchoCare, a novel ultrasound foundation model for generalist clinical use.<n>We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData.<n>With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
arXiv Detail & Related papers (2025-09-15T10:05:31Z) - PRECISE-AS: Personalized Reinforcement Learning for Efficient Point-of-Care Echocardiography in Aortic Stenosis Diagnosis [6.276251898178271]
Aortic stenosis (AS) is a life-threatening condition caused by a narrowing of the aortic valve, leading to impaired blood flow.<n>Access to echocardiography (echo) is often limited due to resource constraints, particularly in rural and underserved areas.<n>We propose a reinforcement learning (RL)-driven active video acquisition framework that dynamically selects each patient's most informative echo videos.
arXiv Detail & Related papers (2025-09-02T23:47:43Z) - MedVQA-TREE: A Multimodal Reasoning and Retrieval Framework for Sarcopenia Prediction [1.7775777785480917]
We propose MedVQA-TREE, a framework that integrates a hierarchical image interpretation module, a gated feature-level fusion mechanism, and a novel multi-hop, multi-query retrieval strategy.<n>A gated fusion mechanism selectively integrates visual features with textual queries, while clinical knowledge is retrieved through a UMLS-guided pipeline accessing PubMed and a sarcopenia-specific external knowledge base.<n>The model achieved up to 99% diagnostic accuracy and outperformed previous state-of-the-art methods by over 10%.
arXiv Detail & Related papers (2025-08-26T13:31:01Z) - Privacy-Preserving Federated Foundation Model for Generalist Ultrasound Artificial Intelligence [83.02106623401885]
We present UltraFedFM, an innovative privacy-preserving ultrasound foundation model.
UltraFedFM is collaboratively pre-trained using federated learning across 16 distributed medical institutions in 9 countries.
It achieves an average area under the receiver operating characteristic curve of 0.927 for disease diagnosis and a dice similarity coefficient of 0.878 for lesion segmentation.
arXiv Detail & Related papers (2024-11-25T13:40:11Z) - A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning [11.817595076396925]
Diagnostic Captioning (DC) automatically generates a diagnostic text from one or more medical images of a patient.
We propose a new data-driven guided decoding method that incorporates medical information into the beam search of the diagnostic text generation process.
We evaluate the proposed method on two medical datasets using four DC systems that range from generic image-to-text systems with CNN encoders to pre-trained Large Language Models.
arXiv Detail & Related papers (2024-06-20T10:08:17Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z) - Multi-Modal Active Learning for Automatic Liver Fibrosis Diagnosis based
on Ultrasound Shear Wave Elastography [13.13249599000645]
Noninvasive diagnosis like ultrasound (US) imaging plays a very important role in automatic liver fibrosis diagnosis (ALFD)
Due to the noisy data, expensive annotations of US images, the application of Artificial Intelligence (AI) assisting approaches encounters a bottleneck.
In this work, we innovatively propose a multi-modal fusion network with active learning (MMFN-AL) for ALFD to exploit the information of multiple modalities.
arXiv Detail & Related papers (2020-11-02T03:05:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.