A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications
- URL: http://arxiv.org/abs/2509.11752v1
- Date: Mon, 15 Sep 2025 10:05:31 GMT
- Title: A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications
- Authors: Hongyuan Zhang, Yuheng Wu, Mingyang Zhao, Zhiwei Chen, Rebecca Li, Fei Zhu, Haohan Zhao, Xiaohua Yuan, Meng Yang, Chunli Qiu, Xiang Cong, Haiyan Chen, Lina Luan, Randolph H. L. Wong, Huai Liao, Colin A Graham, Shi Chang, Guowei Tao, Dong Yi, Zhen Lei, Nassir Navab, Sebastien Ourselin, Jiebo Luo, Hongbin Liu, Gaofeng Meng
- Abstract summary: We present EchoCare, a novel ultrasound foundation model for generalist clinical use. We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
- Score: 77.3888788549565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial intelligence (AI) that can effectively learn ultrasound representations by integrating multi-source data holds significant promise for advancing clinical care. However, the scarcity of large labeled datasets in real-world clinical environments and the limited generalizability of task-specific models have hindered the development of generalizable clinical AI models for ultrasound applications. In this study, we present EchoCare, a novel ultrasound foundation model for generalist clinical use, developed via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. EchoCareData comprises 4.5 million ultrasound images, sourced from over 23 countries across 5 continents and acquired via a diverse range of distinct imaging devices, thus encompassing global cohorts that are multi-center, multi-device, and multi-ethnic. Unlike prior studies that adopt off-the-shelf vision foundation model architectures, we introduce a hierarchical classifier into EchoCare to enable joint learning of pixel-level and representation-level features, capturing both global anatomical contexts and local ultrasound characteristics. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks of varying diagnostic difficulties, spanning disease diagnosis, lesion segmentation, organ detection, landmark prediction, quantitative regression, imaging enhancement and report generation. The code and pretrained model are publicly released, rendering EchoCare accessible for fine-tuning and local adaptation, supporting extensibility to additional applications. EchoCare provides a fully open and generalizable foundation model to boost the development of AI technologies for diverse clinical ultrasound applications.
Related papers
- FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound [2.8097961263689406]
The demand for prenatal ultrasound imaging has intensified a global shortage of trained sonographers. Deep learning has the potential to enhance sonographers' efficiency and support the training of new practitioners. We present Fetal-Gauge, the first and largest visual question answering benchmark specifically designed to evaluate Vision-Language Models (VLMs). Our benchmark comprises over 42,000 images and 93,000 question-answer pairs, spanning anatomical plane identification, visual grounding of anatomical structures, fetal orientation assessment, clinical view conformity, and clinical diagnosis.
arXiv Detail & Related papers (2025-12-25T04:54:37Z) - Auto-US: An Ultrasound Video Diagnosis Agent Using Video Classification Framework and LLMs [13.37674307639552]
We propose Auto-US, an intelligent diagnosis agent that integrates ultrasound video data with clinical diagnostic text. We developed CTU-Net, which achieves state-of-the-art performance in ultrasound video classification, reaching an accuracy of 86.73%. These results demonstrate the effectiveness and clinical potential of Auto-US in real-world ultrasound applications.
arXiv Detail & Related papers (2025-11-11T02:00:56Z) - Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation [83.02147613524032]
We introduce FetalMind, a medical AI system tailored to fetal ultrasound for both report generation and diagnosis. We propose Salient Epistemic Disentanglement (SED), which injects an expert-curated bipartite graph into the model to decouple view-disease associations. FetalMind outperforms open- and closed-source baselines across all gestational stages, achieving +14% average gains and +61.2% higher accuracy on critical conditions.
arXiv Detail & Related papers (2025-10-14T19:57:03Z) - EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence [9.731550105507457]
We propose EchoVLM, a vision-language model specifically designed for ultrasound medical imaging. The model employs a Mixture of Experts (MoE) architecture trained on data spanning seven anatomical regions. EchoVLM achieved significant improvements of 10.15 and 4.77 points in BLEU-1 and ROUGE-1 scores, respectively.
arXiv Detail & Related papers (2025-09-18T14:07:53Z) - UltraEar: a multicentric, large-scale database combining ultra-high-resolution computed tomography and clinical data for ear diseases [28.75872046719716]
UltraEar recruits patients from 11 tertiary hospitals between October 2020 and October 2035. A broad spectrum of otologic disorders is covered, such as otitis media, cholesteatoma, ossicular chain malformation, temporal bone fracture, inner ear malformation, cochlear aperture stenosis, enlarged vestibular aqueduct, and sigmoid sinus bony deficiency.
arXiv Detail & Related papers (2025-08-27T05:56:17Z) - Privacy-Preserving Federated Foundation Model for Generalist Ultrasound Artificial Intelligence [83.02106623401885]
We present UltraFedFM, an innovative privacy-preserving ultrasound foundation model.
UltraFedFM is collaboratively pre-trained using federated learning across 16 distributed medical institutions in 9 countries.
It achieves an average area under the receiver operating characteristic curve of 0.927 for disease diagnosis and a dice similarity coefficient of 0.878 for lesion segmentation.
arXiv Detail & Related papers (2024-11-25T13:40:11Z) - EchoApex: A General-Purpose Vision Foundation Model for Echocardiography [9.202542805578432]
We introduce EchoApex, the first general-purpose vision foundation model for echocardiography, with applications across a variety of clinical practices.
Leveraging self-supervised learning, EchoApex is pretrained on over 20 million echo images from 11 clinical centres.
Compared to state-of-the-art task-specific models, EchoApex attains improved performance with a unified image encoding architecture.
arXiv Detail & Related papers (2024-10-14T21:10:56Z) - UniUSNet: A Promptable Framework for Universal Ultrasound Disease Prediction and Tissue Segmentation [19.85119434049726]
We propose UniUSNet, a universal framework for ultrasound image classification and segmentation.
This model handles various ultrasound types, anatomical positions, and input formats, excelling in both segmentation and classification tasks.
We plan to expand our dataset and refine the prompting mechanism, with model weights and code available at.
arXiv Detail & Related papers (2024-06-03T09:49:54Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
This work trains open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.