USF-MAE: Ultrasound Self-Supervised Foundation Model with Masked Autoencoding
- URL: http://arxiv.org/abs/2510.22990v2
- Date: Fri, 07 Nov 2025 04:12:21 GMT
- Title: USF-MAE: Ultrasound Self-Supervised Foundation Model with Masked Autoencoding
- Authors: Youssef Megahed, Robin Ducharme, Aylin Erman, Mark Walker, Steven Hawken, Adrian D. C. Chan
- Abstract summary: We introduce the Ultrasound Self-Supervised Foundation Model with Masked Autoencoding (USF-MAE). USF-MAE is the first large-scale self-supervised MAE framework pretrained exclusively on ultrasound data. The model was pretrained on 370,000 2D and 3D ultrasound images curated from 46 open-source datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ultrasound imaging is one of the most widely used diagnostic modalities, offering real-time, radiation-free assessment across diverse clinical domains. However, interpretation of ultrasound images remains challenging due to high noise levels, operator dependence, and limited field of view, resulting in substantial inter-observer variability. Current deep learning approaches are hindered by the scarcity of large labeled datasets and the domain gap between general and sonographic images, which limits the transferability of models pretrained on non-medical data. To address these challenges, we introduce the Ultrasound Self-Supervised Foundation Model with Masked Autoencoding (USF-MAE), the first large-scale self-supervised MAE framework pretrained exclusively on ultrasound data. The model was pretrained on 370,000 2D and 3D ultrasound images curated from 46 open-source datasets, collectively termed OpenUS-46, spanning over twenty anatomical regions. This curated dataset has been made publicly available to facilitate further research and reproducibility. Using a Vision Transformer encoder-decoder architecture, USF-MAE reconstructs masked image patches, enabling it to learn rich, modality-specific representations directly from unlabeled data. The pretrained encoder was fine-tuned on three public downstream classification benchmarks: BUS-BRA (breast cancer), MMOTU-2D (ovarian tumors), and GIST514-DB (gastrointestinal stromal tumors). Across all tasks, USF-MAE consistently outperformed conventional CNN and ViT baselines, achieving F1-scores of 81.6%, 79.6%, and 82.4%, respectively. Despite not using labels during pretraining, USF-MAE approached the performance of the supervised foundation model UltraSam on breast cancer classification and surpassed it on the other tasks, demonstrating strong cross-anatomical generalization.
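To make the pretraining objective concrete, below is a minimal PyTorch sketch of masked autoencoding over image patches. This is not the authors' code: the patch size, widths, depths, and 75% mask ratio are illustrative assumptions (75% is the common MAE default), and positional embeddings are omitted for brevity.

```python
# Minimal masked-autoencoding sketch (NOT the authors' implementation).
# Assumptions: 16x16 patches, a tiny Transformer, 75% mask ratio;
# positional embeddings and normalization details are omitted for brevity.
import torch
import torch.nn as nn

def patchify(imgs, p=16):
    """(B, 1, H, W) -> (B, N, p*p): flatten non-overlapping patches."""
    B, C, H, W = imgs.shape
    x = imgs.unfold(2, p, p).unfold(3, p, p)       # (B, C, H/p, W/p, p, p)
    return x.reshape(B, C, -1, p * p).squeeze(1)   # (B, N, p*p)

class TinyMAE(nn.Module):
    def __init__(self, patch_dim=256, dim=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, dim)
        layer = lambda: nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer(), num_layers=2)
        self.decoder = nn.TransformerEncoder(layer(), num_layers=1)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, patch_dim)      # predict raw pixels per patch

    def forward(self, imgs):
        patches = patchify(imgs)                   # (B, N, patch_dim)
        B, N, D = patches.shape
        n_keep = int(N * (1 - self.mask_ratio))
        perm = torch.rand(B, N).argsort(dim=1)     # random patch order per image
        keep, drop = perm[:, :n_keep], perm[:, n_keep:]
        gather = lambda idx: torch.gather(patches, 1, idx.unsqueeze(-1).expand(-1, -1, D))
        z = self.encoder(self.embed(gather(keep))) # encode visible patches only
        # Append mask tokens, decode, and predict pixels at the masked slots.
        tokens = torch.cat([z, self.mask_token.expand(B, N - n_keep, -1)], dim=1)
        pred = self.head(self.decoder(tokens))[:, n_keep:]
        return ((pred - gather(drop)) ** 2).mean() # MSE on masked patches only

loss = TinyMAE()(torch.randn(2, 1, 224, 224))      # toy grayscale ultrasound batch
loss.backward()
```

During fine-tuning, the decoder is discarded and the encoder's output feeds a classification head, which matches how the abstract describes using the pretrained encoder on BUS-BRA, MMOTU-2D, and GIST514-DB.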
Related papers
- A texture-based framework for foundational ultrasound models
We reformulate self-supervised learning as a texture-analysis problem, introducing texture ultrasound semantic analysis (TUSA). We train a TUSA model on a combination of open-source, simulated, and in vivo data. Our model achieves higher accuracy in detecting COVID (70%), spinal hematoma (100%), and vitreous hemorrhage (97%), and correlates more closely with quantitative parameters such as liver steatosis (r = 0.83), ejection fraction (r = 0.63), and oxygen saturation (r = 0.38).
arXiv Detail & Related papers (2026-02-01T21:26:31Z)
- Automated Classification of First-Trimester Fetal Heart Views Using Ultrasound-Specific Self-Supervised Learning
We evaluate a self-supervised ultrasound foundation model, USF-MAE, for first-trimester fetal heart view classification. USF-MAE is pretrained with masked autoencoding on more than 370,000 unlabelled ultrasound images. It achieved the highest performance across all evaluation metrics, with 90.57% accuracy, 91.15% precision, 90.57% recall, and 90.71% F1-score.
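One detail worth noting in these numbers: accuracy and recall coinciding at 90.57% is exactly what happens when recall is averaged with class-support weights, since weighted recall is algebraically identical to overall accuracy. A quick self-contained check with scikit-learn on toy labels (not the paper's data):

```python
# Weighted recall == accuracy: sum_c (n_c/N) * (TP_c/n_c) = sum_c TP_c / N.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 0, 1, 1, 2, 2, 2, 3]          # toy ground-truth classes
y_pred = [0, 1, 1, 1, 2, 0, 2, 3]          # toy predictions
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
assert abs(r - accuracy_score(y_true, y_pred)) < 1e-12
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")  # recall equals accuracy (0.75)
```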
arXiv Detail & Related papers (2025-12-30T22:24:26Z)
- Challenging DINOv3 Foundation Model under Low Inter-Class Variability: A Case Study on Fetal Brain Ultrasound
This study provides the first comprehensive evaluation of foundation models in fetal ultrasound (US) imaging under low inter-class variability conditions. We focus on fetal brain standard planes -- transthalamic (TT), transventricular (TV), and transcerebellar (TC) -- which exhibit highly overlapping anatomical features. Models pretrained on fetal ultrasound data consistently outperformed those pretrained on natural images, with weighted F1-score improvements of up to 20 percent.
arXiv Detail & Related papers (2025-11-01T13:37:22Z)
- A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications
We present EchoCare, a novel ultrasound foundation model for generalist clinical use. We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
arXiv Detail & Related papers (2025-09-15T10:05:31Z)
- Grounding DINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models
We propose a prompt-driven vision-language model (VLM) that integrates Grounding DINO with SAM2 to enable object segmentation across multiple ultrasound organs. A total of 18 public ultrasound datasets, encompassing the breast, thyroid, liver, prostate, kidney, and paraspinal muscle, were utilized. Our approach outperforms state-of-the-art segmentation methods, including UniverSeg, MedSAM, MedCLIP-SAM, BiomedParse, and SAMUS.
arXiv Detail & Related papers (2025-06-30T14:33:44Z)
- The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound
This study systematically investigated the impact of data augmentation and preprocessing strategies in self-supervised learning for lung ultrasound. Three data augmentation pipelines were assessed: a baseline pipeline commonly used across imaging domains, a novel semantics-preserving pipeline designed for ultrasound, and a distilled set of the most effective transformations from both pipelines. An illustrative contrast between the first two is sketched below.
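Purely for illustration, the contrast between a generic pipeline and a semantics-preserving one might look as follows in torchvision; these transforms and their parameters are assumptions, not the pipelines the paper actually evaluated.

```python
# Hypothetical augmentation pipelines for grayscale ultrasound frames.
# Neither pipeline is taken from the paper; both are illustrative guesses.
from torchvision import transforms

# Baseline: aggressive, SimCLR-style transformations common in natural imaging.
baseline = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])

# Semantics-preserving: speckle texture and depth-dependent intensity carry
# diagnostic meaning in ultrasound, so keep changes mild and mostly geometric.
ultrasound_preserving = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # gentle crops only
    transforms.RandomAffine(degrees=5, translate=(0.05, 0.05)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # no blur or flips
    transforms.ToTensor(),
])
```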
arXiv Detail & Related papers (2025-04-10T16:26:47Z)
- Privacy-Preserving Federated Foundation Model for Generalist Ultrasound Artificial Intelligence
We present UltraFedFM, an innovative privacy-preserving ultrasound foundation model.
UltraFedFM is collaboratively pre-trained using federated learning across 16 distributed medical institutions in 9 countries.
It achieves an average area under the receiver operating characteristic curve (AUROC) of 0.927 for disease diagnosis and a Dice similarity coefficient of 0.878 for lesion segmentation. Both metrics are illustrated below.
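For reference, the two reported metrics can be computed in a few lines; the arrays below are toy inputs, not the paper's outputs.

```python
# Dice = 2|A∩B| / (|A| + |B|) on binary masks; AUROC via scikit-learn.
import numpy as np
from sklearn.metrics import roc_auc_score

def dice(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    return (2.0 * (pred & target).sum() + eps) / (pred.sum() + target.sum() + eps)

pred   = np.array([[1, 1, 0], [0, 1, 0]])   # toy predicted lesion mask
target = np.array([[1, 0, 0], [0, 1, 1]])   # toy ground-truth mask
print(dice(pred, target))                   # ~0.667
print(roc_auc_score([0, 1, 1, 0], [0.1, 0.9, 0.4, 0.35]))  # 1.0
```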
arXiv Detail & Related papers (2024-11-25T13:40:11Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- WATUNet: A Deep Neural Network for Segmentation of Volumetric Sweep Imaging Ultrasound
Volume sweep imaging (VSI) is an innovative approach that enables untrained operators to capture quality ultrasound images.
We present a novel segmentation model known as Wavelet_Attention_UNet (WATUNet).
In this model, we incorporate wavelet gates (WGs) and attention gates (AGs) between the encoder and decoder, rather than simple skip connections, to overcome the aforementioned limitations. The attention-gate mechanism is sketched below.
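The attention gates referenced here follow the Attention U-Net pattern (Oktay et al., 2018), sketched below with illustrative channel sizes; the wavelet gates are the paper's own contribution and are not reproduced.

```python
# Attention gate: reweight encoder skip features using the coarser decoder
# signal, so irrelevant regions are suppressed before concatenation.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, skip, gate):
        # The gating signal is spatially coarser; upsample to skip resolution.
        g = nn.functional.interpolate(self.w_gate(gate), size=skip.shape[2:])
        attn = self.psi(torch.relu(self.w_skip(skip) + g))  # (B, 1, H, W) in [0, 1]
        return skip * attn

skip = torch.randn(2, 64, 56, 56)    # encoder feature map
gate = torch.randn(2, 128, 28, 28)   # decoder feature map, one level deeper
print(AttentionGate(64, 128, 32)(skip, gate).shape)  # torch.Size([2, 64, 56, 56])
```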
arXiv Detail & Related papers (2023-11-17T20:32:37Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
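As intuition for the matching objective in the title: the toy sketch below performs first-order (linear-assignment) matching between two sets of embeddings; LVM-Med's second-order formulation additionally scores pairwise consistency between matches, which this example omits.

```python
# First-order matching of two embedding sets via the Hungarian algorithm.
# Purely illustrative; embeddings and sizes are random toy data.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
a = rng.random((5, 128))      # embeddings of 5 images under augmentation 1
b = rng.random((5, 128))      # embeddings of the same images under augmentation 2
cost = -(a @ b.T)             # negative similarity as assignment cost
rows, cols = linear_sum_assignment(cost)
print(list(zip(rows, cols)))  # matched index pairs minimizing the total cost
```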
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Towards Realistic Ultrasound Fetal Brain Imaging Synthesis
There are few public fetal ultrasound imaging datasets, owing to limited clinical data, patient-privacy constraints, the rarity of abnormalities in general practice, and the small number of experts available for data collection and validation.
To address this data scarcity, we propose generative adversarial network (GAN)-based models, diffusion-super-resolution-GAN and transformer-based-GAN, to synthesise images of fetal ultrasound brain planes from one public dataset.
arXiv Detail & Related papers (2023-04-08T07:07:20Z)
- Multi-Modal Active Learning for Automatic Liver Fibrosis Diagnosis based on Ultrasound Shear Wave Elastography
Noninvasive modalities such as ultrasound (US) imaging play an important role in automatic liver fibrosis diagnosis (ALFD).
Due to noisy data and the expense of annotating US images, the application of artificial intelligence (AI)-assisted approaches encounters a bottleneck.
In this work, we propose a multi-modal fusion network with active learning (MMFN-AL) for ALFD that exploits information from multiple modalities.
arXiv Detail & Related papers (2020-11-02T03:05:24Z)
- Fader Networks for domain adaptation on fMRI: ABIDE-II study
We use 3D convolutional autoencoders to build a domain-irrelevant latent-space representation of the images and demonstrate that this method outperforms existing approaches on ABIDE data. A minimal sketch of such an autoencoder follows.
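A minimal 3D convolutional autoencoder of the kind described might look like this; the layer sizes are illustrative, and the adversarial component that actually pushes the latent code toward domain irrelevance in Fader-style setups is omitted.

```python
# Toy 3D convolutional autoencoder. In a Fader-style setup, an adversarial
# domain classifier on z (not shown) discourages site-specific information.
import torch
import torch.nn as nn

class ConvAE3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)            # latent volume used for downstream tasks
        return self.decoder(z), z

x = torch.randn(1, 1, 32, 64, 64)      # (B, C, D, H, W) toy volume
recon, z = ConvAE3D()(x)
print(recon.shape, z.shape)            # (1,1,32,64,64) and (1,32,8,16,16)
```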
arXiv Detail & Related papers (2020-10-14T16:50:50Z)