Challenging DINOv3 Foundation Model under Low Inter-Class Variability: A Case Study on Fetal Brain Ultrasound
- URL: http://arxiv.org/abs/2511.01915v1
- Date: Sat, 01 Nov 2025 13:37:22 GMT
- Title: Challenging DINOv3 Foundation Model under Low Inter-Class Variability: A Case Study on Fetal Brain Ultrasound
- Authors: Edoardo Conti, Riccardo Rosati, Lorenzo Federici, Adriano Mancini, Maria Chiara Fiorentin,
- Abstract summary: This study provides the first comprehensive evaluation of foundation models in fetal ultrasound (US) imaging under low inter-class variability conditions. We focus on fetal brain standard planes--transthalamic (TT), transventricular (TV), and transcerebellar (TC)--which exhibit highly overlapping anatomical features. Models pretrained on fetal ultrasound data consistently outperformed those on natural images, with weighted F1-score improvements of up to 20 percent.
- Score: 4.07447364754644
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Purpose: This study provides the first comprehensive evaluation of foundation models in fetal ultrasound (US) imaging under low inter-class variability conditions. While recent vision foundation models such as DINOv3 have shown remarkable transferability across medical domains, their ability to discriminate anatomically similar structures has not been systematically investigated. We address this gap by focusing on fetal brain standard planes--transthalamic (TT), transventricular (TV), and transcerebellar (TC)--which exhibit highly overlapping anatomical features and pose a critical challenge for reliable biometric assessment. Methods: To ensure a fair and reproducible evaluation, all publicly available fetal ultrasound datasets were curated and aggregated into a unified multicenter benchmark, FetalUS-188K, comprising more than 188,000 annotated images from heterogeneous acquisition settings. DINOv3 was pretrained in a self-supervised manner to learn ultrasound-aware representations. The learned features were then evaluated through standardized adaptation protocols, including linear probing with frozen backbone and full fine-tuning, under two initialization schemes: (i) pretraining on FetalUS-188K and (ii) initialization from natural-image DINOv3 weights. Results: Models pretrained on fetal ultrasound data consistently outperformed those initialized on natural images, with weighted F1-score improvements of up to 20 percent. Domain-adaptive pretraining enabled the network to preserve subtle echogenic and structural cues crucial for distinguishing intermediate planes such as TV. Conclusion: Results demonstrate that generic foundation models fail to generalize under low inter-class variability, whereas domain-specific pretraining is essential to achieve robust and clinically reliable representations in fetal brain ultrasound imaging.
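The abstract reports its gains as a weighted F1-score. As a rough illustration of that metric only (not the authors' evaluation code), the sketch below computes per-class F1 and averages it weighted by class support. The function name `weighted_f1` and the toy TT/TV/TC labels are assumptions made for this example.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (illustrative helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1  # weight by class frequency
    return score


# Toy labels for the three planes named in the abstract (made-up data).
y_true = ["TT", "TT", "TV", "TV", "TC", "TC"]
y_pred = ["TT", "TV", "TV", "TV", "TC", "TV"]
print(round(weighted_f1(y_true, y_pred), 4))
```

In this toy case the intermediate TV class absorbs the misclassified TT and TC samples, which is the kind of confusion the abstract describes for anatomically similar planes.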
Related papers
- Automated Classification of First-Trimester Fetal Heart Views Using Ultrasound-Specific Self-Supervised Learning [0.205246094017924]
We evaluate a self-supervised ultrasound foundation model, USF-MAE, for first-trimester fetal heart view classification. USF-MAE is pretrained using masked autoencoding modelling on more than 370,000 unlabelled ultrasound images. It achieved the highest performance across all evaluation metrics, with 90.57% accuracy, 91.15% precision, 90.57% recall, and 90.71% F1-score.
arXiv Detail & Related papers (2025-12-30T22:24:26Z)
- Self-Supervised Ultrasound Representation Learning for Renal Anomaly Prediction in Prenatal Imaging [0.19544534628180868]
We assessed the performance of a self-supervised ultrasound foundation model for automated fetal renal anomaly classification. Models were compared with a DenseNet-169 convolutional baseline using cross-validation and an independent test set. The largest gains were observed in the multi-class setting, where the improvement in AUC was 16.28% and 46.15% in F1-score.
arXiv Detail & Related papers (2025-12-15T15:28:02Z)
- Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation [83.02147613524032]
We introduce FetalMind, a medical AI system tailored to fetal ultrasound for both report generation and diagnosis. We propose Salient Epistemic Disentanglement (SED), which injects an expert-curated bipartite graph into the model to decouple view-disease associations. FetalMind outperforms open- and closed-source baselines across all gestational stages, achieving +14% average gains and +61.2% higher accuracy on critical conditions.
arXiv Detail & Related papers (2025-10-14T19:57:03Z)
- A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications [77.3888788549565]
We present EchoCare, a novel ultrasound foundation model for generalist clinical use. We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
arXiv Detail & Related papers (2025-09-15T10:05:31Z)
- FoundDiff: Foundational Diffusion Model for Generalizable Low-Dose CT Denoising [55.04342933312839]
We propose FoundDiff, a foundational diffusion model for unified and generalizable low-dose computed tomography (CT) denoising. FoundDiff employs a two-stage strategy: (i) dose-anatomy perception and (ii) adaptive denoising. First, we develop a dose- and anatomy-aware contrastive language image pre-training model (DA-CLIP) to achieve robust dose and anatomy perception. Second, we design a dose- and anatomy-aware diffusion model (DA-Diff) to perform adaptive and generalizable denoising.
arXiv Detail & Related papers (2025-08-24T11:03:56Z)
- The Efficacy of Semantics-Preserving Transformations in Self-Supervised Learning for Medical Ultrasound [60.80780313225093]
This study systematically investigated the impact of data augmentation and preprocessing strategies in self-supervised learning for lung ultrasound. Three data augmentation pipelines were assessed: a baseline pipeline commonly used across imaging domains, a novel semantic-preserving pipeline designed for ultrasound, and a distilled set of the most effective transformations from both pipelines.
arXiv Detail & Related papers (2025-04-10T16:26:47Z)
- Reliable Multi-View Learning with Conformal Prediction for Aortic Stenosis Classification in Echocardiography [6.540741143328299]
The acquired images are often 2-D cross-sections of a 3-D anatomy, potentially missing important anatomical details.
We propose Re-Training for Uncertainty (RT4U), a data-centric method to introduce uncertainty to weakly informative inputs in the training set.
When combined with conformal prediction techniques, RT4U can yield adaptively sized prediction sets which are guaranteed to contain the ground truth class to a high accuracy.
arXiv Detail & Related papers (2024-09-15T10:06:06Z)
- Fusion of Diffusion Weighted MRI and Clinical Data for Predicting Functional Outcome after Acute Ischemic Stroke with Deep Contrastive Learning [1.4149937986822438]
Stroke is a common disabling neurological condition that affects about one-quarter of the adult population over age 25.
Our proposed fusion model achieves 0.87, 0.80 and 80.45% for AUC, F1-score and accuracy, respectively.
arXiv Detail & Related papers (2024-02-16T18:51:42Z)
- Evaluating General Purpose Vision Foundation Models for Medical Image Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks [5.8941124219471055]
DINOv2 is an open-source foundation model pre-trained with self-supervised learning on 142 million curated natural images.
This study comprehensively evaluates the performance of DINOv2 for radiology.
arXiv Detail & Related papers (2023-12-04T21:47:10Z)
- Towards A Device-Independent Deep Learning Approach for the Automated Segmentation of Sonographic Fetal Brain Structures: A Multi-Center and Multi-Device Validation [0.0]
We propose a DL-based framework for the automated segmentation of 10 key fetal brain structures from 2 axial planes in 2D fetal brain USG images.
The proposed DL system offered a promising and generalizable performance (multi-centers, multi-device) and also presents evidence in support of device-induced variation in image quality.
arXiv Detail & Related papers (2022-02-28T05:42:03Z)
- Cross-Site Severity Assessment of COVID-19 from CT Images via Domain Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images is of great help in estimating intensive care unit events.
To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites.
This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z)
- Hybrid Attention for Automatic Segmentation of Whole Fetal Head in Prenatal Ultrasound Volumes [52.53375964591765]
We propose the first fully-automated solution to segment the whole fetal head in US volumes.
The segmentation task is firstly formulated as an end-to-end volumetric mapping under an encoder-decoder deep architecture.
We then combine the segmentor with a proposed hybrid attention scheme (HAS) to select discriminative features and suppress the non-informative volumetric features.
arXiv Detail & Related papers (2020-04-28T14:43:05Z)