FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis
- URL: http://arxiv.org/abs/2502.14807v2
- Date: Mon, 07 Apr 2025 17:03:03 GMT
- Title: FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis
- Authors: Fadillah Maani, Numan Saeed, Tausifa Saleem, Zaid Farooq, Hussain Alasmawi, Werner Diehl, Ameera Mohammad, Gareth Waring, Saudabi Valappi, Leanne Bricker, Mohammad Yaqub
- Abstract summary: FetalCLIP is a vision-language foundation model capable of generating universal representations of fetal ultrasound images. It was pre-trained using a multimodal learning approach on a diverse dataset of 210,035 fetal ultrasound images paired with text.
- Score: 0.676810348604193
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Foundation models are becoming increasingly effective in the medical domain, offering pre-trained models on large datasets that can be readily adapted for downstream tasks. Despite progress, fetal ultrasound images remain a challenging domain for foundation models due to their inherent complexity, often requiring substantial additional training and facing limitations due to the scarcity of paired multimodal data. To overcome these challenges, here we introduce FetalCLIP, a vision-language foundation model capable of generating universal representations of fetal ultrasound images. FetalCLIP was pre-trained using a multimodal learning approach on a diverse dataset of 210,035 fetal ultrasound images paired with text. This represents the largest paired dataset of its kind used for foundation model development to date. This unique training approach allows FetalCLIP to effectively learn the intricate anatomical features present in fetal ultrasound images, resulting in robust representations that can be used for a variety of downstream applications. In extensive benchmarking across a range of key fetal ultrasound applications, including classification, gestational age estimation, congenital heart defect (CHD) detection, and fetal structure segmentation, FetalCLIP outperformed all baselines while demonstrating remarkable generalizability and strong performance even with limited labeled data. We plan to release the FetalCLIP model publicly for the benefit of the broader scientific community.
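The abstract describes multimodal pretraining on image-text pairs; in CLIP-style models this usually means a symmetric contrastive (InfoNCE) objective over paired embeddings. The PyTorch sketch below shows that generic objective under common CLIP conventions; the function name and temperature initialization are illustrative assumptions, not FetalCLIP's released code.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) outputs of the two encoders.
    logit_scale: learnable temperature, stored in log space as in CLIP.
    """
    # L2-normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are the true pairs.
    logits = logit_scale.exp() * image_emb @ text_emb.t()
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast images against texts and texts against images, then average.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)

# Toy usage with random embeddings standing in for encoder outputs.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
scale = torch.tensor(2.659)  # log(1/0.07), CLIP's customary init
print(clip_contrastive_loss(img, txt, scale))
```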
Related papers
- HDC: Hierarchical Distillation for Multi-level Noisy Consistency in Semi-Supervised Fetal Ultrasound Segmentation [2.964206587462833]
A novel semi-supervised segmentation framework, HDC, is proposed that incorporates adaptive consistency learning with a single-teacher architecture.
The framework introduces a hierarchical distillation mechanism with two objectives: Correlation Guidance Loss for aligning feature representations and Mutual Information Loss for stabilizing noisy student learning.
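The summary names the two objectives but not their formulations. The sketch below is a hypothetical single-teacher distillation step in PyTorch, with cosine feature alignment standing in for the Correlation Guidance Loss and a soft KL consistency term standing in for the Mutual Information Loss; neither is claimed to be the paper's exact math.

```python
import torch
import torch.nn.functional as F

def distillation_losses(student_feat, teacher_feat, student_logits, teacher_logits):
    """Hypothetical stand-ins for HDC's two objectives.

    student_feat/teacher_feat: (batch, dim) intermediate features.
    student_logits/teacher_logits: (batch, classes, H, W) segmentation logits.
    """
    # Feature alignment: pull student features toward the (detached)
    # teacher features via cosine similarity.
    align = 1.0 - F.cosine_similarity(student_feat, teacher_feat.detach(), dim=-1).mean()

    # Consistency: match the student's predictive distribution to the
    # teacher's soft targets with a KL divergence.
    t_prob = F.softmax(teacher_logits.detach(), dim=1)
    s_logp = F.log_softmax(student_logits, dim=1)
    consist = F.kl_div(s_logp, t_prob, reduction="batchmean")

    return align, consist

# Toy shapes: 4 images, 64-d features, 2-class masks at 32x32.
sf, tf = torch.randn(4, 64), torch.randn(4, 64)
sl, tl = torch.randn(4, 2, 32, 32), torch.randn(4, 2, 32, 32)
print(distillation_losses(sf, tf, sl, tl))
```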
arXiv Detail & Related papers (2025-04-14T04:52:24Z)
- FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis [16.640351512021017]
We introduce FetalFlex, a flexible fetal ultrasound (US) image generation framework for anatomy-guided, controllable image synthesis.
FetalFlex incorporates a pre-alignment module to enhance controllability and introduces a repaint strategy to ensure consistent texture and appearance.
Experiments on multi-center datasets demonstrate that FetalFlex achieves state-of-the-art performance across multiple image-quality metrics.
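The repaint strategy is only named here; in diffusion models a common mechanism (popularized by the RePaint algorithm) is to re-inject a noised copy of the known region at every reverse step so that only the masked region is freely synthesized. A minimal, hypothetical sketch of one such step, not FetalFlex's actual implementation:

```python
import torch

def repaint_step(x_t, x_known, mask, alpha_bar_prev, denoise_fn, t):
    """One RePaint-style reverse step: the model denoises the whole image,
    then the known region is overwritten with a freshly noised copy of the
    reference at the target noise level.

    mask is 1 where content is known/kept, 0 where it is synthesized.
    """
    # Noise the clean reference down to the noise level of step t-1.
    noise = torch.randn_like(x_known)
    known_prev = alpha_bar_prev.sqrt() * x_known + (1 - alpha_bar_prev).sqrt() * noise

    # One denoising step on the whole image...
    x_prev = denoise_fn(x_t, t)
    # ...then keep the known region so only the masked area is free.
    return mask * known_prev + (1 - mask) * x_prev

# Toy usage: an identity "denoiser" stands in for a trained diffusion model.
x_t = torch.randn(1, 1, 16, 16)
ref = torch.zeros(1, 1, 16, 16)
mask = torch.zeros(1, 1, 16, 16); mask[..., :8] = 1.0  # keep the left half
out = repaint_step(x_t, ref, mask, torch.tensor(0.9), lambda x, t: x, t=10)
print(out.shape)  # torch.Size([1, 1, 16, 16])
```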
arXiv Detail & Related papers (2025-03-19T05:16:19Z)
- Molecular-driven Foundation Model for Oncologic Pathology [6.922502805825084]
We introduce Threads, a slide-level foundation model capable of generating universal representations of whole-slide images of any size.
Threads was pre-trained using a multimodal learning approach on a diverse cohort of 47,171 hematoxylin and eosin (H&E)-stained tissue sections.
arXiv Detail & Related papers (2025-01-28T02:35:02Z)
- UniUSNet: A Promptable Framework for Universal Ultrasound Disease Prediction and Tissue Segmentation [19.85119434049726]
We propose UniUSNet, a universal framework for ultrasound image classification and segmentation.
This model handles various ultrasound types, anatomical positions, and input formats, excelling in both segmentation and classification tasks.
We plan to expand our dataset and refine the prompting mechanism, with model weights and code made publicly available.
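The summary does not detail the prompting mechanism; one common way to make a model "promptable" across ultrasound types and anatomical positions is to fuse learned prompt embeddings with image features before the task head. The class below is an illustrative sketch of that idea, with all names and dimensions assumed, not UniUSNet's actual architecture.

```python
import torch
import torch.nn as nn

class PromptableHead(nn.Module):
    """Hypothetical prompt conditioning: learned embeddings for ultrasound
    type and anatomical position are concatenated with image features
    before a classification head."""

    def __init__(self, feat_dim=256, n_types=5, n_positions=10, n_classes=4):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, feat_dim)
        self.pos_emb = nn.Embedding(n_positions, feat_dim)
        self.head = nn.Linear(feat_dim * 3, n_classes)

    def forward(self, image_feat, us_type, position):
        # image_feat: (batch, feat_dim); us_type/position: (batch,) int ids.
        prompt = torch.cat([self.type_emb(us_type), self.pos_emb(position)], dim=-1)
        return self.head(torch.cat([image_feat, prompt], dim=-1))

head = PromptableHead()
logits = head(torch.randn(2, 256), torch.tensor([0, 3]), torch.tensor([1, 7]))
print(logits.shape)  # torch.Size([2, 4])
```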
arXiv Detail & Related papers (2024-06-03T09:49:54Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Towards Realistic Ultrasound Fetal Brain Imaging Synthesis [0.7315240103690552]
There are few public fetal ultrasound imaging datasets due to insufficient amounts of clinical data, patient privacy concerns, the rare occurrence of abnormalities in general practice, and the limited number of experts available for data collection and validation.
To address this data scarcity, we propose generative adversarial network (GAN)-based models, a diffusion-super-resolution GAN and a transformer-based GAN, to synthesise images of fetal ultrasound brain planes from one public dataset.
arXiv Detail & Related papers (2023-04-08T07:07:20Z)
- FPUS23: An Ultrasound Fetus Phantom Dataset with Deep Neural Network Evaluations for Fetus Orientations, Fetal Planes, and Anatomical Features [10.404128105946583]
We present a novel fetus phantom ultrasound dataset, FPUS23, which can be used to identify the correct diagnostic planes for estimating fetal biometric values.
The entire dataset is composed of 15,728 images, which are used to train four different deep neural network models.
We also evaluate the models trained on FPUS23 to show that the representations they learn can substantially increase accuracy on real-world fetal ultrasound datasets.
arXiv Detail & Related papers (2023-03-14T12:46:48Z)
- Ultrasound Signal Processing: From Models to Deep Learning [64.56774869055826]
Medical ultrasound imaging relies heavily on high-quality signal processing to provide reliable and interpretable image reconstructions.
Deep learning-based methods, which are optimized in a data-driven fashion, have gained popularity.
A relatively new paradigm combines the strengths of both: leveraging data-driven deep learning while exploiting domain knowledge.
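A standard instance of this hybrid paradigm is algorithm unrolling: a classical iterative reconstruction scheme is unfolded into network layers whose operators and thresholds are learned from data. The sketch below is a generic learned-ISTA-style module for sparse recovery, offered as an illustration of the paradigm rather than any specific method from the review.

```python
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    """Learned ISTA: K unrolled soft-thresholding iterations for y ~ A x,
    with the linear operators and per-step thresholds learned from data."""

    def __init__(self, m, n, n_iters=5):
        super().__init__()
        self.W = nn.Linear(m, n, bias=False)   # learned analysis operator
        self.S = nn.Linear(n, n, bias=False)   # learned iteration matrix
        self.theta = nn.Parameter(torch.full((n_iters,), 0.1))  # thresholds
        self.n_iters = n_iters

    def forward(self, y):
        x = torch.zeros(y.size(0), self.S.in_features, device=y.device)
        for k in range(self.n_iters):
            # Classic ISTA update with learned operators + soft threshold.
            z = self.W(y) + self.S(x)
            x = torch.sign(z) * torch.clamp(z.abs() - self.theta[k], min=0.0)
        return x

model = UnrolledISTA(m=64, n=128)
print(model(torch.randn(8, 64)).shape)  # torch.Size([8, 128])
```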
arXiv Detail & Related papers (2022-04-09T13:04:36Z)
- FetReg: Placental Vessel Segmentation and Registration in Fetoscopy Challenge Dataset [57.30136148318641]
Fetoscopy laser photocoagulation is a widely used procedure for the treatment of Twin-to-Twin Transfusion Syndrome (TTTS).
The limited field of view of the fetoscope may lead to increased procedural time and incomplete ablation, resulting in persistent TTTS.
Computer-assisted intervention may help overcome these challenges by expanding the fetoscopic field of view through video mosaicking and providing better visualization of the vessel network.
We present a large-scale multi-centre dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms for the fetal environment with a focus on creating drift-free mosaics from long duration fetoscopy videos.
arXiv Detail & Related papers (2021-06-10T17:14:27Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
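The summary gives no equations for MODL or KNNS, but K-nearest-neighbor smoothing of predictions is a simple, general idea: replace each sample's predicted class probabilities with an average over its nearest neighbors in feature space. A hypothetical NumPy sketch of that generic operation:

```python
import numpy as np

def knn_smooth(features: np.ndarray, probs: np.ndarray, k: int = 5) -> np.ndarray:
    """Smooth per-sample class probabilities by averaging over each sample's
    k nearest neighbors (including itself) in feature space.

    features: (n, d) feature vectors; probs: (n, c) predicted probabilities.
    """
    # Pairwise squared Euclidean distances, (n, n).
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    # Indices of the k closest samples for each row.
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    # Average the neighbors' probability vectors.
    return probs[nn_idx].mean(axis=1)

feats = np.random.randn(100, 16)
probs = np.random.dirichlet(np.ones(14), size=100)  # e.g. 14 disease labels
print(knn_smooth(feats, probs, k=5).shape)  # (100, 14)
```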
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
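The cross reconstruction loss is not defined in this summary; a common form reconstructs each view from the other view's latent code, so the latents are forced to carry the shared information. A minimal, hypothetical two-view sketch, not the paper's exact network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossReconstruction(nn.Module):
    """Two-view autoencoder with cross reconstruction: each decoder must
    rebuild its view from the *other* view's latent, encouraging a shared
    latent space."""

    def __init__(self, d1, d2, latent=32):
        super().__init__()
        self.enc1, self.enc2 = nn.Linear(d1, latent), nn.Linear(d2, latent)
        self.dec1, self.dec2 = nn.Linear(latent, d1), nn.Linear(latent, d2)

    def forward(self, x1, x2):
        z1, z2 = self.enc1(x1), self.enc2(x2)
        # Cross terms: decode view 1 from z2 and view 2 from z1.
        return F.mse_loss(self.dec1(z2), x1) + F.mse_loss(self.dec2(z1), x2)

model = CrossReconstruction(d1=20, d2=30)
print(model(torch.randn(4, 20), torch.randn(4, 30)))
```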
arXiv Detail & Related papers (2021-02-15T18:46:44Z)