Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation
- URL: http://arxiv.org/abs/2510.12953v2
- Date: Thu, 23 Oct 2025 03:45:15 GMT
- Title: Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation
- Authors: Xiao He, Huangxuan Zhao, Guojia Wan, Wei Zhou, Yanxing Liu, Juhua Liu, Yongchao Xu, Yong Luo, Dacheng Tao, Bo Du
- Abstract summary: We introduce FetalMind, a medical AI system tailored to fetal ultrasound for both report generation and diagnosis. We propose Salient Epistemic Disentanglement (SED), which injects an expert-curated bipartite graph into the model to decouple view-disease associations. FetalMind outperforms open- and closed-source baselines across all gestational stages, achieving +14% average gains and +61.2% higher accuracy on critical conditions.
- Score: 83.02147613524032
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent medical vision-language models have shown promise on tasks such as VQA, report generation, and anomaly detection. However, most are adapted to structured adult imaging and underperform in fetal ultrasound, which poses challenges of multi-view image reasoning, a broad disease spectrum, and high image diversity. To bridge this gap, we introduce FetalMind, a medical AI system tailored to fetal ultrasound for both report generation and diagnosis. Guided by clinical workflow, we propose Salient Epistemic Disentanglement (SED), which injects an expert-curated bipartite graph into the model to decouple view-disease associations and to steer preference selection along clinically faithful steps via reinforcement learning. This design mitigates variability across diseases and heterogeneity across views, reducing learning bottlenecks while aligning the model's inference with obstetric practice. To train FetalMind at scale, we curate the FetalSigma-1M dataset, the first large-scale fetal ultrasound report corpus, comprising 20K reports from twelve medical centers, addressing the scarcity of domain data. Extensive experiments show that FetalMind outperforms open- and closed-source baselines across all gestational stages, achieving +14% average gains and +61.2% higher accuracy on critical conditions while remaining efficient, stable, and scalable. Project Page: https://hexiao0275.github.io/FetalMind.
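The abstract describes injecting an expert-curated bipartite graph to decouple view-disease associations. A minimal sketch of that idea, assuming a simple adjacency-mask formulation with entirely hypothetical view and disease names (this is illustrative only, not the paper's actual implementation):

```python
# Illustrative sketch (NOT the paper's code): an expert-curated bipartite
# view-disease graph used to mask which diagnoses a given ultrasound view
# is allowed to support. View/disease names are hypothetical placeholders.
import numpy as np

views = ["four_chamber", "transthalamic", "abdominal"]
diseases = ["ventricular_septal_defect", "ventriculomegaly", "omphalocele"]

# Expert-curated edges: which views carry evidence for which diseases.
edges = {
    ("four_chamber", "ventricular_septal_defect"),
    ("transthalamic", "ventriculomegaly"),
    ("abdominal", "omphalocele"),
}

# Bipartite adjacency: 1 where a view-disease association is clinically valid.
A = np.zeros((len(views), len(diseases)))
for i, v in enumerate(views):
    for j, d in enumerate(diseases):
        if (v, d) in edges:
            A[i, j] = 1.0

def masked_disease_logits(view_idx, logits):
    """Suppress diseases not associated with the observed view."""
    mask = A[view_idx]
    return np.where(mask > 0, logits, -np.inf)

logits = np.array([2.0, 0.5, 1.0])
print(masked_disease_logits(0, logits))  # only the first disease stays finite
```

In this toy form the graph simply zeroes out clinically implausible view-disease pairs before prediction; the paper additionally steers preference selection with reinforcement learning, which is not shown here.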
Related papers
- Beyond Benchmarks of IUGC: Rethinking Requirements of Deep Learning Methods for Intrapartum Ultrasound Biometry from Fetal Ultrasound Videos [58.71502465551297]
The Intrapartum Ultrasound Grand Challenge (IUGC), co-hosted with MICCAI 2024, was launched. IUGC introduces a clinically oriented multi-task automatic measurement framework that integrates standard plane classification, fetal head-pubic symphysis segmentation, and biometry. The challenge releases the largest multi-center intrapartum ultrasound video dataset to date, comprising 774 videos (68,106 frames) collected from three hospitals.
arXiv Detail & Related papers (2026-02-13T13:28:22Z) - FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound [2.8097961263689406]
The demand for prenatal ultrasound imaging has intensified a global shortage of trained sonographers. Deep learning has the potential to enhance sonographers' efficiency and support the training of new practitioners. We present Fetal-Gauge, the first and largest visual question answering benchmark specifically designed to evaluate Vision-Language Models (VLMs) in fetal ultrasound. Our benchmark comprises over 42,000 images and 93,000 question-answer pairs, spanning anatomical plane identification, visual grounding of anatomical structures, fetal orientation assessment, clinical view conformity, and clinical diagnosis.
arXiv Detail & Related papers (2025-12-25T04:54:37Z) - Challenging DINOv3 Foundation Model under Low Inter-Class Variability: A Case Study on Fetal Brain Ultrasound [4.07447364754644]
This study provides the first comprehensive evaluation of foundation models in fetal ultrasound (US) imaging under low inter-class variability conditions. We focus on fetal brain standard planes (transthalamic (TT), transventricular (TV), and transcerebellar (TC)), which exhibit highly overlapping anatomical features. Models pretrained on fetal ultrasound data consistently outperformed those pretrained on natural images, with weighted F1-score improvements of up to 20 percent.
arXiv Detail & Related papers (2025-11-01T13:37:22Z) - CARDIUM: Congenital Anomaly Recognition with Diagnostic Images and Unified Medical records [30.96806728675348]
Prenatal diagnosis of Congenital Heart Diseases (CHDs) holds great potential for Artificial Intelligence (AI)-driven solutions. The CARDIUM dataset is the first publicly available multimodal dataset consolidating fetal ultrasound and echocardiographic images along with maternal clinical records for prenatal CHD detection.
arXiv Detail & Related papers (2025-10-17T00:30:19Z) - A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications [77.3888788549565]
We present EchoCare, a novel ultrasound foundation model for generalist clinical use. We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
arXiv Detail & Related papers (2025-09-15T10:05:31Z) - Advancing Fetal Ultrasound Image Quality Assessment in Low-Resource Settings [3.982826074217475]
We leverage FetalCLIP, a vision-language model pretrained on a curated dataset of over 210,000 fetal ultrasound image-language pairs. We introduce an IQA model adapted from FetalCLIP using Low-Rank Adaptation (LoRA), and evaluate it on the ACOUS-AI dataset. We show that an adapted segmentation model, when repurposed for classification, further improves performance, achieving an F1 score of 0.771.
arXiv Detail & Related papers (2025-07-30T16:09:29Z) - FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis [16.640351512021017]
We introduce a Flexible Fetal US image generation framework (FetalFlex) to address these challenges. FetalFlex incorporates a pre-alignment module to enhance controllability and introduces a repaint strategy to ensure consistent texture and appearance. Experiments on multi-center datasets demonstrate that FetalFlex achieved state-of-the-art performance across multiple image quality metrics.
arXiv Detail & Related papers (2025-03-19T05:16:19Z) - FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis [0.676810348604193]
FetalCLIP is a vision-language foundation model capable of generating universal representations of fetal ultrasound images. It was pre-trained using a multimodal learning approach on a diverse dataset of 210,035 fetal ultrasound images paired with text.
arXiv Detail & Related papers (2025-02-20T18:30:34Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for the unstructured-view topic-related ultrasound report generation.
The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristics across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z) - FetReg: Placental Vessel Segmentation and Registration in Fetoscopy Challenge Dataset [57.30136148318641]
Fetoscopy laser photocoagulation is a widely used procedure for the treatment of Twin-to-Twin Transfusion Syndrome (TTTS). The limited fetoscopic field of view and poor visibility may lead to increased procedural time and incomplete ablation, resulting in persistent TTTS.
Computer-assisted intervention may help overcome these challenges by expanding the fetoscopic field of view through video mosaicking and providing better visualization of the vessel network.
We present a large-scale multi-centre dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms for the fetal environment with a focus on creating drift-free mosaics from long duration fetoscopy videos.
arXiv Detail & Related papers (2021-06-10T17:14:27Z) - Hybrid Attention for Automatic Segmentation of Whole Fetal Head in Prenatal Ultrasound Volumes [52.53375964591765]
We propose the first fully-automated solution to segment the whole fetal head in US volumes.
The segmentation task is firstly formulated as an end-to-end volumetric mapping under an encoder-decoder deep architecture.
We then combine the segmentor with a proposed hybrid attention scheme (HAS) to select discriminative features and suppress the non-informative volumetric features.
arXiv Detail & Related papers (2020-04-28T14:43:05Z)
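Several entries above adapt a pretrained encoder with Low-Rank Adaptation (LoRA), e.g. the FetalCLIP-based IQA model. A minimal sketch of the LoRA idea itself, with hypothetical dimensions and plain NumPy in place of any real training framework (this is not the cited implementation):

```python
# Minimal LoRA sketch (illustrative, not the cited implementation):
# a frozen weight W is adapted as W + (alpha/r) * B @ A, where A and B are
# small trainable low-rank matrices. Dimensions here are hypothetical.
import numpy as np

d_in, d_out, r, alpha = 64, 64, 4, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-init

def lora_forward(x):
    """Adapted layer: frozen path plus scaled low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapted output equals the frozen output,
# so fine-tuning starts from the pretrained model's behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B (r * (d_in + d_out) parameters) would be trained, which is why LoRA is attractive for adapting large foundation models in low-resource settings.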
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.