Generative deep learning for foundational video translation in ultrasound
- URL: http://arxiv.org/abs/2511.03255v1
- Date: Wed, 05 Nov 2025 07:32:43 GMT
- Title: Generative deep learning for foundational video translation in ultrasound
- Authors: Nikolina Tomic, Roshni Bhatnagar, Sarthak Jain, Connor Lau, Tien-Yu Liu, Laura Gambini, Rima Arnaout
- Abstract summary: We present a generative method for ultrasound CFD-greyscale video translation, trained on 54,975 videos and tested on 8,368. The method uses pixel-wise, adversarial, and perceptual losses and two networks, one for reconstructing anatomic structures and one for denoising, to achieve realistic ultrasound imaging.
- Score: 5.598184950574122
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep learning (DL) has the potential to revolutionize image acquisition and interpretation across medicine; however, attention to data imbalance and missingness is required. Ultrasound data presents a particular challenge because, in addition to different views and structures, it includes several sub-modalities, such as greyscale and color flow Doppler (CFD), that are often imbalanced in clinical studies. Image translation can help balance datasets but has so far been challenging for ultrasound sub-modalities. Here, we present a generative method for ultrasound CFD-greyscale video translation, trained on 54,975 videos and tested on 8,368. The method uses pixel-wise, adversarial, and perceptual losses and two networks, one for reconstructing anatomic structures and one for denoising, to achieve realistic ultrasound imaging. Average pairwise SSIM between synthetic videos and ground truth was 0.91 ± 0.04. Synthetic videos performed indistinguishably from real ones in DL classification and segmentation tasks and when evaluated by blinded clinical experts: F1 score was 0.9 for real and 0.89 for synthetic videos; Dice score between real and synthetic segmentation was 0.97. Overall clinician accuracy in distinguishing real vs synthetic videos was 54 ± 6% (range 42-61%), indicating realistic synthetic videos. Although trained only on heart videos, the model worked well on ultrasound spanning several clinical domains (average SSIM 0.91 ± 0.05), demonstrating foundational abilities. Together, these data expand the utility of retrospectively collected imaging and augment the dataset design toolbox for medical imaging.
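The abstract reports two similarity metrics, SSIM between synthetic and ground-truth videos and the Dice score between segmentations. As a rough illustration of what these numbers measure, here is a minimal NumPy sketch. It is not the authors' evaluation code: the function names are hypothetical, and `global_ssim` is a simplified single-window variant that uses whole-frame statistics, whereas standard SSIM implementations average over local sliding windows and will give different values.

```python
import numpy as np

def dice_score(mask_a, mask_b):
    """Dice overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total

def global_ssim(x, y, data_range=255.0):
    """Simplified SSIM computed from whole-frame statistics (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Stabilizing constants from the standard SSIM definition (K1=0.01, K2=0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

def mean_video_ssim(video_a, video_b, data_range=255.0):
    """Mean and standard deviation of per-frame SSIM for two videos of shape (T, H, W)."""
    scores = [global_ssim(fa, fb, data_range) for fa, fb in zip(video_a, video_b)]
    return float(np.mean(scores)), float(np.std(scores))
```

Comparing a video with itself yields a mean SSIM of 1 and a standard deviation of 0, and two identical masks yield a Dice score of 1; lower values indicate greater disagreement.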
Related papers
- Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models [17.949823366019285]
We propose synthesizing plausible ultrasound videos from readily available, abundant ultrasound images. We demonstrate strong quantitative results and visually appealing synthesized videos on the BUSV benchmark. Our image-to-video approach provides an effective data augmentation solution to advance ultrasound video analysis.
arXiv Detail & Related papers (2025-03-19T07:58:43Z)
- Merging synthetic and real embryo data for advanced AI predictions [69.07284335967019]
We train two generative models using two datasets (one we created and made publicly available, and one existing public dataset) to generate synthetic embryo images at various cell stages. These were combined with real images to train classification models for embryo cell stage prediction. Our results demonstrate that incorporating synthetic images alongside real data improved classification performance, with the model achieving 97% accuracy compared to 94.5% when trained solely on real data.
arXiv Detail & Related papers (2024-12-02T08:24:49Z)
- Breast tumor classification based on self-supervised contrastive learning from ultrasound videos [7.825379326219145]
We adopted a triplet network and a self-supervised contrastive learning technique to learn representations from unlabeled breast ultrasound video clips.
Our model achieved an area under the receiver operating characteristic curve (AUC) of 0.952, which is significantly higher than the others.
The proposed framework greatly reduces the demand for labeled data and holds potential for use in automatic breast ultrasound image diagnosis.
arXiv Detail & Related papers (2024-08-20T07:16:01Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- Echo from noise: synthetic ultrasound image generation using diffusion models for real image segmentation [0.3999851878220878]
We show that synthetic images can serve as a viable substitute for real data in the training of deep-learning models for ultrasound image analysis tasks.
We generated synthetic 2D echocardiograms and trained a neural network for segmenting the left ventricle and left atrium.
arXiv Detail & Related papers (2023-05-09T13:15:52Z)
- Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis [5.102090025931326]
We extend elucidated diffusion models for video modelling to generate plausible video sequences from single images.
Our image-to-sequence approach achieves an $R^2$ score of 93%, 38 points higher than recently proposed sequence-to-sequence generation methods.
arXiv Detail & Related papers (2023-03-22T15:26:22Z)
- Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis [48.64462717254158]
We developed a self-supervised contrastive learning approach, EchoCLR, tailored to echocardiogram videos.
When fine-tuned on small portions of labeled data, EchoCLR pretraining significantly improved classification performance for left ventricular hypertrophy (LVH) and aortic stenosis (AS).
EchoCLR is unique in its ability to learn representations of medical videos and demonstrates that SSL can enable label-efficient disease classification from small, labeled datasets.
arXiv Detail & Related papers (2022-07-23T19:17:26Z)
- Weakly-supervised High-fidelity Ultrasound Video Synthesis with Feature Decoupling [13.161739586288704]
In clinical practice, analysis and diagnosis often rely on US sequences rather than a single image to obtain dynamic anatomical information.
This is challenging for novices to learn because practicing with adequate videos from patients is clinically impractical.
We propose a novel framework to synthesize high-fidelity US videos.
arXiv Detail & Related papers (2022-07-01T14:53:22Z)
- Harmonizing Pathological and Normal Pixels for Pseudo-healthy Synthesis [68.5287824124996]
We present a new type of discriminator, the segmentor, to accurately locate the lesions and improve the visual quality of pseudo-healthy images.
We apply the generated images to medical image enhancement and use the enhanced results to address the low-contrast problem.
Comprehensive experiments on the T2 modality of BraTS demonstrate that the proposed method substantially outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T08:41:17Z)
- Voice-assisted Image Labelling for Endoscopic Ultrasound Classification using Neural Networks [48.732863591145964]
We propose a multi-modal convolutional neural network architecture that labels endoscopic ultrasound (EUS) images from raw verbal comments provided by a clinician during the procedure.
Our results show a prediction accuracy of 76% at image level on a dataset with 5 different labels.
arXiv Detail & Related papers (2021-10-12T21:22:24Z)
- Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were obtained in sub-fractures with the largest and richest dataset ever.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
- Image Translation for Medical Image Generation -- Ischemic Stroke Lesions [0.0]
Synthetic databases with annotated pathologies could provide the required amounts of training data.
We train different image-to-image translation models to synthesize magnetic resonance images of brain volumes with and without stroke lesions.
We show that for a small database of only 10 or 50 clinical cases, synthetic data augmentation yields significant improvement.
arXiv Detail & Related papers (2020-10-05T09:12:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.