EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation
- URL: http://arxiv.org/abs/2503.22357v1
- Date: Fri, 28 Mar 2025 11:51:59 GMT
- Title: EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation
- Authors: Hadrien Reynaud, Alberto Gomez, Paul Leeson, Qingjie Meng, Bernhard Kainz
- Abstract summary: We present EchoFlow, a novel framework designed to generate high-quality, privacy-preserving synthetic echocardiogram images and videos. We rigorously evaluate our synthetic datasets on the clinically relevant task of ejection fraction regression and demonstrate, for the first time, that downstream models trained exclusively on EchoFlow-generated synthetic datasets achieve performance parity with models trained on real datasets.
- Score: 6.849095682774907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in deep learning have significantly enhanced medical image analysis, yet the availability of large-scale medical datasets remains constrained by patient privacy concerns. We present EchoFlow, a novel framework designed to generate high-quality, privacy-preserving synthetic echocardiogram images and videos. EchoFlow comprises four key components: an adversarial variational autoencoder for defining an efficient latent representation of cardiac ultrasound images, a latent image flow matching model for generating accurate latent echocardiogram images, a latent re-identification model to ensure privacy by filtering images anatomically, and a latent video flow matching model for animating latent images into realistic echocardiogram videos conditioned on ejection fraction. We rigorously evaluate our synthetic datasets on the clinically relevant task of ejection fraction regression and demonstrate, for the first time, that downstream models trained exclusively on EchoFlow-generated synthetic datasets achieve performance parity with models trained on real datasets. We release our models and synthetic datasets, enabling broader, privacy-compliant research in medical ultrasound imaging at https://huggingface.co/spaces/HReynaud/EchoFlow.
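The abstract describes a four-component generation pipeline: sample a latent image, filter it for privacy via re-identification, animate it conditioned on ejection fraction, and decode. A toy, self-contained sketch of that flow is below; all class names, method signatures, and the re-identification threshold are illustrative assumptions (the stubs stand in for the real trained models, whose actual API is on the linked Hugging Face page).

```python
import random

class StubVAE:
    """Stands in for the adversarial VAE (pixels <-> latents)."""
    def decode(self, z_video):
        return [[v * 2.0 for v in frame] for frame in z_video]

class StubImageFlow:
    """Stands in for the latent image flow matching model."""
    def sample(self):
        return [random.random() for _ in range(4)]

class StubReID:
    """Stands in for the latent re-identification privacy filter."""
    def similarity_to_training_set(self, z_img):
        return 0.1  # toy value: never matches a training patient

class StubVideoFlow:
    """Stands in for the EF-conditioned latent video flow matching model."""
    def sample(self, z_img, condition, n_frames=3):
        return [[v + condition for v in z_img] for _ in range(n_frames)]

def generate_video(vae, image_flow, reid, video_flow, ejection_fraction,
                   reid_threshold=0.5):
    # 1. Sample a latent echocardiogram image.
    z_img = image_flow.sample()
    # 2. Privacy filter: discard latents too similar to a training patient.
    if reid.similarity_to_training_set(z_img) > reid_threshold:
        return None  # in practice, resample instead of returning None
    # 3. Animate the latent image, conditioned on ejection fraction.
    z_video = video_flow.sample(z_img, condition=ejection_fraction)
    # 4. Decode latent frames back to pixel space.
    return vae.decode(z_video)

video = generate_video(StubVAE(), StubImageFlow(), StubReID(),
                       StubVideoFlow(), ejection_fraction=0.55)
```

The key design point visible even in this sketch is that privacy filtering happens in latent space, before any expensive video generation or decoding is performed.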
Related papers
- EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance [79.66329903007869]
We present EchoWorld, a motion-aware world modeling framework for probe guidance.
It encodes anatomical knowledge and motion-induced visual dynamics.
It is trained on more than one million ultrasound images from over 200 routine scans.
arXiv Detail & Related papers (2025-04-17T16:19:05Z) - Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models [17.949823366019285]
We propose synthesizing plausible ultrasound videos from readily available, abundant ultrasound images. We demonstrate strong quantitative results and visually appealing synthesized videos on the BUSV benchmark. Our image-to-video approach provides an effective data augmentation solution to advance ultrasound video analysis.
arXiv Detail & Related papers (2025-03-19T07:58:43Z) - DiffDoctor: Diagnosing Image Diffusion Models Before Treating [57.82359018425674]
We propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts. We collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process. The learned artifact detector is then involved in the second stage to optimize the diffusion model by providing pixel-level feedback.
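The blurb above describes a detector-in-the-loop idea: a pixel-level artifact detector scores generated images, and its score serves as a penalty when optimizing the generator. A minimal sketch of that scoring step follows; the thresholding detector and the shapes involved are illustrative assumptions, not the paper's actual learned model.

```python
def artifact_map(image):
    """Stub pixel-level detector: flags overly bright pixels as artifacts."""
    return [[1.0 if px > 0.9 else 0.0 for px in row] for row in image]

def artifact_penalty(image):
    """Mean artifact score over the image; lower is better for the generator."""
    amap = artifact_map(image)
    flat = [v for row in amap for v in row]
    return sum(flat) / len(flat)

clean = [[0.2, 0.4], [0.5, 0.3]]
flawed = [[0.95, 0.4], [0.5, 0.99]]
```

In the real two-stage pipeline the detector is learned from the annotated flawed-image dataset, and its per-pixel output is differentiable feedback rather than a hard threshold.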
arXiv Detail & Related papers (2025-01-21T18:56:41Z) - EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing [5.900946696794718]
We present a model designed to produce high-fidelity, long, complete, and accessible data samples with near-real-time efficiency.
We develop our generation method based on diffusion models and introduce a protocol for medical video dataset anonymization.
We present EchoNet-Synthetic, a fully synthetic, privacy-compliant echocardiogram dataset with paired ejection fraction labels.
arXiv Detail & Related papers (2024-06-02T17:18:06Z) - Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks [4.1942958779358674]
This paper utilizes recent vision-language models to produce diverse and realistic synthetic echocardiography image data.
We show that the rich contextual information present in the synthesized data potentially enhances the accuracy and interpretability of downstream tasks.
arXiv Detail & Related papers (2024-03-28T23:26:45Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - UAV-Sim: NeRF-based Synthetic Data Generation for UAV-based Perception [62.71374902455154]
We leverage recent advancements in neural rendering to improve static and dynamic novel-view UAV-based image rendering.
We demonstrate a considerable performance boost when a state-of-the-art detection model is optimized primarily on hybrid sets of real and synthetic data.
arXiv Detail & Related papers (2023-10-25T00:20:37Z) - LOTUS: Learning to Optimize Task-based US representations [39.81131738128329]
Anatomical segmentation of organs in ultrasound images is essential to many clinical applications.
Existing deep neural networks require a large amount of labeled data for training in order to achieve clinically acceptable performance.
In this paper, we propose a novel approach for learning to optimize task-based ultrasound image representations.
arXiv Detail & Related papers (2023-07-29T16:29:39Z) - Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis [5.102090025931326]
We extend elucidated diffusion models for video modelling to generate plausible video sequences from single images.
Our image-to-sequence approach achieves an $R^2$ score of 93%, 38 points higher than recently proposed sequence-to-sequence generation methods.
arXiv Detail & Related papers (2023-03-22T15:26:22Z) - Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z) - DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and, in particular, generates more photorealistic images.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.