HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models
- URL: http://arxiv.org/abs/2406.14098v2
- Date: Fri, 5 Jul 2024 01:56:29 GMT
- Title: HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models
- Authors: Xinrui Zhou, Yuhao Huang, Wufeng Xue, Haoran Dou, Jun Cheng, Han Zhou, Dong Ni
- Abstract summary: We propose a novel framework named HeartBeat towards controllable and high-fidelity ECHO video synthesis.
HeartBeat is a unified framework that perceives multimodal conditions simultaneously to guide controllable generation.
In this way, users can synthesize ECHO videos that conform to their mental imagery by combining multimodal control signals.
- Score: 14.280181445804226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Echocardiography (ECHO) video is widely used for cardiac examination. In clinical practice, this procedure heavily relies on operator experience, which requires years of training and may benefit from deep learning-based systems for enhanced accuracy and efficiency. However, acquiring sufficient customized data (e.g., abnormal cases) for novice training and deep model development is clinically unrealistic, so controllable ECHO video synthesis is highly desirable. In this paper, we propose a novel diffusion-based framework named HeartBeat towards controllable and high-fidelity ECHO video synthesis. Our highlights are three-fold. First, HeartBeat serves as a unified framework that perceives multimodal conditions simultaneously to guide controllable generation. Second, we factorize the multimodal conditions into local and global ones, with two insertion strategies that separately provide fine- and coarse-grained control in a composable and flexible manner. In this way, users can synthesize ECHO videos that conform to their mental imagery by combining multimodal control signals. Third, we propose to decouple visual-concept and temporal-dynamics learning using a two-stage training scheme that simplifies model training. Notably, HeartBeat generalizes easily to mask-guided cardiac MRI synthesis with only a few shots, showcasing its scalability to broader applications. Extensive experiments on two public datasets show the efficacy of the proposed HeartBeat.
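To make the second highlight concrete, here is a minimal PyTorch-style sketch of how the two insertion routes could be wired into a single denoising block: spatially aligned local conditions (e.g., masks or sketches) are concatenated channel-wise with the noisy latent for fine-grained control, while global condition tokens (e.g., view class or text embeddings) are injected via cross-attention for coarse-grained control. All class, argument, and shape choices below are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MultimodalConditionedBlock(nn.Module):
    """Illustrative denoising block with the two condition-insertion routes
    sketched from the abstract: local conditions are concatenated with the
    noisy latent (fine-grained control); global conditions steer the block
    via cross-attention (coarse-grained control). Names are hypothetical."""

    def __init__(self, latent_ch=4, local_ch=3, embed_dim=256, n_heads=4):
        super().__init__()
        # Local route: fuse spatially aligned conditions by concatenation.
        self.local_fuse = nn.Conv2d(latent_ch + local_ch, embed_dim, 3, padding=1)
        # Global route: attend to condition token embeddings.
        self.cross_attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        self.out = nn.Conv2d(embed_dim, latent_ch, 3, padding=1)

    def forward(self, z_t, local_cond, global_cond):
        # z_t: noisy latent (B, C, H, W); local_cond: (B, C_l, H, W);
        # global_cond: (B, N_tokens, embed_dim).
        h = self.local_fuse(torch.cat([z_t, local_cond], dim=1))
        b, c, hh, ww = h.shape
        tokens = h.flatten(2).transpose(1, 2)              # (B, H*W, C)
        attn, _ = self.cross_attn(self.norm(tokens), global_cond, global_cond)
        tokens = tokens + attn                             # residual injection
        return self.out(tokens.transpose(1, 2).reshape(b, c, hh, ww))

# Zeroing out either condition keeps the controls composable, matching the
# abstract's claim that multimodal control signals can be combined flexibly.
block = MultimodalConditionedBlock()
z = torch.randn(2, 4, 28, 28)                 # noisy latent
mask = torch.randn(2, 3, 28, 28)              # local condition (e.g., mask)
text = torch.randn(2, 8, 256)                 # global condition tokens
print(block(z, mask, text).shape)             # torch.Size([2, 4, 28, 28])
```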
Related papers
- EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation [1.0840985826142429]
We introduce EchoPrime, a multi-view, view-informed, video-based vision-language foundation model trained on over 12 million video-report pairs.
With retrieval-augmented interpretation, EchoPrime integrates information from all echocardiogram videos in a comprehensive study.
In datasets from two independent healthcare systems, EchoPrime achieves state-of-the-art performance on 23 diverse benchmarks of cardiac form and function.
arXiv Detail & Related papers (2024-10-13T03:04:22Z) - ECHOPulse: ECG controlled echocardio-grams video generation [30.753399869167588]
Echocardiography (ECHO) is essential for cardiac assessments.
ECHO video generation offers a solution by improving automated monitoring.
ECHOPULSE is an ECG-conditioned ECHO video generation model.
arXiv Detail & Related papers (2024-10-04T04:49:56Z) - PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z) - Explainable and Controllable Motion Curve Guided Cardiac Ultrasound Video Generation [11.879436948659691]
We propose an explainable and controllable method for echocardiography video generation.
First, we extract motion information from each heart substructure to construct motion curves.
Second, we propose the structure-to-motion alignment module, which can map semantic features onto motion curves.
Third, a position-aware attention mechanism is designed to enhance video consistency by utilizing Gaussian masks with structural position information.
arXiv Detail & Related papers (2024-07-31T09:59:20Z) - NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z) - Dynamic Contrastive Distillation for Image-Text Retrieval [90.05345397400144]
We present a novel plug-in dynamic contrastive distillation (DCD) framework to compress image-text retrieval models.
We successfully apply our proposed DCD strategy to two state-of-the-art vision-language pretrained models, i.e., ViLT and METER.
Experiments on MS-COCO and Flickr30K benchmarks show the effectiveness and efficiency of our DCD framework.
arXiv Detail & Related papers (2022-07-04T14:08:59Z) - Weakly-supervised High-fidelity Ultrasound Video Synthesis with Feature Decoupling [13.161739586288704]
In clinical practice, analysis and diagnosis often rely on US sequences rather than a single image to obtain dynamic anatomical information.
This is challenging for novices to learn because practicing with adequate patient videos is clinically impractical.
We propose a novel framework to synthesize high-fidelity US videos.
arXiv Detail & Related papers (2022-07-01T14:53:22Z) - i-Code: An Integrative and Composable Multimodal Learning Framework [99.56065789066027]
i-Code is a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations.
The entire system is pretrained end-to-end with new objectives, including masked modality unit modeling and cross-modality contrastive learning (a generic sketch of such a contrastive objective appears after this list).
Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11%.
arXiv Detail & Related papers (2022-05-03T23:38:50Z) - One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z) - Echo-SyncNet: Self-supervised Cardiac View Synchronization in Echocardiography [11.407910072022018]
We propose Echo-SyncNet, a self-supervised learning framework to synchronize various cross-sectional 2D echo series without any external input.
We show promising results for synchronizing Apical 2 chamber and Apical 4 chamber cardiac views.
We also show the usefulness of the learned representations in a one-shot learning scenario of cardiac detection.
arXiv Detail & Related papers (2021-02-03T20:48:16Z) - Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
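As referenced in the i-Code entry above, cross-modality contrastive learning is commonly realized as a symmetric InfoNCE objective over paired embeddings. The sketch below is a generic version of that idea, assuming simple (B, D) embedding tensors; it is not i-Code's exact formulation.

```python
import torch
import torch.nn.functional as F

def cross_modality_contrastive_loss(emb_a, emb_b, temperature=0.07):
    """Generic symmetric InfoNCE loss aligning paired embeddings from two
    modalities (illustrative only, not i-Code's actual objective)."""
    a = F.normalize(emb_a, dim=-1)            # (B, D) unit-norm embeddings
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Matched pairs lie on the diagonal; score both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = cross_modality_contrastive_loss(torch.randn(4, 256), torch.randn(4, 256))
print(loss.item())
```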