Weakly-supervised High-fidelity Ultrasound Video Synthesis with Feature
Decoupling
- URL: http://arxiv.org/abs/2207.00474v1
- Date: Fri, 1 Jul 2022 14:53:22 GMT
- Title: Weakly-supervised High-fidelity Ultrasound Video Synthesis with Feature
Decoupling
- Authors: Jiamin Liang, Xin Yang, Yuhao Huang, Kai Liu, Xinrui Zhou, Xindi Hu,
Zehui Lin, Huanjia Luo, Yuanji Zhang, Yi Xiong, Dong Ni
- Abstract summary: In clinical practice, analysis and diagnosis often rely on US sequences rather than a single image to obtain dynamic anatomical information.
This is challenging for novices to learn because practicing with adequate videos from patients is clinically impractical.
We propose a novel framework to synthesize high-fidelity US videos.
- Score: 13.161739586288704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ultrasound (US) is widely used for its advantages of real-time
imaging, freedom from radiation, and portability. In clinical practice,
analysis and diagnosis often rely on US sequences rather than a single image
to obtain dynamic anatomical information. This is challenging for novices to
learn because practicing with adequate videos from patients is clinically
impractical. In this paper, we propose a novel framework to synthesize
high-fidelity US videos. Specifically, the synthesized videos are generated by
animating source content images based on the motion of given driving videos.
Our highlights are three-fold. First, leveraging the advantages of self- and
fully-supervised learning, our proposed system is trained in a
weakly-supervised manner for keypoint detection. These keypoints then provide
vital information for handling complex, highly dynamic motions in US videos.
Second, we decouple content and texture learning using dual decoders to
effectively reduce the difficulty of model learning. Last, we adopt an
adversarial training strategy with GAN losses to further improve the sharpness
of the generated videos, narrowing the gap between real and synthesized
videos. We validate our method on a large in-house pelvic dataset with highly
dynamic motion. Extensive evaluation metrics and a user study prove the
effectiveness of our proposed method.
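Reduced to code, the training loop described above amounts to: detect keypoints on the source and driving frames, warp the source according to the estimated motion, reconstruct the frame through separate content and texture decoders, and sharpen the result with a GAN loss. The PyTorch sketch below illustrates one such training step; the module architectures, the crude mean-displacement warp, and the loss weights are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointDetector(nn.Module):
    """Predicts K 2-D keypoints per frame (the weak supervision is handled elsewhere)."""
    def __init__(self, k=10):
        super().__init__()
        self.k = k
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2 * k))

    def forward(self, x):                                   # x: (B, 1, H, W)
        return torch.tanh(self.net(x)).view(-1, self.k, 2)  # normalized coordinates

class DualDecoderGenerator(nn.Module):
    """Shared encoder with separate content and texture decoders, fused at the end."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU())
        self.content_dec = nn.Conv2d(64, 1, 3, padding=1)   # coarse anatomy
        self.texture_dec = nn.Conv2d(64, 1, 3, padding=1)   # speckle / fine texture

    def forward(self, warped):
        h = self.encoder(warped)
        return torch.tanh(self.content_dec(h) + self.texture_dec(h))

def warp_by_keypoints(source, kp_src, kp_drv):
    """Crude stand-in for dense motion estimation: shift the source frame by the
    mean keypoint displacement between the driving and source frames."""
    shift = (kp_drv - kp_src).mean(dim=1)                   # (B, 2), normalized coords
    theta = torch.zeros(source.size(0), 2, 3, device=source.device)
    theta[:, 0, 0] = 1.0
    theta[:, 1, 1] = 1.0
    theta[:, :, 2] = shift
    grid = F.affine_grid(theta, source.size(), align_corners=False)
    return F.grid_sample(source, grid, align_corners=False)

def training_step(source, driving, kp_net, gen, disc, opt_g, opt_d):
    kp_s, kp_d = kp_net(source), kp_net(driving)
    fake = gen(warp_by_keypoints(source, kp_s, kp_d))

    # Discriminator update (plain GAN loss; the paper's exact losses may differ).
    d_real, d_fake = disc(driving), disc(fake.detach())
    loss_d = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: reconstruction plus an adversarial sharpness term.
    d_fake = disc(fake)
    loss_g = F.l1_loss(fake, driving) + \
             0.1 * F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_g.item(), loss_d.item()

kp_net, gen = KeypointDetector(), DualDecoderGenerator()
disc = nn.Sequential(nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                     nn.Conv2d(32, 1, 4, stride=2, padding=1))  # PatchGAN-style logits
opt_g = torch.optim.Adam(list(kp_net.parameters()) + list(gen.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

source, driving = torch.rand(2, 1, 128, 128), torch.rand(2, 1, 128, 128)
training_step(source, driving, kp_net, gen, disc, opt_g, opt_d)
```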
Related papers
- Data Collection-free Masked Video Modeling [6.641717260925999]
We introduce an effective self-supervised learning framework for videos that leverages less costly static images.
Pseudo-motion videos constructed from these images are then leveraged in masked video modeling.
Our approach is applicable to synthetic images as well, thus entirely freeing video training from data collection costs and other concerns with real data.
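One simple way to realize pseudo-motion videos from static images is to sweep a crop window across an image and then mask space-time patches of the resulting clip, as in the hypothetical sketch below; the linear panning trajectory, tube masking, and masking ratio are assumptions for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def pseudo_motion_clip(image, num_frames=16, crop=112):
    """image: (C, H, W) -> clip (num_frames, C, crop, crop) via a linear pan."""
    _, H, W = image.shape
    ys = torch.linspace(0, H - crop, num_frames).long().tolist()
    xs = torch.linspace(0, W - crop, num_frames).long().tolist()
    return torch.stack([image[:, y:y + crop, x:x + crop] for y, x in zip(ys, xs)])

def tube_mask(clip, patch=16, mask_ratio=0.9):
    """Mask the same spatial patches in every frame ('tube' masking)."""
    T, C, H, W = clip.shape
    keep = (torch.rand(1, 1, H // patch, W // patch) > mask_ratio).float()
    mask = F.interpolate(keep, scale_factor=patch, mode="nearest")   # (1, 1, H, W)
    return clip * mask, mask

image = torch.rand(3, 224, 224)          # a static (or even synthetic) image
clip = pseudo_motion_clip(image)         # pseudo-motion video built from it
masked_clip, mask = tube_mask(clip)
# `masked_clip` would be fed to a video masked autoencoder trained to reconstruct `clip`.
```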
arXiv Detail & Related papers (2024-09-10T17:34:07Z)
- OnUVS: Online Feature Decoupling Framework for High-Fidelity Ultrasound Video Synthesis [34.07625938756013]
Sonographers must observe corresponding dynamic anatomic structures to gather comprehensive information.
The synthesis of US videos may represent a promising solution to this issue.
We present a novel online feature-decoupling framework called OnUVS for high-fidelity US video synthesis.
arXiv Detail & Related papers (2023-08-16T10:16:50Z)
- Leaping Into Memories: Space-Time Deep Feature Synthesis [93.10032043225362]
We propose LEAPS, an architecture-independent method for synthesizing videos from the internal representations of video models.
We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of convolutional and attention-based architectures trained on Kinetics-400.
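In spirit, inverting a video model means optimizing a video tensor so that a frozen network's internal outputs respond strongly to it. The sketch below shows that generic idea with a tiny stand-in network; LEAPS itself works with richer internal features of Kinetics-400 models, so treat this purely as an illustration.

```python
import torch
import torch.nn as nn

# Tiny frozen 3-D CNN standing in for a Kinetics-400 pretrained video model.
frozen = nn.Sequential(
    nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(16, 400))
for p in frozen.parameters():
    p.requires_grad_(False)

video = torch.randn(1, 3, 16, 112, 112, requires_grad=True)  # (B, C, T, H, W)
target_class = 7
opt = torch.optim.Adam([video], lr=0.05)

for step in range(200):
    logits = frozen(video)
    # Maximize the target logit; a total-variation term encourages temporal smoothness.
    tv = (video[:, :, 1:] - video[:, :, :-1]).abs().mean()
    loss = -logits[0, target_class] + 0.1 * tv
    opt.zero_grad(); loss.backward(); opt.step()
# `video` now holds a synthesized clip that strongly excites the chosen model output.
```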
arXiv Detail & Related papers (2023-03-17T12:55:22Z)
- InternVideo: General Video Foundation Models via Generative and Discriminative Learning [52.69422763715118]
We present general video foundation models, InternVideo, for dynamic and complex video-level understanding tasks.
InternVideo efficiently explores masked video modeling and video-language contrastive learning as its pretraining objectives.
InternVideo achieves state-of-the-art performance on 39 video datasets spanning tasks including video action recognition/detection, video-language alignment, and open-world video applications.
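A minimal form of the video-language contrastive objective is a CLIP-style symmetric InfoNCE loss over paired video and caption embeddings, sketched below with placeholder encoders; InternVideo's actual encoders, tokenization, and loss weighting are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, H, W = 4, 8, 64, 64
videos = torch.randn(B, 3, T, H, W)          # a batch of clips
captions = torch.randn(B, 32, 512)           # stand-in for tokenized caption features

video_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * T * H * W, 256))
text_enc = nn.Sequential(nn.Flatten(), nn.Linear(32 * 512, 256))

v = F.normalize(video_enc(videos), dim=-1)   # (B, 256) video embeddings
t = F.normalize(text_enc(captions), dim=-1)  # (B, 256) text embeddings
logits = v @ t.T / 0.07                      # temperature-scaled cosine similarities
labels = torch.arange(B)
# Symmetric InfoNCE: each clip should match its own caption and vice versa.
loss = 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))
```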
arXiv Detail & Related papers (2022-12-06T18:09:49Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
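The summary gives little architectural detail, but the core idea of fusing visual and kinematics embeddings through a graph can be sketched as message passing between two per-time-step nodes, as below; the topology, update rule, and dimensions are illustrative guesses rather than MRG-Net's design.

```python
import torch
import torch.nn as nn

class TwoNodeFusion(nn.Module):
    """Toy message passing between a visual node and a kinematics node."""
    def __init__(self, dim=128, num_gestures=10):
        super().__init__()
        self.msg_v2k = nn.Linear(dim, dim)   # message: visual node -> kinematics node
        self.msg_k2v = nn.Linear(dim, dim)   # message: kinematics node -> visual node
        self.update = nn.GRUCell(dim, dim)   # shared node-update function
        self.classify = nn.Linear(2 * dim, num_gestures)

    def forward(self, vis_emb, kin_emb):     # both: (B, dim), one time step
        v_new = self.update(torch.relu(self.msg_k2v(kin_emb)), vis_emb)
        k_new = self.update(torch.relu(self.msg_v2k(vis_emb)), kin_emb)
        return self.classify(torch.cat([v_new, k_new], dim=-1))

model = TwoNodeFusion()
logits = model(torch.randn(2, 128), torch.randn(2, 128))   # (B, num_gestures)
```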
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning [100.76672109782815]
We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only.
It is difficult to construct a suitable self-supervised task that models both motion and appearance features well.
We propose a new way to perceive the playback speed and exploit the relative speed between two video clips as labels.
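The relative-speed pretext task can be sketched as: sample two clips from the same unlabeled video at different playback speeds and train the network to rank the faster one higher. The snippet below illustrates that idea; the clip sampler, encoder, and ranking head are placeholder choices, not RSPNet's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_clip(video, speed, length=8):
    """video: (C, T, H, W); take `length` frames with stride `speed`."""
    idx = (torch.arange(length) * speed) % video.shape[1]   # wrap around for the sketch
    return video[:, idx]

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 32 * 32, 128))
rank_head = nn.Linear(128, 1)                               # scalar "speediness" score

video = torch.randn(3, 64, 32, 32)                          # one unlabeled video
slow, fast = 1, 4                                           # two playback speeds
c_slow = sample_clip(video, slow).unsqueeze(0)
c_fast = sample_clip(video, fast).unsqueeze(0)

score_slow = rank_head(encoder(c_slow))
score_fast = rank_head(encoder(c_fast))
# Margin ranking loss: the faster clip should receive the higher score.
target = torch.ones_like(score_slow)
loss = F.margin_ranking_loss(score_fast, score_slow, target, margin=1.0)
```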
arXiv Detail & Related papers (2020-10-27T16:42:50Z)
- Self-supervised Contrastive Video-Speech Representation Learning for Ultrasound [15.517484333872277]
In medical imaging, manual annotations can be expensive to acquire and sometimes infeasible to access.
We propose to address the problem of self-supervised representation learning with multi-modal ultrasound video-speech raw data.
arXiv Detail & Related papers (2020-08-14T23:58:23Z)
- Self-supervised Video Representation Learning by Pace Prediction [48.029602040786685]
This paper addresses the problem of self-supervised video representation learning from a new perspective -- by video pace prediction.
It stems from the observation that the human visual system is sensitive to video pace.
We randomly sample training clips at different paces and ask a neural network to identify the pace of each clip.
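Pace prediction can likewise be sketched as a classification task: resample a clip at one of several frame strides and train the network to recognize which stride was used. The example below is a hedged illustration with a toy encoder and an assumed pace set, not the paper's network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

paces = [1, 2, 4, 8]                                   # candidate frame strides (assumed)

def clip_at_pace(video, pace, length=8):
    """video: (C, T, H, W) -> clip of `length` frames taken every `pace` frames."""
    idx = (torch.arange(length) * pace) % video.shape[1]
    return video[:, idx]

classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 32 * 32, 128), nn.ReLU(),
                           nn.Linear(128, len(paces)))  # predicts the pace class

video = torch.randn(3, 128, 32, 32)                     # one unlabeled video
label = torch.randint(len(paces), (1,))                 # randomly chosen pace index
clip = clip_at_pace(video, paces[label.item()]).unsqueeze(0)
loss = F.cross_entropy(classifier(clip), label)         # self-supervised pace loss
```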
arXiv Detail & Related papers (2020-08-13T12:40:24Z)
- Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning.
Our approach generates videos of superior quality compared to existing state-of-the-art methods.
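A GLO-style reading of this setup is: keep a learnable latent code per training video, unroll it through a recurrent network, decode frames, and optimize everything jointly with a plain reconstruction loss instead of a discriminator. The sketch below illustrates that reading under assumed sizes and a simple L1 loss; it is not the authors' model.

```python
import torch
import torch.nn as nn

num_videos, T, latent = 100, 8, 64
latents = nn.Parameter(torch.randn(num_videos, latent))      # learned latent space
rnn = nn.GRU(latent, latent, batch_first=True)                # temporal unrolling
decoder = nn.Sequential(nn.Linear(latent, 3 * 32 * 32), nn.Tanh())

opt = torch.optim.Adam([latents, *rnn.parameters(), *decoder.parameters()], lr=1e-3)

real = torch.rand(4, T, 3, 32, 32) * 2 - 1                    # a mini-batch of videos
idx = torch.tensor([0, 1, 2, 3])                              # their latent indices

z = latents[idx].unsqueeze(1).repeat(1, T, 1)                 # (B, T, latent)
h, _ = rnn(z)
frames = decoder(h).view(-1, T, 3, 32, 32)
loss = (frames - real).abs().mean()                           # non-adversarial L1 loss
opt.zero_grad(); loss.backward(); opt.step()
```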
arXiv Detail & Related papers (2020-03-21T02:57:33Z)