OnUVS: Online Feature Decoupling Framework for High-Fidelity Ultrasound
Video Synthesis
- URL: http://arxiv.org/abs/2308.08269v1
- Date: Wed, 16 Aug 2023 10:16:50 GMT
- Title: OnUVS: Online Feature Decoupling Framework for High-Fidelity Ultrasound
Video Synthesis
- Authors: Han Zhou, Dong Ni, Ao Chang, Xinrui Zhou, Rusi Chen, Yanlin Chen, Lian
Liu, Jiamin Liang, Yuhao Huang, Tong Han, Zhe Liu, Deng-Ping Fan, Xin Yang
- Abstract summary: Sonographers must observe corresponding dynamic anatomic structures to gather comprehensive information.
The synthesis of US videos may represent a promising solution to this issue.
We present a novel online feature-decoupling framework called OnUVS for high-fidelity US video synthesis.
- Score: 34.07625938756013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ultrasound (US) imaging is indispensable in clinical practice. To diagnose
certain diseases, sonographers must observe corresponding dynamic anatomic
structures to gather comprehensive information. However, the limited
availability of specific US video cases causes teaching difficulties in
identifying corresponding diseases, which potentially impacts the detection
rate of such cases. The synthesis of US videos may represent a promising
solution to this issue. Nevertheless, it is challenging to accurately animate
the intricate motion of dynamic anatomic structures while preserving image
fidelity. To address this, we present a novel online feature-decoupling
framework called OnUVS for high-fidelity US video synthesis. Our highlights can
be summarized in four aspects. First, we introduced anatomic information into
keypoint learning through a weakly-supervised training strategy, resulting in
improved preservation of anatomical integrity and motion while minimizing the
labeling burden. Second, to better preserve the integrity and textural
information of US images, we implemented a dual-decoder that decouples the
content and textural features in the generator. Third, we adopted a
multiple-feature discriminator to extract a comprehensive range of visual cues,
thereby enhancing the sharpness and fine details of the generated videos.
Fourth, we constrained the motion trajectories of keypoints during online
learning to enhance the fluidity of generated videos. Our validation and user
studies on in-house echocardiographic and pelvic floor US videos showed that
OnUVS synthesizes US videos with high fidelity.
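The abstract stops short of implementation detail, but two of its four aspects lend themselves to a compact illustration. The sketch below shows a dual-decoder generator that decodes content and texture separately (aspect two) and a keypoint-trajectory smoothness penalty of the kind aspect four describes. All module names, layer sizes, and the exact form of the penalty are hypothetical reconstructions, not the authors' published code.

```python
# Minimal sketch of a dual-decoder generator that decouples content and
# texture, plus a keypoint-trajectory smoothness penalty. All names and
# sizes are hypothetical; the OnUVS paper does not publish this code.
import torch
import torch.nn as nn

class DualDecoderGenerator(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        # Shared encoder maps a driving frame to a latent feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Content decoder reconstructs anatomical structure (low frequency).
        self.content_dec = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1),
        )
        # Texture decoder restores speckle/texture detail (high frequency).
        self.texture_dec = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)
        # The two decoded streams are fused into the final frame.
        return self.content_dec(z) + self.texture_dec(z)

def trajectory_smoothness(keypoints: torch.Tensor) -> torch.Tensor:
    """Penalize jerky keypoint motion. keypoints: (T, K, 2) over T frames."""
    velocity = keypoints[1:] - keypoints[:-1]   # frame-to-frame motion
    accel = velocity[1:] - velocity[:-1]        # change of motion
    return accel.pow(2).mean()

frames = torch.randn(8, 3, 64, 64)              # a toy 8-frame clip
out = DualDecoderGenerator()(frames)
print(out.shape, trajectory_smoothness(torch.randn(8, 10, 2)).item())
```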
Related papers
- TASL-Net: Tri-Attention Selective Learning Network for Intelligent Diagnosis of Bimodal Ultrasound Video [10.087796410298061]
This paper proposes a novel Tri-Attention Selective Learning Network (TASL-Net) for the intelligent diagnosis of bimodal ultrasound videos.
TASL-Net embeds three types of sonographers' diagnostic attention into a mutual transformer framework.
We conduct a detailed experimental validation of TASL-Net's performance on three datasets, including lung, breast, and liver.
arXiv Detail & Related papers (2024-09-03T02:50:37Z)
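The summary names three types of diagnostic attention without defining them. As a hedged guess at what such a tri-attention design could look like, the sketch below uses temporal self-attention within each ultrasound modality plus mutual cross-attention between them; every specific choice here is an assumption, not TASL-Net's actual architecture.

```python
# Hypothetical sketch of a tri-attention design for bimodal ultrasound video:
# temporal self-attention within each modality plus mutual cross-attention
# between them. The TASL-Net abstract does not specify these details.
import torch
import torch.nn as nn

class TriAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.self_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(2 * dim, 2)  # e.g. benign vs. malignant

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (B, T, dim) frame features from the two modalities.
        a, _ = self.self_a(feat_a, feat_a, feat_a)  # attention 1: temporal, modality A
        b, _ = self.self_b(feat_b, feat_b, feat_b)  # attention 2: temporal, modality B
        ab, _ = self.cross(a, b, b)                 # attention 3: mutual cross-modal
        clip = torch.cat([ab.mean(dim=1), b.mean(dim=1)], dim=-1)
        return self.head(clip)

logits = TriAttentionFusion()(torch.randn(2, 16, 256), torch.randn(2, 16, 256))
print(logits.shape)  # (2, 2)
```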
- FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos [79.50191812646125]
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training.
We address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue.
We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch.
This improves ease of use and allows reconstruction to scale in time, processing surgical videos of 5,000 frames and more; an improvement of more than ten times over the state of the art, while remaining agnostic to external tracking information.
arXiv Detail & Related papers (2024-03-18T19:13:02Z)
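To make the joint optimization concrete, here is a toy sketch in which per-frame camera poses and a small neural field are optimized together while the frame window grows progressively, echoing the scheme described above. The real method uses multiple overlapping 4D NeRFs and stereo endoscopic images; the single tiny MLP, 2D poses, and synthetic targets below are stand-ins.

```python
# Toy sketch of jointly optimizing camera poses and a neural field with a
# progressively growing frame window, in the spirit of FLex. All modules
# and data here are placeholders, not the paper's pipeline.
import torch
import torch.nn as nn

num_frames, pts_per_frame = 50, 128
field = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
poses = nn.Parameter(torch.zeros(num_frames, 3))   # toy 2D pose: (dx, dy, angle)
opt = torch.optim.Adam(list(field.parameters()) + [poses], lr=1e-3)

# Fake supervision: observed scalar "intensities" at sampled points per frame.
points = torch.rand(num_frames, pts_per_frame, 2)
targets = torch.rand(num_frames, pts_per_frame, 1)

for step in range(200):
    # Progressive schedule: start with 5 frames and grow the window, so poses
    # and the field stabilize before new frames join the problem.
    window = min(num_frames, 5 + step // 10)
    t = torch.arange(window, dtype=torch.float32) / num_frames
    c, s = torch.cos(poses[:window, 2]), torch.sin(poses[:window, 2])
    rot = torch.stack([torch.stack([c, -s], -1), torch.stack([s, c], -1)], -2)
    # Transform sample points into the (learned) world frame of each camera.
    world = torch.einsum('fij,fpj->fpi', rot, points[:window]) + poses[:window, None, :2]
    # Query the field at (x, y, t) -- time is broadcast per frame.
    xyt = torch.cat([world, t[:, None, None].expand(-1, pts_per_frame, 1)], dim=-1)
    loss = (field(xyt) - targets[:window]).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(f'final loss: {loss.item():.4f}')
```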
- Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model into a 4D representation encompassing both dynamic and static Neural Radiance Fields.
arXiv Detail & Related papers (2024-01-10T23:26:41Z)
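The distillation step can be illustrated with a minimal score-distillation-style loop: a frozen stand-in denoiser plays the role of the finetuned diffusion prior, and a learnable per-frame tensor stands in for the 4D representation. None of the modules below reflect the paper's actual models.

```python
# Minimal score-distillation sketch: knowledge from a (stand-in) pretrained
# denoiser guides a learnable 4D representation. Every module is a toy
# placeholder for illustration only.
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))   # frozen "prior"
for p in denoiser.parameters():
    p.requires_grad_(False)

# Stand-in for a 4D representation: one learnable frame per time step.
video = nn.Parameter(torch.rand(8, 3, 32, 32))
opt = torch.optim.Adam([video], lr=1e-2)

for step in range(100):
    t = torch.rand(8, 1, 1, 1)                  # random noise levels
    noise = torch.randn_like(video)
    noisy = (1 - t) * video + t * noise         # simple noising schedule
    eps_pred = denoiser(noisy)                  # prior's noise estimate
    # SDS-style objective: push the representation so the prior's estimate
    # matches the injected noise; gradients flow only through `video`.
    loss = (eps_pred - noise).detach().mul(noisy).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```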
- Echocardiography video synthesis from end diastolic semantic map via diffusion model [0.0]
This paper aims to tackle the challenges by expanding upon existing video diffusion models for the purpose of cardiac video synthesis.
Our focus lies in generating video using semantic maps of the initial frame of the cardiac cycle, commonly referred to as end diastole.
Our model exhibits better performance than the standard diffusion technique on multiple metrics, including FID, FVD, and SSIM.
arXiv Detail & Related papers (2023-10-11T02:08:05Z)
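One common way to condition a video denoiser on a single semantic map is channel concatenation with the map repeated along the time axis; the sketch below assumes that choice, which may differ from the paper's actual conditioning mechanism.

```python
# Hedged sketch of conditioning a video denoiser on an end-diastole semantic
# map by channel concatenation; all sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ConditionalVideoDenoiser(nn.Module):
    def __init__(self, classes: int = 4):
        super().__init__()
        # 3D convs see (video channels + semantic-map channels) jointly.
        self.net = nn.Sequential(
            nn.Conv3d(3 + classes, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 3, 3, padding=1),
        )

    def forward(self, noisy_video, sem_map):
        # noisy_video: (B, 3, T, H, W); sem_map: (B, classes, H, W) for the
        # end-diastolic frame, repeated along the whole time axis.
        T = noisy_video.shape[2]
        cond = sem_map[:, :, None].expand(-1, -1, T, -1, -1)
        return self.net(torch.cat([noisy_video, cond], dim=1))

model = ConditionalVideoDenoiser()
video = torch.randn(1, 3, 8, 32, 32)
sem = torch.randn(1, 4, 32, 32)
noise = torch.randn_like(video)
noisy = video + noise                              # toy one-level noising
loss = (model(noisy, sem) - noise).pow(2).mean()   # standard eps-prediction loss
loss.backward()
print(loss.item())
```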
- Weakly-supervised High-fidelity Ultrasound Video Synthesis with Feature Decoupling [13.161739586288704]
In clinical practice, analysis and diagnosis often rely on US sequences rather than a single image to obtain dynamic anatomical information.
This is challenging for novices to learn because practicing with adequate videos from patients is clinically impractical.
We propose a novel framework to synthesize high-fidelity US videos.
arXiv Detail & Related papers (2022-07-01T14:53:22Z)
- A New Dataset and A Baseline Model for Breast Lesion Detection in Ultrasound Videos [43.42513012531214]
We first collect and annotate an ultrasound video dataset (188 videos) for breast lesion detection.
We propose a clip-level and video-level feature aggregated network (CVA-Net) for addressing breast lesion detection in ultrasound videos.
arXiv Detail & Related papers (2022-07-01T01:37:50Z)
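A minimal sketch of clip-level plus video-level feature aggregation, loosely in the spirit of CVA-Net, is shown below; the clip length, attention layout, and fusion by residual addition are all assumptions rather than the paper's design.

```python
# Sketch of aggregating clip-level and video-level features before a
# detection head; the exact aggregation in CVA-Net is not specified here.
import torch
import torch.nn as nn

class ClipVideoAggregator(nn.Module):
    def __init__(self, dim: int = 256, clip_len: int = 4):
        super().__init__()
        self.clip_len = clip_len
        self.clip_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.video_attn = nn.MultiheadAttention(dim, 4, batch_first=True)

    def forward(self, frame_feats):
        # frame_feats: (B, T, dim). Group frames into clips, attend within
        # each clip, then attend across clip summaries for a video context.
        B, T, D = frame_feats.shape
        clips = frame_feats.view(B, T // self.clip_len, self.clip_len, D)
        clips = clips.flatten(0, 1)                          # (B*nclips, len, D)
        clip_ctx, _ = self.clip_attn(clips, clips, clips)
        clip_summary = clip_ctx.mean(dim=1).view(B, -1, D)   # (B, nclips, D)
        video_ctx, _ = self.video_attn(clip_summary, clip_summary, clip_summary)
        # Fuse per-frame, clip, and video cues for the downstream detector.
        video_token = video_ctx.mean(dim=1, keepdim=True).expand(-1, T, -1)
        clip_token = clip_ctx.mean(dim=1).view(B, -1, 1, D).expand(
            -1, -1, self.clip_len, -1).reshape(B, T, D)
        return frame_feats + clip_token + video_token

out = ClipVideoAggregator()(torch.randn(2, 16, 256))
print(out.shape)  # (2, 16, 256)
```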
- Voice-assisted Image Labelling for Endoscopic Ultrasound Classification using Neural Networks [48.732863591145964]
We propose a multi-modal convolutional neural network architecture that labels endoscopic ultrasound (EUS) images from raw verbal comments provided by a clinician during the procedure.
Our results show a prediction accuracy of 76% at image level on a dataset with 5 different labels.
arXiv Detail & Related papers (2021-10-12T21:22:24Z)
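A hedged sketch of such a multi-modal setup: a small CNN encodes the EUS image, a bag-of-words embedding encodes the transcribed verbal comment, and the two are concatenated for classification over the five labels. The vocabulary size and branch widths below are invented for illustration.

```python
# Hypothetical sketch of fusing an EUS image with a clinician's transcribed
# verbal comment for label classification; the actual architecture and
# vocabulary in the paper are not given in this summary.
import torch
import torch.nn as nn

class VoiceAssistedLabeller(nn.Module):
    def __init__(self, vocab: int = 1000, classes: int = 5):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.text_branch = nn.EmbeddingBag(vocab, 32)   # mean of word embeddings
        self.classifier = nn.Linear(32 + 32, classes)

    def forward(self, image, tokens):
        # image: (B, 1, H, W) grayscale EUS frame; tokens: (B, L) word ids
        # from the speech transcript of the clinician's comment.
        fused = torch.cat([self.image_branch(image), self.text_branch(tokens)], dim=1)
        return self.classifier(fused)

model = VoiceAssistedLabeller()
logits = model(torch.randn(2, 1, 64, 64), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # (2, 5) -- matching the paper's 5 labels
```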
- Unsupervised multi-latent space reinforcement learning framework for video summarization in ultrasound imaging [0.0]
The COVID-19 pandemic has highlighted the need for a tool to speed up triage in ultrasound scans.
The proposed video-summarization technique is a step in this direction.
We propose a new unsupervised reinforcement learning framework with novel rewards.
arXiv Detail & Related papers (2021-09-03T04:50:35Z)
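The summary does not specify the novel rewards, so the sketch below shows two classic unsupervised summarization rewards, diversity and representativeness, that an RL agent could maximize over selected frames; these are illustrative stand-ins, not the paper's rewards.

```python
# Sketch of two classic unsupervised summarization rewards an RL agent could
# maximize; the paper's novel rewards and multi-latent-space design are not
# detailed in this summary.
import torch

def diversity_reward(feats: torch.Tensor, picks: torch.Tensor) -> torch.Tensor:
    # feats: (T, D) frame features; picks: indices of selected frames.
    sel = torch.nn.functional.normalize(feats[picks], dim=1)
    sim = sel @ sel.t()                              # pairwise cosine similarity
    n = sel.shape[0]
    off_diag = sim[~torch.eye(n, dtype=torch.bool)]
    return 1.0 - off_diag.mean()                     # high when picks differ

def representativeness_reward(feats: torch.Tensor, picks: torch.Tensor) -> torch.Tensor:
    # High when every frame is close to its nearest selected frame.
    dists = torch.cdist(feats, feats[picks])         # (T, |picks|)
    return torch.exp(-dists.min(dim=1).values.mean())

feats = torch.randn(100, 128)                        # toy per-frame features
picks = torch.tensor([3, 25, 60, 90])                # toy summary selection
reward = diversity_reward(feats, picks) + representativeness_reward(feats, picks)
print(reward.item())
```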
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose MRG-Net, a novel online multi-modal graph network that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
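As a rough illustration of relational message passing between the two modalities, the sketch below exchanges learned messages between a visual node and a kinematics node and classifies the fused state; the actual graph layout and update rules of MRG-Net are not given in this summary.

```python
# Sketch of relational message passing between visual and kinematics
# embeddings for gesture recognition; the true graph structure here is an
# assumption, including the shared GRU update for both node types.
import torch
import torch.nn as nn

class VisualKinematicsGraph(nn.Module):
    def __init__(self, dim: int = 128, gestures: int = 10):
        super().__init__()
        # Directed messages between the two node types, plus a node update.
        self.v_to_k = nn.Linear(dim, dim)
        self.k_to_v = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)
        self.head = nn.Linear(2 * dim, gestures)

    def forward(self, visual, kinematics, rounds: int = 2):
        # visual, kinematics: (B, dim) per-timestep embeddings from a video
        # encoder and the robot's kinematic signals, respectively.
        v, k = visual, kinematics
        for _ in range(rounds):
            v_new = self.update(torch.relu(self.k_to_v(k)), v)
            k_new = self.update(torch.relu(self.v_to_k(v)), k)
            v, k = v_new, k_new
        return self.head(torch.cat([v, k], dim=1))

logits = VisualKinematicsGraph()(torch.randn(4, 128), torch.randn(4, 128))
print(logits.shape)  # (4, 10)
```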
- Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network, and a generator through non-adversarial learning.
Our approach generates superior quality videos compared to the existing state-of-the-art methods.
arXiv Detail & Related papers (2020-03-21T02:57:33Z)
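The non-adversarial recipe is simple enough to sketch end to end: per-video latent codes, an RNN that unrolls them in time, and a frame generator are all optimized jointly against a plain reconstruction loss, with no discriminator. The sizes and loss below are assumptions for illustration.

```python
# Sketch of non-adversarial video synthesis: jointly optimize per-video
# latent codes, an RNN, and a frame generator with a reconstruction loss.
import torch
import torch.nn as nn

T, latent_dim, num_videos = 8, 64, 16
latents = nn.Parameter(torch.randn(num_videos, latent_dim))  # learned, not sampled
rnn = nn.GRU(latent_dim, latent_dim, batch_first=True)
generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                          nn.Linear(256, 3 * 16 * 16))
opt = torch.optim.Adam([latents, *rnn.parameters(), *generator.parameters()], lr=1e-3)

real_videos = torch.rand(num_videos, T, 3 * 16 * 16)  # toy training set

for step in range(200):
    # Unroll each video's latent code through time with the RNN, then decode
    # every timestep into a frame; no discriminator is involved anywhere.
    seq = latents[:, None, :].expand(-1, T, -1).contiguous()
    states, _ = rnn(seq)
    frames = generator(states)
    loss = (frames - real_videos).pow(2).mean()       # pure reconstruction
    opt.zero_grad(); loss.backward(); opt.step()
print(f'reconstruction loss: {loss.item():.4f}')
```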