Long-Term Temporally Consistent Unpaired Video Translation from
Simulated Surgical 3D Data
- URL: http://arxiv.org/abs/2103.17204v1
- Date: Wed, 31 Mar 2021 16:31:26 GMT
- Title: Long-Term Temporally Consistent Unpaired Video Translation from
Simulated Surgical 3D Data
- Authors: Dominik Rivoir, Micha Pfeiffer, Reuben Docea, Fiona Kolbinger, Carina
Riediger, Jürgen Weitz, Stefanie Speidel
- Abstract summary: We propose a novel approach which combines unpaired image translation with neural rendering to transfer simulated to photorealistic surgical abdominal scenes.
By introducing global learnable textures and a lighting-invariant view-consistency loss, our method produces consistent translations of arbitrary views.
By extending existing image-based methods to view-consistent videos, we aim to impact the applicability of simulated training and evaluation environments for surgical applications.
- Score: 0.059110875077162096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research in unpaired video translation has mainly focused on short-term
temporal consistency by conditioning on neighboring frames. However, for
transfer from simulated to photorealistic sequences, available information on
the underlying geometry offers potential for achieving global consistency
across views. We propose a novel approach which combines unpaired image
translation with neural rendering to transfer simulated to photorealistic
surgical abdominal scenes. By introducing global learnable textures and a
lighting-invariant view-consistency loss, our method produces consistent
translations of arbitrary views and thus enables long-term consistent video
synthesis. We design and test our model to generate video sequences from
minimally invasive surgical abdominal scenes. Because labeled data is often
limited in this domain, photorealistic data that preserves ground-truth
information from the simulated domain is especially relevant. By extending existing
image-based methods to view-consistent videos, we aim to impact the
applicability of simulated training and evaluation environments for surgical
applications. Code and data will be made publicly available soon.
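The abstract names two ingredients, global learnable textures and a lighting-invariant view-consistency loss, but gives no formulas. The snippet below is only a rough PyTorch sketch of how such a constraint could be wired up, written under our own assumptions rather than taken from the paper: a single texture tensor is shared across all views and sampled via per-view UV maps rendered from the simulated geometry, and two translated views of the same scene are compared after one is warped into the other's frame, with per-pixel chromaticity normalization standing in for lighting invariance. All names (sample_texture, view_consistency_loss, uv, mask) are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch, not the authors' released code.
# One global learnable texture shared by every rendered view of the scene.
texture = torch.nn.Parameter(torch.rand(1, 3, 512, 512))

def sample_texture(uv):
    """Look up the shared texture with per-view UV coordinates.
    uv: (B, H, W, 2) grid in [-1, 1], rendered from the simulated geometry."""
    return F.grid_sample(texture.expand(uv.shape[0], -1, -1, -1),
                         uv, align_corners=False)

def view_consistency_loss(view_a, view_b_warped, mask, eps=1e-6):
    """Penalize disagreement between a translated view and a second translated
    view warped into the same frame (the warp is assumed to come from the known
    simulated geometry). Chromaticity normalization is a crude stand-in for
    lighting invariance."""
    chroma_a = view_a / (view_a.sum(dim=1, keepdim=True) + eps)
    chroma_b = view_b_warped / (view_b_warped.sum(dim=1, keepdim=True) + eps)
    return ((chroma_a - chroma_b).abs() * mask).sum() / (mask.sum() + eps)

# Example with random stand-in tensors (two 256x256 views, full visibility):
uv = torch.rand(2, 256, 256, 2) * 2 - 1
tex_view = sample_texture(uv)                  # texture as seen from each view
fake_a = torch.rand(2, 3, 256, 256)            # translated view A
fake_b_warped = torch.rand(2, 3, 256, 256)     # translated view B warped into A's frame
loss_vc = view_consistency_loss(fake_a, fake_b_warped, torch.ones(2, 1, 256, 256))
```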
Related papers
- MeshBrush: Painting the Anatomical Mesh with Neural Stylization for Endoscopy [0.8437187555622164]
Style transfer is a promising approach to close the sim-to-real gap in medical endoscopy.
Rendering synthetic endoscopic videos by traversing pre-operative scans can generate structurally accurate simulations.
CycleGAN can imitate realistic endoscopic images from these simulations, but it is unsuitable for video-to-video synthesis.
We propose MeshBrush, a neural mesh stylization method to synthesize temporally consistent videos.
arXiv Detail & Related papers (2024-04-03T18:40:48Z)
- Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons [73.21855272778616]
We introduce a new task, dataset, and evaluation protocol for compositional human dance generation (cHDG).
We propose a novel zero-shot framework, dubbed MultiDance-Zero, that can synthesize videos consistent with arbitrary multiple persons and background while precisely following the driving poses.
arXiv Detail & Related papers (2024-01-24T10:44:16Z)
- RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
- Joint one-sided synthetic unpaired image translation and segmentation for colorectal cancer prevention [16.356954231068077]
We produce realistic synthetic images using a combination of 3D technologies and generative adversarial networks.
We propose CUT-seg, a joint training scheme in which a segmentation model and a generative model are trained together to produce realistic images (see the sketch below).
As part of this study, we release Synth-Colon, an entirely synthetic dataset of 20,000 realistic colon images.
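The summary only says that the segmentation and generative models are trained jointly. As a rough illustration (our own assumptions, not the paper's code), one joint update could combine an adversarial realism term on the translated synthetic image with a segmentation loss against the synthetic polyp mask; the tiny stand-in networks and names (translator, segmenter, joint_step) below are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of a joint translation + segmentation objective.
# Tiny stand-in networks; the actual models would be a full generator,
# discriminator, and segmentation network.
translator = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                           nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())
segmenter = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                              nn.Conv2d(16, 1, 3, stride=2, padding=1))

opt = torch.optim.Adam(list(translator.parameters()) + list(segmenter.parameters()),
                       lr=2e-4)

def joint_step(synthetic_img, synthetic_mask, seg_weight=1.0):
    """One joint update: make the translated image look real to the discriminator
    while keeping it segmentable with the known synthetic mask.
    (The discriminator's own update is omitted for brevity.)"""
    fake_real = translator(synthetic_img)
    d_out = discriminator(fake_real)
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    seg = F.binary_cross_entropy_with_logits(segmenter(fake_real), synthetic_mask)
    loss = adv + seg_weight * seg
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example call with random tensors standing in for a synthetic image and mask.
img = torch.rand(2, 3, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
joint_step(img, mask)
```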
arXiv Detail & Related papers (2023-07-20T22:09:04Z)
- Translating Simulation Images to X-ray Images via Multi-Scale Semantic Matching [16.175115921436582]
We propose a new method to translate simulation images from an endovascular simulator to X-ray images.
We apply self-domain semantic matching to ensure that the input image and the generated image have the same positional semantic relationships.
Our method generates realistic X-ray images and outperforms other state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2023-04-16T04:49:46Z)
- Synthetic-to-Real Domain Adaptation using Contrastive Unpaired Translation [28.19031441659854]
We propose a multi-step method to obtain training data without manual annotation effort.
From 3D object meshes, we generate images using a modern synthesis pipeline.
We utilize a state-of-the-art image-to-image translation method to adapt the synthetic images to the real domain.
arXiv Detail & Related papers (2022-03-17T17:13:23Z)
- A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation, content reconstruction, and coarse-to-fine-grained adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
- Learning optical flow from still images [53.295332513139925]
We introduce a framework to generate accurate ground-truth optical flow annotations quickly and in large amounts from any readily available single real picture.
We virtually move the camera in the reconstructed environment with known motion vectors and rotation angles.
When trained with our data, state-of-the-art optical flow networks achieve superior generalization to unseen real data.
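The summary's core idea, moving a virtual camera with known rotation and translation inside a depth-based reconstruction of a single picture, fixes the ground-truth flow by pure geometry. Below is a minimal NumPy sketch under our own assumptions (pinhole intrinsics, a per-pixel z-depth map, occlusions and inpainting ignored); the function name flow_from_virtual_motion is hypothetical.

```python
import numpy as np

def flow_from_virtual_motion(depth, K, R, t):
    """Ground-truth optical flow induced by a known virtual camera motion.
    depth: (H, W) z-depth of the reconstructed still image,
    K: (3, 3) pinhole intrinsics,
    R, t: rotation (3, 3) and translation (3,) from the original to the virtual camera."""
    H, W = depth.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T.astype(float)

    # Back-project every pixel to 3D using the depth map, apply the virtual
    # camera motion, and reproject into the image plane.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    pts_moved = R @ pts + t.reshape(3, 1)
    proj = K @ pts_moved
    proj = proj[:2] / np.clip(proj[2:], 1e-6, None)

    # Per-pixel displacement (dx, dy) is the ground-truth flow.
    return (proj - pix[:2]).T.reshape(H, W, 2)

# Example: identity rotation, small sideways translation, constant depth of 2 m.
depth = np.full((240, 320), 2.0)
K = np.array([[300.0, 0.0, 160.0], [0.0, 300.0, 120.0], [0.0, 0.0, 1.0]])
flow = flow_from_virtual_motion(depth, K, np.eye(3), np.array([0.05, 0.0, 0.0]))
```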
arXiv Detail & Related papers (2021-04-08T17:59:58Z)
- Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video [76.19076002661157]
Non-Rigid Neural Radiance Fields (NR-NeRF) is a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes.
We show that even a single consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views.
arXiv Detail & Related papers (2020-12-22T18:46:12Z) - Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image
Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)