Leveraging Deepfakes to Close the Domain Gap between Real and Synthetic
Images in Facial Capture Pipelines
- URL: http://arxiv.org/abs/2204.10746v1
- Date: Fri, 22 Apr 2022 15:09:49 GMT
- Title: Leveraging Deepfakes to Close the Domain Gap between Real and Synthetic
Images in Facial Capture Pipelines
- Authors: Winnie Lin, Yilin Zhu, Demi Guo, Ron Fedkiw
- Abstract summary: We propose an end-to-end pipeline for building and tracking 3D facial models from personalized in-the-wild video data.
We present a method for automatic data curation and retrieval based on a hierarchical clustering framework typical of collision detection algorithms in traditional computer graphics pipelines.
We outline how we train a motion capture regressor, leveraging the aforementioned techniques to avoid the need for real-world ground truth data.
- Score: 8.366597450893456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an end-to-end pipeline for both building and tracking 3D facial
models from personalized in-the-wild (cellphone, webcam, YouTube clips, etc.)
video data. First, we present a method for automatic data curation and
retrieval based on a hierarchical clustering framework typical of collision
detection algorithms in traditional computer graphics pipelines. Subsequently,
we utilize synthetic turntables and leverage deepfake technology in order to
build a synthetic multi-view stereo pipeline for appearance capture that is
robust to imperfect synthetic geometry and image misalignment. The resulting
model is fit with an animation rig, which is then used to track facial
performances. Notably, our novel use of deepfake technology enables us to
perform robust tracking of in-the-wild data using differentiable renderers
despite a significant synthetic-to-real domain gap. Finally, we outline how we
train a motion capture regressor, leveraging the aforementioned techniques to
avoid the need for real-world ground truth data and/or a high-end calibrated
camera capture setup.
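The abstract sketches techniques rather than code, but two of its steps lend themselves to compact illustration. First, the curation stage: a minimal sketch, assuming per-frame face embeddings (the 128-D vectors, the distance threshold, and the `retrieve` helper are all illustrative, not the authors' implementation), of hierarchical clustering used for curation and retrieval in the spirit of a collision-detection bounding-volume hierarchy:

```python
# Minimal sketch (not the authors' code): frames are embedded, clustered
# bottom-up, and retrieval first prunes to one cluster, the way a
# bounding-volume hierarchy prunes candidate pairs in collision detection.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 128))      # stand-in per-frame embeddings

# Bottom-up (agglomerative) clustering, analogous to building a BVH.
tree = linkage(embeddings, method="ward")

# Cut the dendrogram at an illustrative height to get coarse clusters.
labels = fcluster(tree, t=50.0, criterion="distance")

def retrieve(query, k=10):
    """Prune to the query's nearest cluster, then rank within it."""
    ids = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in ids])
    best = ids[cdist(query[None], centroids).argmin()]
    members = np.flatnonzero(labels == best)
    order = cdist(query[None], embeddings[members])[0].argsort()
    return members[order[:k]]                 # indices of frames to keep

print(retrieve(embeddings[0]))
```

Coarse clusters prune the search the way bounding volumes prune collision pairs; the fine-grained ranking then only touches one cluster's members.

Second, the tracking stage: at its core, deepfake-bridged tracking optimizes rig parameters through a differentiable renderer against a target image that has been pulled into the synthetic domain. The toy below substitutes a trivially differentiable "renderer" and a fixed image for the deepfaked target, so it illustrates only the optimization loop, not the paper's system:

```python
# Toy sketch of the tracking idea: optimise "rig parameters" so that a
# differentiable render matches a fixed target standing in for a
# deepfaked (synthetic-domain) video frame. All names are illustrative.
import torch

def toy_render(params):
    # Stand-in differentiable renderer: a Gaussian blob whose centre and
    # spread play the role of rig parameters.
    grid = torch.linspace(-1.0, 1.0, 64)
    y, x = torch.meshgrid(grid, grid, indexing="ij")
    return torch.exp(-((x - params[0])**2 + (y - params[1])**2) / params[2].abs())

target = toy_render(torch.tensor([0.3, -0.2, 0.05])).detach()  # fixed target

params = torch.tensor([0.0, 0.0, 0.2], requires_grad=True)
opt = torch.optim.Adam([params], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(toy_render(params), target)
    loss.backward()
    opt.step()
print(params.detach())  # moves toward the target parameters [0.3, -0.2, 0.05]
```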
Related papers
- Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs [83.9783063609389]
We propose a method based on a domain-invariant scene representation to directly synthesize traffic scene imagery without rendering.
Specifically, we rely on synthetic scene graphs as our internal representation and introduce an unsupervised neural network architecture for realistic traffic scene synthesis.
arXiv Detail & Related papers (2023-03-15T09:26:29Z)
- Towards Real-World Video Deblurring by Exploring Blur Formation Process [53.91239555063343]
In recent years, deep learning-based approaches have achieved promising success on the video deblurring task.
However, models trained on existing synthetic datasets still suffer from generalization problems in real-world blurry scenarios.
We propose a novel realistic blur synthesis pipeline termed RAW-Blur by leveraging blur formation cues.
arXiv Detail & Related papers (2022-08-28T09:24:52Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
- Hands-Up: Leveraging Synthetic Data for Hands-On-Wheel Detection [0.38233569758620045]
This work demonstrates the use of synthetic photo-realistic in-cabin data to train a Driver Monitoring System.
We show how performing error analysis and generating the missing edge-cases in our platform boosts performance.
This showcases the ability of human-centric synthetic data to generalize well to the real world.
arXiv Detail & Related papers (2022-05-31T23:34:12Z)
- Learning optical flow from still images [53.295332513139925]
We introduce a framework to generate accurate ground-truth optical flow annotations quickly and in large amounts from any readily available single real picture.
We virtually move the camera in the reconstructed environment with known motion vectors and rotation angles (a sketch of this re-projection appears after this list).
When trained with our data, state-of-the-art optical flow networks achieve superior generalization to unseen real data.
arXiv Detail & Related papers (2021-04-08T17:59:58Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
- Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics [33.30312206728974]
We describe a synthesis pipeline capable of producing training data for cluttered scene perception tasks.
Our approach arranges object meshes in physically realistic, dense scenes using physics simulation.
Our pipeline can be run online during training of a deep neural network.
arXiv Detail & Related papers (2020-05-12T10:11:00Z)
- Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement [78.58603635621591]
Training an unpaired synthetic-to-real translation network in image space is severely under-constrained.
We propose a semi-supervised approach that operates on the disentangled shading and albedo layers of the image.
Our two-stage pipeline first learns to predict accurate shading in a supervised fashion using physically-based renderings as targets.
arXiv Detail & Related papers (2020-03-27T21:45:41Z)
- Virtual to Real adaptation of Pedestrian Detectors [9.432150710329607]
ViPeD is a new synthetically generated set of images collected with the graphical engine of the video game Grand Theft Auto V (GTA V).
We propose two different Domain Adaptation techniques suitable for the pedestrian detection task, but possibly applicable to general object detection.
Experiments show that the network trained with ViPeD can generalize over unseen real-world scenarios better than the detector trained over real-world data.
arXiv Detail & Related papers (2020-01-09T14:50:11Z)
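The "Learning optical flow from still images" entry above derives exact flow labels by virtually moving a camera through a scene reconstructed from a single picture. A minimal sketch of that re-projection, assuming a pinhole camera with intrinsics K, a stand-in constant depth map, and a hand-chosen virtual motion (R, t), none of which comes from the paper itself:

```python
# Minimal sketch (assumptions: monocular depth `depth`, pinhole intrinsics
# K, and a small chosen camera motion (R, t); not the paper's code) of
# deriving dense ground-truth optical flow from one image.
import numpy as np

H, W = 240, 320
K = np.array([[300.0, 0, W / 2], [0, 300.0, H / 2], [0, 0, 1]])
depth = np.full((H, W), 2.0)                   # stand-in depth, metres

# Virtual camera motion: small translation, no rotation for simplicity.
R = np.eye(3)
t = np.array([0.05, 0.0, 0.0])

# Back-project every pixel, move the camera, re-project.
u, v = np.meshgrid(np.arange(W), np.arange(H))
pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
cam = np.linalg.inv(K) @ pix * depth.reshape(-1)                   # 3 x HW
cam2 = R @ cam + t[:, None]
pix2 = K @ cam2
pix2 = pix2[:2] / pix2[2]

flow = (pix2 - pix[:2]).T.reshape(H, W, 2)     # dense flow field, pixels
print(flow[0, 0])   # known exactly, since the motion is chosen by us
```

Because the virtual motion is chosen rather than estimated, the resulting flow field is exact by construction, which is what makes such generated annotations usable as ground truth for training.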
This list is automatically generated from the titles and abstracts of the papers on this site.