High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images
- URL: http://arxiv.org/abs/2006.15031v1
- Date: Fri, 26 Jun 2020 15:00:04 GMT
- Title: High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images
- Authors: Stephan J. Garbin, Marek Kowalski, Matthew Johnson, and Jamie Shotton
- Abstract summary: We propose an algorithm that matches a non-photorealistic, synthetically generated image to a latent vector of a pretrained StyleGAN2 model.
In contrast to most previous work, we require no synthetic training data.
This is the first algorithm of its kind to work at a resolution of 1K and represents a significant leap forward in visual realism.
- Score: 10.03187850132035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating photorealistic images of human faces at scale remains a
prohibitively difficult task using computer graphics approaches. This is
because such approaches require the simulation of light to be
photorealistic, which in turn requires physically accurate modelling of
geometry, materials, and light sources, for both the head and the
surrounding scene. Non-photorealistic renders, however, are increasingly
easy to produce. In contrast to computer
graphics approaches, generative models learned from more readily available 2D
image data have been shown to produce samples of human faces that are hard to
distinguish from real data. The process of learning usually corresponds to a
loss of control over the shape and appearance of the generated images. For
instance, even a simple disentangling task such as modifying the hair
independently of the face, which is trivial to accomplish in a computer
graphics approach, remains an open research question. In this work, we propose
an algorithm that matches a non-photorealistic, synthetically generated image
to a latent vector of a pretrained StyleGAN2 model which, in turn, maps the
vector to a photorealistic image of a person with the same pose, expression,
hair, and lighting. In contrast to most previous work, we require no synthetic
training data. To the best of our knowledge, this is the first algorithm of its
kind to work at a resolution of 1K and represents a significant leap forward in
visual realism.
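The core idea is a form of GAN inversion: the pretrained generator is held fixed while a latent vector is optimized so that the generated image matches a target render. Below is a minimal sketch of that general latent-matching loop, assuming a stand-in generator and a plain pixel loss; it is not the authors' actual algorithm, which targets a pretrained StyleGAN2 at 1K resolution with more sophisticated matching losses.

```python
# Minimal sketch of latent-space matching (GAN inversion) against a frozen
# generator. Illustrative only: ToyGenerator is a stand-in for a pretrained
# StyleGAN2, and plain MSE replaces the paper's matching losses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGenerator(nn.Module):
    """Placeholder generator mapping a latent vector to an RGB image."""
    def __init__(self, latent_dim=512, image_size=64):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 3 * image_size * image_size),
            nn.Tanh(),
        )

    def forward(self, w):
        img = self.net(w)
        return img.view(-1, 3, self.image_size, self.image_size)

generator = ToyGenerator().eval()
for p in generator.parameters():
    p.requires_grad_(False)  # generator stays frozen; only the latent moves

# The non-photorealistic render to match (random stand-in data here).
target = torch.rand(1, 3, 64, 64) * 2 - 1

# Optimize a latent vector so the generated image matches the target.
w = torch.randn(1, 512, requires_grad=True)
optimizer = torch.optim.Adam([w], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    image = generator(w)
    loss = F.mse_loss(image, target)  # a perceptual loss would be used in practice
    loss.backward()
    optimizer.step()

# After optimization, the frozen generator maps `w` to an image resembling
# the target; with a real StyleGAN2 that image would be photorealistic.
```

Swapping in a real pretrained StyleGAN2 and a perceptual loss is what would make the matched image photorealistic while preserving pose, expression, hair, and lighting.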
Related papers
- Toward Human Understanding with Controllable Synthesis [3.6579002555961915]
Training robust 3D human pose and shape estimation methods requires diverse training images with accurate ground truth.
While BEDLAM demonstrates the potential of traditional procedural graphics to generate such data, the training images are clearly synthetic.
In contrast, generative image models produce highly realistic images but without ground truth.
arXiv Detail & Related papers (2024-11-13T14:54:47Z)
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real time.
At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization [91.52882218901627]
We propose a novel method for constructing implicit 3D morphable face models that are both generalizable and intuitive for editing.
Our method improves on state-of-the-art methods in photo-realism, geometry, and expression accuracy.
arXiv Detail & Related papers (2023-05-04T17:58:40Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied to high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- Photorealism in Driving Simulations: Blending Generative Adversarial Image Synthesis with Rendering [0.0]
We introduce a hybrid generative neural graphics pipeline for improving the visual fidelity of driving simulations.
We form 2D semantic images from 3D scenery consisting of simple object models without textures.
These semantic images are then converted into photorealistic RGB images with a state-of-the-art Generative Adversarial Network (GAN) trained on real-world driving scenes; a minimal sketch of this semantic-to-RGB idea appears after this list.
arXiv Detail & Related papers (2020-07-31T03:25:17Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
- Learning Neural Light Transport [28.9247002210861]
We present an approach for learning light transport in static and dynamic 3D scenes using a neural network.
We find that our model is able to produce photorealistic renderings of static and dynamic scenes.
arXiv Detail & Related papers (2020-06-05T13:26:05Z)
- State of the Art on Neural Rendering [141.22760314536438]
We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs.
This report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence.
arXiv Detail & Related papers (2020-04-08T04:36:31Z)
- Learning Inverse Rendering of Faces from Real-world Videos [52.313931830408386]
Existing methods decompose a face image into three components (albedo, normal, and illumination) by supervised training on synthetic data.
We propose a weakly supervised training approach to train our model on real face videos, based on the assumption that albedo and normals remain consistent across frames.
Our network is trained on both real and synthetic data, benefiting from both.
arXiv Detail & Related papers (2020-03-26T17:26:40Z)
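The driving-simulation entry above describes a two-stage pipeline: render cheap 2D semantic label maps from untextured 3D scenery, then translate them into photorealistic RGB with a GAN trained on real driving scenes. The sketch below illustrates only the translation interface; the class count, names, and architecture are illustrative assumptions, not the paper's actual model.

```python
# Hedged sketch of semantic-map-to-RGB translation. The tiny convolutional
# generator here only demonstrates the input/output contract; a trained
# GAN generator would be needed for photorealistic results.
import torch
import torch.nn as nn

NUM_CLASSES = 8  # assumed number of semantic classes (road, car, sky, ...)

class SemanticToRGB(nn.Module):
    """Maps a one-hot semantic map (B, C, H, W) to an RGB image (B, 3, H, W)."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, semantic_map):
        return self.net(semantic_map)

# Usage: a random label map stands in for one rendered from untextured 3D scenery.
labels = torch.randint(0, NUM_CLASSES, (1, 128, 128))
one_hot = nn.functional.one_hot(labels, NUM_CLASSES).permute(0, 3, 1, 2).float()
rgb = SemanticToRGB()(one_hot)  # (1, 3, 128, 128) image in [-1, 1]
```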
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.