Learned Spatial Representations for Few-shot Talking-Head Synthesis
- URL: http://arxiv.org/abs/2104.14557v1
- Date: Thu, 29 Apr 2021 17:59:42 GMT
- Title: Learned Spatial Representations for Few-shot Talking-Head Synthesis
- Authors: Moustafa Meshry, Saksham Suri, Larry S. Davis, Abhinav Shrivastava
- Abstract summary: We propose a novel approach for few-shot talking-head synthesis.
We show that a disentangled spatial-and-style representation leads to a significant improvement over previous methods.
- Score: 68.3787368024951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel approach for few-shot talking-head synthesis. While recent
works in neural talking heads have produced promising results, they can still
generate images that fail to preserve the identity of the subject in the source
images. We posit this is a result of the entangled representation of each
subject in a single latent code that models 3D shape information, identity
cues, colors, lighting and even background details. In contrast, we propose to
factorize the representation of a subject into its spatial and style
components. Our method generates a target frame in two steps. First, it
predicts a dense spatial layout for the target image. Second, an image
generator utilizes the predicted layout for spatial denormalization and
synthesizes the target frame. We experimentally show that this disentangled
representation leads to a significant improvement over previous methods, both
quantitatively and qualitatively.
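A minimal PyTorch sketch of this two-step idea follows, assuming a SPADE-style spatial denormalization block. The module names (`SpatialDenorm`, `TwoStepGenerator`), layer sizes, and the 64-d style code are illustrative assumptions, not the paper's actual architecture; only the overall factorization, a predicted dense layout that modulates the generator's normalized activations, is taken from the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialDenorm(nn.Module):
    """SPADE-style spatial denormalization: normalize activations, then
    modulate them with per-pixel scale/bias predicted from the layout."""
    def __init__(self, channels, layout_channels, hidden=128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(layout_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x, layout):
        # Resize the layout to the current feature resolution.
        layout = F.interpolate(layout, size=x.shape[-2:], mode="nearest")
        h = self.shared(layout)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

class TwoStepGenerator(nn.Module):
    """Step 1: predict a dense spatial layout for the target frame.
    Step 2: synthesize the frame, injecting the layout via denormalization."""
    def __init__(self, layout_channels=16, width=64):
        super().__init__()
        self.layout_net = nn.Sequential(  # hypothetical layout predictor
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, layout_channels, 3, padding=1))
        self.stem = nn.Conv2d(3, width, 3, padding=1)
        self.denorm = SpatialDenorm(width, layout_channels)
        self.head = nn.Conv2d(width, 3, 3, padding=1)

    def forward(self, style_code, target_landmarks):
        layout = self.layout_net(target_landmarks)             # step 1
        x = self.stem(target_landmarks) + style_code[:, :, None, None]
        x = self.denorm(x, layout)                             # step 2
        return torch.tanh(self.head(x)), layout

# Usage: a 64-d style code (in the paper, derived from the source frames)
# plus a rasterized landmark image for the target pose.
gen = TwoStepGenerator()
frame, layout = gen(torch.randn(1, 64), torch.randn(1, 3, 64, 64))
```

In the full method, the style code would come from an encoder over the few source frames, and denormalization blocks would typically be stacked at multiple generator resolutions rather than applied once.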
Related papers
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- Generalizable One-shot Neural Head Avatar [90.50492165284724]
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.
We propose a framework that not only generalizes to unseen identities based on a single-view image, but also captures characteristic details within and beyond the face area.
arXiv Detail & Related papers (2023-06-14T22:33:09Z)
- Zero-1-to-3: Zero-shot One Image to 3D Object [30.455300183998247]
We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.
Our conditional diffusion model is trained on a synthetic dataset to learn control of the relative camera viewpoint.
Our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training.
arXiv Detail & Related papers (2023-03-20T17:59:50Z)
- Global Context-Aware Person Image Generation [24.317541784957285]
We propose a data-driven approach for context-aware person image generation.
In our method, the position, scale, and appearance of the generated person are semantically conditioned on the existing persons in the scene.
arXiv Detail & Related papers (2023-02-28T16:34:55Z)
- Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator and apply a hybrid inversion scheme in which a model produces a first guess of the solution.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z)
- Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations [29.756718435405983]
Implicit neural 3D representation has achieved impressive results in surface or scene reconstruction and novel view synthesis.
Existing approaches, such as Neural Radiance Field (NeRF) and its variants, usually require dense input views.
We introduce a novel coordinate-based model, CoCo-INR, for implicit neural 3D representation.
arXiv Detail & Related papers (2022-10-20T11:13:50Z)
- PeRFception: Perception using Radiance Fields [72.99583614735545]
We create the first large-scale implicit representation datasets for perception tasks, called PeRFception.
It shows a significant memory compression rate (96.4%) from the original dataset, while containing both 2D and 3D information in a unified form.
We construct classification and segmentation models that take this implicit format directly as input, and also propose a novel augmentation technique to avoid overfitting to image backgrounds.
arXiv Detail & Related papers (2022-08-24T13:32:46Z)
- OptGAN: Optimizing and Interpreting the Latent Space of the Conditional Text-to-Image GANs [8.26410341981427]
We study how to ensure that generated samples are believable, realistic or natural.
We present a novel algorithm which identifies semantically-understandable directions in the latent space of a conditional text-to-image GAN architecture.
arXiv Detail & Related papers (2022-02-25T20:00:33Z)
- Shelf-Supervised Mesh Prediction in the Wild [54.01373263260449]
We propose a learning-based approach to infer the 3D shape and pose of an object from a single image.
We first infer a volumetric representation in a canonical frame, along with the camera pose.
The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame; a structural sketch of this two-stage pipeline follows the list.
arXiv Detail & Related papers (2021-02-11T18:57:10Z)
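The two-stage pipeline in the last entry (a coarse canonical-frame volume plus a camera pose, then mesh refinement in the predicted camera frame) can be sketched as below. This is a structural illustration only: all module names, layer sizes, and the stubbed-out mesh extraction (e.g. marching cubes) are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VolumePosePredictor(nn.Module):
    """Stage 1: single image -> coarse occupancy volume in a canonical
    frame, plus a camera pose estimate."""
    def __init__(self, vox=32):
        super().__init__()
        self.vox = vox
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.vol_head = nn.Linear(64, vox ** 3)  # occupancy logits
        self.pose_head = nn.Linear(64, 6)        # axis-angle rotation + translation

    def forward(self, img):
        feat = self.enc(img)
        vol = torch.sigmoid(self.vol_head(feat)).view(
            -1, 1, self.vox, self.vox, self.vox)
        return vol, self.pose_head(feat)

class MeshRefiner(nn.Module):
    """Stage 2: refine mesh vertices in the predicted camera frame.
    Extracting vertices from the volume (e.g. marching cubes) is stubbed out."""
    def __init__(self):
        super().__init__()
        self.offset = nn.Linear(3 + 6, 3)  # per-vertex offset, conditioned on pose

    def forward(self, verts, pose):
        pose_rep = pose[:, None, :].expand(-1, verts.shape[1], -1)
        return verts + self.offset(torch.cat([verts, pose_rep], dim=-1))

img = torch.randn(1, 3, 64, 64)
vol, pose = VolumePosePredictor()(img)
verts = torch.rand(1, 512, 3)  # stand-in for vertices extracted from `vol`
refined = MeshRefiner()(verts, pose)
```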
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.