Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis
- URL: http://arxiv.org/abs/2309.16859v1
- Date: Thu, 28 Sep 2023 21:21:44 GMT
- Title: Preface: A Data-driven Volumetric Prior for Few-shot Ultra
High-resolution Face Synthesis
- Authors: Marcel C. Bühler (1 and 2), Kripasindhu Sarkar (2), Tanmay Shah (2),
Gengyan Li (1 and 2), Daoye Wang (2), Leonhard Helminger (2), Sergio
Orts-Escolano (2), Dmitry Lagun (2), Otmar Hilliges (1), Thabo Beeler (2),
Abhimitra Meka (2) ((1) ETH Zurich, (2) Google)
- Abstract summary: NeRFs have enabled highly realistic synthesis of human faces including complex appearance and reflectance effects of hair and skin.
We propose a novel human face prior that enables the synthesis of ultra high-resolution novel views of subjects that are not part of the prior's training distribution.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: NeRFs have enabled highly realistic synthesis of human faces including
complex appearance and reflectance effects of hair and skin. These methods
typically require a large number of multi-view input images, making the process
hardware intensive and cumbersome, limiting applicability to unconstrained
settings. We propose a novel volumetric human face prior that enables the
synthesis of ultra high-resolution novel views of subjects that are not part of
the prior's training distribution. This prior model consists of an
identity-conditioned NeRF, trained on a dataset of low-resolution multi-view
images of diverse humans with known camera calibration. A simple sparse
landmark-based 3D alignment of the training dataset allows our model to learn a
smooth latent space of geometry and appearance despite a limited number of
training identities. A high-quality volumetric representation of a novel
subject can be obtained by model fitting to 2 or 3 camera views of arbitrary
resolution. Importantly, our method requires as few as two views of casually
captured images as input at inference time.
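The abstract mentions a sparse landmark-based 3D alignment of the training dataset but does not say how it is computed. As a minimal sketch only, assuming the alignment is a least-squares similarity transform (scale, rotation, translation) between corresponding 3D landmarks, the idea can be illustrated with the closed-form Umeyama/Procrustes solution in NumPy. The function name `similarity_align` and the synthetic landmarks are hypothetical and are not taken from the paper.

```python
import numpy as np

def similarity_align(src, dst):
    """Least-squares similarity transform mapping src landmarks onto dst.

    Assumption: alignment is modeled as scale * R @ x + t (Umeyama's method);
    the paper's actual procedure may differ.
    src, dst: (N, 3) arrays of corresponding 3D landmarks.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered landmark sets.
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(U) * np.linalg.det(Vt) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt

    var_src = (src_c ** 2).sum() / len(src)
    scale = (S * np.diag(D)).sum() / var_src
    t = mu_dst - scale * R @ mu_src
    return scale, R, t

# Synthetic check: landmarks of one "subject" are a scaled, rotated, shifted
# copy of hypothetical canonical landmarks; alignment should undo that.
rng = np.random.default_rng(0)
canonical = rng.normal(size=(5, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
subject = 1.7 * canonical @ R_true.T + np.array([0.1, -0.2, 0.5])

scale, R, t = similarity_align(subject, canonical)
aligned = scale * subject @ R.T + t
max_err = float(np.abs(aligned - canonical).max())
```

Mapping every subject's landmarks into one shared canonical frame in this way is one plausible reading of how an alignment step could help a model learn a smooth latent space from few identities; it is an illustration, not the authors' implementation.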
Related papers
- Cafca: High-quality Novel View Synthesis of Expressive Faces from Casual Few-shot Captures [33.463245327698]
We present a novel volumetric prior on human faces that allows for high-fidelity expressive face modeling.
We leverage a 3D Morphable Face Model to synthesize a large training set, rendering each identity with different expressions.
We then train a conditional Neural Radiance Field prior on this synthetic dataset and, at inference time, fine-tune the model on a very sparse set of real images of a single subject.
(2024-10-01T12:24:50Z)
- SPARK: Self-supervised Personalized Real-time Monocular Face Capture [6.093606972415841]
Current state of the art approaches have the ability to regress parametric 3D face models in real-time across a wide range of identities.
We propose a method for high-precision 3D face capture taking advantage of a collection of unconstrained videos of a subject as prior information.
(2024-09-12T12:30:04Z)
- Multi-Style Facial Sketch Synthesis through Masked Generative Modeling [17.313050611750413]
We propose a lightweight end-to-end synthesis model that efficiently converts images to corresponding multi-stylized sketches.
In this study, we overcome the issue of data insufficiency by incorporating semi-supervised learning into the training process.
Our method consistently outperforms previous algorithms across multiple benchmarks.
(2024-08-22T13:45:04Z)
- Efficient-3DiM: Learning a Generalizable Single-image Novel-view Synthesizer in One Day [63.96075838322437]
We propose a framework to learn a single-image novel-view synthesizer.
Our framework is able to reduce the total training time from 10 days to less than 1 day.
(2023-10-04T17:57:07Z)
- Deformable Model-Driven Neural Rendering for High-Fidelity 3D Reconstruction of Human Heads Under Low-View Settings [20.07788905506271]
Reconstructing 3D human heads in low-view settings presents technical challenges.
We propose geometry decomposition and adopt a two-stage, coarse-to-fine training strategy.
Our method outperforms existing neural rendering approaches in terms of reconstruction accuracy and novel view synthesis under low-view settings.
(2023-03-24T08:32:00Z)
- Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis [52.720314035084215]
This work aims to use a general deep learning framework to synthesize free-viewpoint images of arbitrary human performers.
We present a simple yet powerful framework, named Generalizable Neural Performer (GNR), that learns a generalizable and robust neural body representation.
Experiments on GeneBody-1.0 and ZJU-Mocap show better robustness of our methods than recent state-of-the-art generalizable methods.
(2022-04-25T17:14:22Z)
- LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space [90.74976459491303]
We introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space.
A normalizing flow bridges the two representation spaces and transforms latent samples from one domain to another, allowing us to define a latent likelihood objective.
We show that our approach leads to an expressive and effective prior, capturing facial dynamics and subtle expressions better.
(2022-03-15T13:22:57Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without pre-scanned template models.
(2021-05-01T14:32:13Z)
- PVA: Pixel-aligned Volumetric Avatars [34.929560973779466]
We devise a novel approach for predicting volumetric avatars of the human head given just a small number of inputs.
Our approach is trained in an end-to-end manner solely based on a photometric re-rendering loss without requiring explicit 3D supervision.
(2021-01-07T18:58:46Z)
- PaMIR: Parametric Model-Conditioned Implicit Representation for Image-based Human Reconstruction [67.08350202974434]
We propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function.
We show that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
(2020-07-08T02:26:19Z)
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
(2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.