3D GAN Inversion with Pose Optimization
- URL: http://arxiv.org/abs/2210.07301v2
- Date: Mon, 17 Oct 2022 12:53:11 GMT
- Title: 3D GAN Inversion with Pose Optimization
- Authors: Jaehoon Ko, Kyusun Cho, Daewon Choi, Kwangrok Ryoo, Seungryong Kim
- Abstract summary: We introduce a generalizable 3D GAN inversion method that infers camera viewpoint and latent code simultaneously to enable multi-view consistent semantic image editing.
We conduct extensive experiments on image reconstruction and editing both quantitatively and qualitatively, and further compare our results with 2D GAN-based editing.
- Score: 26.140281977885376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With recent advances in the quality of NeRF-based 3D-aware GANs, projecting an image into the latent space of these 3D-aware GANs has a natural advantage over 2D GAN inversion: not only does it allow multi-view consistent editing of the projected image, but it also enables 3D reconstruction and novel view synthesis from only a single image. However, explicit viewpoint control is the main hindrance in the 3D GAN inversion process, as both the camera pose and the latent code have to be optimized simultaneously to reconstruct the given image. Most works that explore the latent space of 3D-aware GANs rely on a ground-truth camera viewpoint or a deformable 3D model, limiting their applicability. In this work, we introduce a generalizable 3D GAN inversion method that infers the camera viewpoint and latent code simultaneously to enable multi-view consistent semantic image editing. The key to our approach is to leverage pre-trained estimators for better initialization and to utilize the pixel-wise depth computed from NeRF parameters to better reconstruct the given image. We conduct extensive experiments on image reconstruction and editing, both quantitatively and qualitatively, and further compare our results with 2D GAN-based editing to demonstrate the advantages of utilizing the latent space of 3D GANs. Additional results and visualizations are available at https://3dgan-inversion.github.io .
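To make the described pipeline concrete, here is a minimal sketch of the joint camera-pose and latent-code optimization, assuming a pre-trained NeRF-based generator that returns both an RGB render and per-pixel depth. `generator`, `pose_estimator`, `latent_encoder`, and the depth regularizer are illustrative stand-ins, not the authors' released code.

```python
# Hedged sketch of joint pose/latent optimization; all module names are
# hypothetical stand-ins for the pre-trained networks the abstract mentions.
import torch
import torch.nn.functional as F

def depth_smoothness(depth):
    # One plausible depth-based term: penalize large local depth jumps so
    # the recovered geometry stays smooth (the paper's exact loss may differ).
    dx = (depth[..., :, 1:] - depth[..., :, :-1]).abs().mean()
    dy = (depth[..., 1:, :] - depth[..., :-1, :]).abs().mean()
    return dx + dy

def invert(generator, pose_estimator, latent_encoder, image,
           steps=500, lr=1e-2, depth_weight=0.1):
    # Initialize pose and latent from pre-trained estimators rather than
    # at random, as the abstract suggests, to avoid poor local minima.
    pose = pose_estimator(image).detach().clone().requires_grad_(True)
    w = latent_encoder(image).detach().clone().requires_grad_(True)

    opt = torch.optim.Adam([w, pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # A NeRF-based generator can return the RGB render together with
        # the per-pixel depth accumulated along each camera ray.
        rgb, depth = generator(w, pose)
        loss = F.l1_loss(rgb, image)                  # photometric term
        loss = loss + depth_weight * depth_smoothness(depth)
        loss.backward()
        opt.step()
    return w.detach(), pose.detach()
```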
Related papers
- TriPlaneNet: An Encoder for EG3D Inversion [1.9567015559455132]
Recent progress in NeRF-based GANs has introduced a number of approaches for high-resolution, high-fidelity generative modeling of human heads.
Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result to novel views.
We introduce a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation presented for the EG3D generative model.
arXiv Detail & Related papers (2023-03-23T17:56:20Z)
- NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions [97.27105725738016]
The integration of Neural Radiance Fields (NeRFs) and generative models such as Generative Adversarial Networks (GANs) has transformed 3D-aware generation from single-view images.
We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z)
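As a rough illustration of the distillation recipe summarized above, the sketch below trains a pose-conditioned convolutional student to match renders from a frozen NeRF-GAN teacher; `nerf_gan`, `student`, and the pose sampler are hypothetical placeholders, not the paper's actual interface.

```python
# Illustrative teacher-student loop: the frozen NeRF-GAN renders targets,
# and a fast pose-conditioned conv net learns to reproduce them.
import torch
import torch.nn.functional as F

def sample_poses(batch):
    # Toy camera sampler (yaw, pitch); a real setup would match the
    # teacher's training camera distribution.
    yaw = torch.rand(batch, 1) * 2 * torch.pi
    pitch = (torch.rand(batch, 1) - 0.5) * 0.5
    return torch.cat([yaw, pitch], dim=1)

def distill(nerf_gan, student, z_dim=512, iters=10000, batch=8, lr=2e-4):
    nerf_gan.eval()                        # teacher stays frozen
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(iters):
        z = torch.randn(batch, z_dim)      # reuse the teacher's latent space
        pose = sample_poses(batch)
        with torch.no_grad():
            target = nerf_gan(z, pose)     # slow volumetric render
        pred = student(z, pose)            # fast convolutional forward pass
        loss = F.l1_loss(pred, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```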
- Make Encoder Great Again in 3D GAN Inversion through Geometry and Occlusion-Aware Encoding [25.86312557482366]
3D GAN inversion aims to achieve high reconstruction fidelity and reasonable 3D geometry simultaneously from a single image input.
We introduce a novel encoder-based inversion framework based on EG3D, one of the most widely-used 3D GAN models.
Our method achieves reconstruction quality comparable to optimization-based methods while running up to 500 times faster.
arXiv Detail & Related papers (2023-03-22T05:51:53Z)
- In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing [28.790900756506833]
3D-aware GANs offer new capabilities for view synthesis while preserving the editing functionalities of their 2D counterparts.
GAN inversion is a crucial step that seeks the latent code to reconstruct input images or videos, subsequently enabling diverse editing tasks through manipulation of this latent code.
We address the challenge posed by out-of-distribution (OOD) objects by explicitly modeling them from the input in 3D-aware GANs.
arXiv Detail & Related papers (2023-02-09T18:59:56Z)
- Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion [115.82306502822412]
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing.
A corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing.
We study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures.
arXiv Detail & Related papers (2022-12-14T18:49:50Z)
- 3D GAN Inversion with Facial Symmetry Prior [42.22071135018402]
It is natural to associate 3D GANs with GAN inversion methods to project a real image into the generator's latent space.
We propose a novel method to promote 3D GAN inversion by introducing a facial symmetry prior.
arXiv Detail & Related papers (2022-11-30T11:57:45Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator [68.0533826852601]
3D-aware image synthesis aims at learning a generative model that can render photo-realistic 2D images while capturing decent underlying 3D shapes.
Existing methods often fail to recover even moderately accurate 3D shapes.
We propose a geometry-aware discriminator to improve 3D-aware GANs.
arXiv Detail & Related papers (2022-09-30T17:59:37Z)
- 3D-FM GAN: Towards 3D-Controllable Face Manipulation [43.99393180444706]
3D-FM GAN is a novel conditional GAN framework designed specifically for 3D-controllable face manipulation.
By carefully encoding both the input face image and a physically-based rendering of 3D edits into StyleGAN's latent spaces, our image generator provides high-quality, identity-preserving, 3D-controllable face manipulation.
We show that our method outperforms prior art on various tasks, with better editability, stronger identity preservation, and higher photo-realism.
arXiv Detail & Related papers (2022-08-24T01:33:13Z)
- GAN2X: Non-Lambertian Inverse Rendering of Image GANs [85.76426471872855]
We present GAN2X, a new method for unsupervised inverse rendering that only uses unpaired images for training.
Unlike previous Shape-from-GAN approaches that mainly focus on 3D shapes, we make a first attempt to also recover non-Lambertian material properties by exploiting pseudo paired data generated by a GAN.
Experiments demonstrate that GAN2X can accurately decompose 2D images into 3D shape, albedo, and specular properties for different object categories, and achieves state-of-the-art performance for unsupervised single-view 3D face reconstruction.
arXiv Detail & Related papers (2022-06-18T16:58:49Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
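A schematic sketch of the camera-disentangled design in this last entry: per-view image features are pooled into a single viewpoint-agnostic pose representation, which is then decoded per view conditioned on that view's projection matrix. Module names and dimensions below are illustrative assumptions, not the paper's architecture.

```python
# Schematic fusion of multi-view features into a camera-agnostic latent,
# decoded per view against each camera's 3x4 projection matrix.
import torch
import torch.nn as nn

class FusedPoseNet(nn.Module):
    def __init__(self, feat_dim=256, n_joints=17):
        super().__init__()
        self.encoder = nn.Sequential(                  # per-view image encoder
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # Decoder sees the shared latent plus a flattened 3x4 projection.
        self.decoder = nn.Linear(feat_dim + 12, n_joints * 2)

    def forward(self, views, projections):
        # views: (V, 3, H, W); projections: (V, 3, 4)
        feats = self.encoder(views)                    # (V, feat_dim)
        shared = feats.mean(dim=0, keepdim=True)       # pool over views:
        shared = shared.expand(views.size(0), -1)      # camera-agnostic latent
        cond = torch.cat([shared, projections.flatten(1)], dim=1)
        return self.decoder(cond).view(views.size(0), -1, 2)  # per-view 2D joints

# Example: four calibrated views of one subject.
# net = FusedPoseNet()
# joints2d = net(torch.rand(4, 3, 256, 256), torch.rand(4, 3, 4))
```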