Make Encoder Great Again in 3D GAN Inversion through Geometry and
Occlusion-Aware Encoding
- URL: http://arxiv.org/abs/2303.12326v1
- Date: Wed, 22 Mar 2023 05:51:53 GMT
- Title: Make Encoder Great Again in 3D GAN Inversion through Geometry and
Occlusion-Aware Encoding
- Authors: Ziyang Yuan, Yiming Zhu, Yu Li, Hongyu Liu, Chun Yuan
- Abstract summary: 3D GAN inversion aims to achieve high reconstruction fidelity and reasonable 3D geometry simultaneously from a single image input.
We introduce a novel encoder-based inversion framework based on EG3D, one of the most widely-used 3D GAN models.
Our method achieves impressive results comparable to optimization-based methods while operating up to 500 times faster.
- Score: 25.86312557482366
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: 3D GAN inversion aims to achieve high reconstruction fidelity and reasonable
3D geometry simultaneously from a single image input. However, existing 3D GAN
inversion methods rely on time-consuming optimization for each individual case.
In this work, we introduce a novel encoder-based inversion framework based on
EG3D, one of the most widely-used 3D GAN models. We leverage the inherent
properties of EG3D's latent space to design a discriminator and a background
depth regularization. This enables us to train a geometry-aware encoder capable
of converting the input image into corresponding latent code. Additionally, we
explore the feature space of EG3D and develop an adaptive refinement stage that
improves the representation ability of features in EG3D to enhance the recovery
of fine-grained textural details. Finally, we propose an occlusion-aware fusion
operation to prevent distortion in unobserved regions. Our method achieves
impressive results comparable to optimization-based methods while operating up
to 500 times faster. Our framework is well-suited for applications such as
semantic editing.
Related papers
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z) - DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation [53.20147419879056]
We introduce a diffusion-based feed-forward framework to address challenges with a single model.
Building upon our 3D-aware Diffusion model with TransFormer, we propose a stronger version for 3D generation, i.e., DiffTF++.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate the effectiveness of our proposed modules.
arXiv Detail & Related papers (2024-05-13T17:59:51Z) - LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces a novel framework called LN3Diff to address a unified 3D diffusion pipeline.
Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space.
It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z) - Self-supervised Learning for Enhancing Geometrical Modeling in 3D-Aware
Generative Adversarial Network [42.16520614686877]
3D-GANs exhibit artifacts in their 3D geometrical modeling, such as mesh imperfections and holes.
These shortcomings are primarily attributed to the limited availability of annotated 3D data.
We present a Self-Supervised Learning technique tailored as an auxiliary loss for any 3D-GAN.
arXiv Detail & Related papers (2023-12-19T04:55:33Z) - TriPlaneNet: An Encoder for EG3D Inversion [1.9567015559455132]
NeRF-based GANs have introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads.
Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result onto the novel view.
We introduce a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation presented for the EG3D generative model.
arXiv Detail & Related papers (2023-03-23T17:56:20Z) - Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion [115.82306502822412]
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing.
A corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing.
We study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures.
arXiv Detail & Related papers (2022-12-14T18:49:50Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z) - 3D GAN Inversion with Pose Optimization [26.140281977885376]
We introduce a generalizable 3D GAN inversion method that infers camera viewpoint and latent code simultaneously to enable multi-view consistent semantic image editing.
We conduct extensive experiments on image reconstruction and editing both quantitatively and qualitatively, and further compare our results with 2D GAN-based editing.
arXiv Detail & Related papers (2022-10-13T19:06:58Z) - Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator [68.0533826852601]
3D-aware image synthesis aims at learning a generative model that can render photo-realistic 2D images while capturing decent underlying 3D shapes.
Existing methods fail to obtain moderate 3D shapes.
We propose a geometry-aware discriminator to improve 3D-aware GANs.
arXiv Detail & Related papers (2022-09-30T17:59:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.