TriPlaneNet: An Encoder for EG3D Inversion
- URL: http://arxiv.org/abs/2303.13497v2
- Date: Wed, 25 Oct 2023 19:22:30 GMT
- Title: TriPlaneNet: An Encoder for EG3D Inversion
- Authors: Ananta R. Bhattarai, Matthias Nie{\ss}ner, Artem Sevastopolsky
- Abstract summary: NeRF-based GANs have introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads.
Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result onto the novel view.
We introduce a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation presented for the EG3D generative model.
- Score: 1.9567015559455132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in NeRF-based GANs has introduced a number of approaches for
high-resolution and high-fidelity generative modeling of human heads with a
possibility for novel view rendering. At the same time, one must solve an
inverse problem to be able to re-render or modify an existing image or video.
Despite the success of universal optimization-based methods for 2D GAN
inversion, those applied to 3D GANs may fail to extrapolate the result onto the
novel view, whereas optimization-based 3D GAN inversion methods are
time-consuming and can require at least several minutes per image. Fast
encoder-based techniques, such as those developed for StyleGAN, may also be
less appealing due to the lack of identity preservation. Our work introduces a
fast technique that bridges the gap between the two approaches by directly
utilizing the tri-plane representation presented for the EG3D generative model.
In particular, we build upon a feed-forward convolutional encoder for the
latent code and extend it with a fully-convolutional predictor of tri-plane
numerical offsets. The renderings are similar in quality to the ones produced
by optimization-based techniques and outperform the ones by encoder-based
methods. As we empirically prove, this is a consequence of directly operating
in the tri-plane space, not in the GAN parameter space, while making use of an
encoder-based trainable approach. Finally, we demonstrate significantly more
correct embedding of a face image in 3D than for all the baselines, further
strengthened by a probably symmetric prior enabled during training.
Related papers
- GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting.
We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space.
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
arXiv Detail & Related papers (2024-11-18T08:18:44Z) - LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces a novel framework called LN3Diff to address a unified 3D diffusion pipeline.
Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space.
It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z) - GaussianPro: 3D Gaussian Splatting with Progressive Propagation [49.918797726059545]
3DGS relies heavily on the point cloud produced by Structure-from-Motion (SfM) techniques.
We propose a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians.
Our method significantly surpasses 3DGS on the dataset, exhibiting an improvement of 1.15dB in terms of PSNR.
arXiv Detail & Related papers (2024-02-22T16:00:20Z) - Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D
Reconstruction with Transformers [37.14235383028582]
We introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference.
Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation.
arXiv Detail & Related papers (2023-12-14T17:18:34Z) - NeRF-GAN Distillation for Efficient 3D-Aware Generation with
Convolutions [97.27105725738016]
integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs) has transformed 3D-aware generation from single-view images.
We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z) - Make Encoder Great Again in 3D GAN Inversion through Geometry and
Occlusion-Aware Encoding [25.86312557482366]
3D GAN inversion aims to achieve high reconstruction fidelity and reasonable 3D geometry simultaneously from a single image input.
We introduce a novel encoder-based inversion framework based on EG3D, one of the most widely-used 3D GAN models.
Our method achieves impressive results comparable to optimization-based methods while operating up to 500 times faster.
arXiv Detail & Related papers (2023-03-22T05:51:53Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z) - 3D GAN Inversion with Pose Optimization [26.140281977885376]
We introduce a generalizable 3D GAN inversion method that infers camera viewpoint and latent code simultaneously to enable multi-view consistent semantic image editing.
We conduct extensive experiments on image reconstruction and editing both quantitatively and qualitatively, and further compare our results with 2D GAN-based editing.
arXiv Detail & Related papers (2022-10-13T19:06:58Z) - Learned Vertex Descent: A New Direction for 3D Human Model Fitting [64.04726230507258]
We propose a novel optimization-based paradigm for 3D human model fitting on images and scans.
Our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement compared to state-of-the-art.
LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement to the SOTA with a much simpler and faster method.
arXiv Detail & Related papers (2022-05-12T17:55:51Z) - H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction [27.66008315400462]
Recent learning approaches that implicitly represent surface geometry have shown impressive results in the problem of multi-view 3D reconstruction.
We tackle these limitations for the specific problem of few-shot full 3D head reconstruction.
We learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations.
arXiv Detail & Related papers (2021-07-26T23:04:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.