Meta-Auxiliary Network for 3D GAN Inversion
- URL: http://arxiv.org/abs/2305.10884v1
- Date: Thu, 18 May 2023 11:26:27 GMT
- Title: Meta-Auxiliary Network for 3D GAN Inversion
- Authors: Bangrui Jiang, Zhenhua Guo, Yujiu Yang
- Abstract summary: In this work, we present a novel meta-auxiliary framework that leverages recently developed 3D GANs as the generator.
In the first stage, we invert the input image to an editable latent code using off-the-shelf inversion techniques.
An auxiliary network, taking the given image as input, is proposed to refine the generator parameters; it predicts offsets for both the weights of convolutional layers and the sampling positions of volume rendering.
In the second stage, we perform meta-learning to rapidly adapt the auxiliary network to the input image, after which the final reconstructed image is synthesized via the meta-learned auxiliary network.
- Score: 18.777352198191004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world image manipulation has achieved fantastic progress in recent
years. GAN inversion, which aims to map the real image to the latent code
faithfully, is the first step in this pipeline. However, existing GAN inversion
methods fail to achieve high reconstruction quality and fast inference at the
same time. In addition, existing methods are built on 2D GANs and lack
explicit mechanisms to enforce multi-view consistency. In this work, we
present a novel meta-auxiliary framework, leveraging recently developed
3D GANs as the generator. The proposed method adopts a two-stage strategy. In the
first stage, we invert the input image to an editable latent code using
off-the-shelf inversion techniques. An auxiliary network, taking the given
image as input, is proposed to refine the generator parameters; it predicts
offsets for both the weights of convolutional layers and the sampling positions
of volume rendering. In the second stage, we perform meta-learning to rapidly
adapt the auxiliary network to the input image, after which the final
reconstructed image is synthesized via the meta-learned auxiliary network.
Extensive experiments show that our method achieves better performance on both
inversion and editing tasks.
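The two-stage strategy above can be illustrated with a MAML-style test-time adaptation loop: an auxiliary module predicts offsets for the generator's weights, and a few gradient steps adapt that module to the single input image. The following is a minimal, hypothetical NumPy sketch under strong simplifications (a linear map stands in for the 3D GAN generator, and the auxiliary "network" is just the offset tensor itself); all names are illustrative, not the paper's actual implementation.

```python
import numpy as np

def reconstruct(w, offset, z):
    # stand-in "generator": linear map with auxiliary-refined weights
    return (w + offset) @ z

def adapt_auxiliary(w, z, target, theta, lr=0.05, steps=20):
    """Test-time adaptation: refine the predicted weight offset so the
    reconstruction matches the given image (here, a target vector)."""
    for _ in range(steps):
        residual = reconstruct(w, theta, z) - target
        grad = np.outer(residual, z)   # d(0.5*||r||^2)/d(theta)
        theta = theta - lr * grad
    return theta

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))       # frozen generator weights
z = rng.normal(size=3)            # inverted latent code (stage one)
target = rng.normal(size=4)       # the "input image" to reconstruct

theta0 = np.zeros((4, 3))         # initial auxiliary offset
before = 0.5 * np.sum((reconstruct(w, theta0, z) - target) ** 2)
theta = adapt_auxiliary(w, z, target, theta0)
after = 0.5 * np.sum((reconstruct(w, theta, z) - target) ** 2)
print(after < before)             # adaptation reduces reconstruction error
```

The point of the sketch is the division of labor: the latent code from stage one stays fixed, and only the lightweight auxiliary offsets are adapted per image, which is what keeps inference fast.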
Related papers
- Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding [86.55824709875598]
We propose a joint enhancement framework for 3D semantic Gaussian modeling that synergizes both semantic and rendering branches. Unlike conventional point cloud shape encoding, we introduce an anisotropic 3D Gaussian Chebyshev descriptor to capture fine-grained 3D shape details. We employ a cross-scene knowledge transfer module to continuously update learned shape patterns, enabling faster convergence and robust representations.
arXiv Detail & Related papers (2026-01-05T18:33:50Z) - WarpGAN: Warping-Guided 3D GAN Inversion with Style-Based Novel View Inpainting [68.77882703764142]
3D GAN inversion projects a single image into the latent space of a pre-trained 3D GAN to achieve single-shot novel view synthesis. We introduce the warping-and-inpainting strategy to incorporate image inpainting into 3D GAN inversion and propose a novel 3D GAN inversion method, WarpGAN.
arXiv Detail & Related papers (2025-11-11T12:42:07Z) - High-resolution Photo Enhancement in Real-time: A Laplacian Pyramid Network [73.19214585791268]
This paper introduces a pyramid network called LLF-LUT++, which integrates global and local operators through closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we utilize an image-adaptive 3D LUT that capitalizes on the global tonal characteristics of downsampled images. LLF-LUT++ not only achieves a 2.64 dB improvement in PSNR on the HDR+ dataset, but also further reduces runtime, processing 4K-resolution images in just 13 ms on a single GPU.
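The closed-form Laplacian pyramid decomposition and reconstruction that this line of work builds on can be sketched in a few lines. This is a simplified illustration, not LLF-LUT++ itself: 2x2 block averaging and nearest-neighbour upsampling stand in for the usual Gaussian reduce/expand filters, which makes the round trip exactly invertible.

```python
import numpy as np

def down(img):
    # 2x2 block mean (stand-in for Gaussian blur + subsample)
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    # nearest-neighbour upsampling (stand-in for the expand filter)
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    """Laplacian pyramid: band-pass residuals plus a coarse base."""
    laplacians = []
    for _ in range(levels):
        coarse = down(img)
        laplacians.append(img - up(coarse))  # high-frequency residual
        img = coarse
    return laplacians, img                   # residuals + low-frequency base

def rebuild(laplacians, base):
    img = base
    for lap in reversed(laplacians):
        img = up(img) + lap
    return img

x = np.random.default_rng(1).random((8, 8))
laps, base = build_pyramid(x, levels=2)
y = rebuild(laps, base)
print(np.allclose(x, y))  # True: reconstruction is exact by construction
```

Methods like LLF-LUT++ exploit this split by applying a global operator (e.g. an image-adaptive 3D LUT) to the small low-frequency base and cheap local operators to the residual levels.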
arXiv Detail & Related papers (2025-10-13T16:52:32Z) - 2D Gaussian Splatting with Semantic Alignment for Image Inpainting [46.266955851252504]
We propose the first image inpainting framework based on 2D Gaussian Splatting. For global semantic consistency, we incorporate features from a pretrained DINO model. Our method achieves competitive performance in both quantitative metrics and perceptual quality.
arXiv Detail & Related papers (2025-09-02T05:12:52Z) - Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object
Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z) - Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D
Reconstruction with Transformers [37.14235383028582]
We introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference.
Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation.
arXiv Detail & Related papers (2023-12-14T17:18:34Z) - In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimization to keep the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z) - TriPlaneNet: An Encoder for EG3D Inversion [1.9567015559455132]
NeRF-based GANs have introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads.
Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result to novel views.
We introduce a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation presented for the EG3D generative model.
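The tri-plane representation used by EG3D (and tapped into here) can be sketched as follows: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly at the projected 2D coordinates, and the three features are aggregated. Below is a minimal, hypothetical NumPy version; in EG3D the planes are learned feature maps and aggregation feeds a small MLP, neither of which is shown.

```python
import numpy as np

def bilinear(plane, u, v):
    """Bilinearly sample plane (R x R x C) at continuous coords u, v."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1 = min(u0 + 1, plane.shape[0] - 1)
    v1 = min(v0 + 1, plane.shape[1] - 1)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * plane[u0, v0] + du * (1 - dv) * plane[u1, v0]
            + (1 - du) * dv * plane[u0, v1] + du * dv * plane[u1, v1])

def triplane_feature(planes, xyz, res):
    """Project xyz in [-1, 1]^3 onto the XY, XZ and YZ planes and sum."""
    x, y, z = ((c + 1) * 0.5 * (res - 1) for c in xyz)  # to pixel coords
    p_xy, p_xz, p_yz = planes
    return bilinear(p_xy, x, y) + bilinear(p_xz, x, z) + bilinear(p_yz, y, z)

res, channels = 4, 2
rng = np.random.default_rng(2)
planes = [rng.random((res, res, channels)) for _ in range(3)]

f = triplane_feature(planes, xyz=(-1.0, -1.0, -1.0), res=res)
# at a grid corner, the sample is exactly the sum of the three corner texels
print(np.allclose(f, planes[0][0, 0] + planes[1][0, 0] + planes[2][0, 0]))
```

An encoder that predicts these planes directly, instead of optimizing a latent code per image, is what makes tri-plane-based inversion fast.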
arXiv Detail & Related papers (2023-03-23T17:56:20Z) - StraIT: Non-autoregressive Generation with Stratified Image Transformer [63.158996766036736]
Stratified Image Transformer (StraIT) is a pure non-autoregressive (NAR) generative model.
Our experiments demonstrate that StraIT significantly improves NAR generation and outperforms existing diffusion models (DMs) and autoregressive (AR) methods.
arXiv Detail & Related papers (2023-03-01T18:59:33Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z) - 3D-Aware Encoding for Style-based Neural Radiance Fields [50.118687869198716]
We learn an inversion function to project an input image to the latent space of a NeRF generator and then synthesize novel views of the original image based on the latent code.
Compared with GAN inversion for 2D generative models, NeRF inversion not only needs to 1) preserve the identity of the input image, but also 2) ensure 3D consistency in generated novel views.
We propose a two-stage encoder for style-based NeRF inversion.
arXiv Detail & Related papers (2022-11-12T06:14:12Z) - Deformably-Scaled Transposed Convolution [17.4596321623511]
We revisit transposed convolution and introduce a novel layer that allows us to place information in the image selectively.
Our novel layer can be used as a drop-in replacement for 2D and 3D upsampling operators and the code will be publicly available.
arXiv Detail & Related papers (2022-10-17T21:35:29Z) - Multi-initialization Optimization Network for Accurate 3D Human Pose and
Shape Estimation [75.44912541912252]
We propose a three-stage framework named Multi-Initialization Optimization Network (MION).
In the first stage, we strategically select different coarse 3D reconstruction candidates which are compatible with the 2D keypoints of the input sample.
In the second stage, we design a mesh refinement transformer (MRT) to respectively refine each coarse reconstruction result via a self-attention mechanism.
Finally, a Consistency Estimation Network (CEN) is proposed to find the best result from multiple candidates by evaluating whether the visual evidence in the RGB image matches a given 3D reconstruction.
arXiv Detail & Related papers (2021-12-24T02:43:58Z) - HyperInverter: Improving StyleGAN Inversion via Hypernetwork [12.173568611144628]
Current GAN inversion methods fail to meet at least one of three requirements: high reconstruction quality, editability, and fast inference.
We present a novel two-phase strategy in this research that fits all requirements at the same time.
Our method is entirely encoder-based, resulting in extremely fast inference.
arXiv Detail & Related papers (2021-12-01T18:56:05Z)
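A hypernetwork, in the sense used by HyperInverter and Hyper-VolTran above, is simply a network whose output parameterizes another network's weights, so a single encoder pass can specialize the generator to one input. A minimal, hypothetical NumPy sketch (a linear hypernetwork producing the weights of a linear "generator"; all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# hypernetwork: maps a per-image embedding to the flattened weights
# of a small "generator" layer (here a 2x3 linear map)
H = rng.normal(size=(6, 4)) * 0.1

def hyper_weights(embedding):
    # predict and reshape the generator's weight matrix
    return (H @ embedding).reshape(2, 3)

def generator(weights, z):
    return weights @ z

e = rng.normal(size=4)    # embedding of the input image (illustrative)
z = rng.normal(size=3)    # latent code
w = hyper_weights(e)      # per-image generator weights, one forward pass
out = generator(w, z)
print(out.shape)          # (2,)
```

Because adapting to a new image is a single feed-forward pass through the hypernetwork rather than an optimization loop, inference stays fast while the generator is still refined per image.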
This list is automatically generated from the titles and abstracts of the papers in this site.