Meta-Auxiliary Network for 3D GAN Inversion
- URL: http://arxiv.org/abs/2305.10884v1
- Date: Thu, 18 May 2023 11:26:27 GMT
- Title: Meta-Auxiliary Network for 3D GAN Inversion
- Authors: Bangrui Jiang, Zhenhua Guo, Yujiu Yang
- Abstract summary: In this work, we present a novel meta-auxiliary framework that leverages recently developed 3D GANs as the generator.
In the first stage, we invert the input image to an editable latent code using off-the-shelf inversion techniques.
An auxiliary network, taking the given image as input, is proposed to refine the generator parameters; it predicts offsets for both the weights of convolutional layers and the sampling positions of volume rendering.
In the second stage, we perform meta-learning to rapidly adapt the auxiliary network to the input image, after which the final reconstructed image is synthesized via the meta-learned auxiliary network.
- Score: 18.777352198191004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world image manipulation has achieved fantastic progress in recent
years. GAN inversion, which aims to map the real image to the latent code
faithfully, is the first step in this pipeline. However, existing GAN inversion
methods fail to achieve high reconstruction quality and fast inference at the
same time. In addition, existing methods are built on 2D GANs and lack
explicit mechanisms to enforce multi-view consistency. In this work, we
present a novel meta-auxiliary framework, leveraging recently developed
3D GANs as the generator. The proposed method adopts a two-stage strategy. In the
first stage, we invert the input image to an editable latent code using
off-the-shelf inversion techniques. An auxiliary network, taking the given
image as input, is proposed to refine the generator parameters; it predicts
offsets for both the weights of convolutional layers and the sampling positions
of volume rendering. In the second stage, we perform meta-learning to rapidly
adapt the auxiliary network to the input image, after which the final
reconstructed image is synthesized via the meta-learned auxiliary network.
Extensive experiments show that our method achieves better performance on both
inversion and editing tasks.
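The two-stage strategy above can be illustrated with a MAML-style test-time adaptation loop: an auxiliary module predicts offsets for the generator's weights, and a few gradient steps adapt that module to the single input image. The following is a minimal, hypothetical NumPy sketch under strong simplifications (a linear map stands in for the 3D GAN generator, and the auxiliary "network" is just the offset tensor itself); all names are illustrative, not the paper's actual implementation.

```python
import numpy as np

def reconstruct(w, offset, z):
    # stand-in "generator": linear map with auxiliary-refined weights
    return (w + offset) @ z

def adapt_auxiliary(w, z, target, theta, lr=0.05, steps=20):
    """Test-time adaptation: refine the predicted weight offset so the
    reconstruction matches the given image (here, a target vector)."""
    for _ in range(steps):
        residual = reconstruct(w, theta, z) - target
        grad = np.outer(residual, z)   # d(0.5*||r||^2)/d(theta)
        theta = theta - lr * grad
    return theta

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))       # frozen generator weights
z = rng.normal(size=3)            # inverted latent code (stage one)
target = rng.normal(size=4)       # the "input image" to reconstruct

theta0 = np.zeros((4, 3))         # initial auxiliary offset
before = 0.5 * np.sum((reconstruct(w, theta0, z) - target) ** 2)
theta = adapt_auxiliary(w, z, target, theta0)
after = 0.5 * np.sum((reconstruct(w, theta, z) - target) ** 2)
print(after < before)             # adaptation reduces reconstruction error
```

The point of the sketch is the division of labor: the latent code from stage one stays fixed, and only the lightweight auxiliary offsets are adapted per image, which is what keeps inference fast.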
Related papers
- Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding [86.55824709875598]
We propose a joint enhancement framework for 3D semantic Gaussian modeling that synergizes both semantic and rendering branches. Unlike conventional point cloud shape encoding, we introduce an anisotropic 3D Gaussian Chebyshev descriptor to capture fine-grained 3D shape details. We employ a cross-scene knowledge transfer module to continuously update learned shape patterns, enabling faster convergence and robust representations.
arXiv Detail & Related papers (2026-01-05T18:33:50Z) - WarpGAN: Warping-Guided 3D GAN Inversion with Style-Based Novel View Inpainting [68.77882703764142]
3D GAN inversion projects a single image into the latent space of a pre-trained 3D GAN to achieve single-shot novel view synthesis. We introduce the warping-and-inpainting strategy to incorporate image inpainting into 3D GAN inversion and propose a novel 3D GAN inversion method, WarpGAN.
arXiv Detail & Related papers (2025-11-11T12:42:07Z) - High-resolution Photo Enhancement in Real-time: A Laplacian Pyramid Network [73.19214585791268]
This paper introduces a pyramid network called LLF-LUT++, which integrates global and local operators through closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we utilize an image-adaptive 3D LUT that capitalizes on the global tonal characteristics of downsampled images. LLF-LUT++ not only achieves a 2.64 dB improvement in PSNR on the HDR+ dataset, but also further reduces runtime, processing 4K-resolution images in just 13 ms on a single GPU.
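The closed-form Laplacian pyramid decomposition and reconstruction that this line of work builds on can be sketched in a few lines. This is a simplified illustration, not LLF-LUT++ itself: 2x2 block averaging and nearest-neighbour upsampling stand in for the usual Gaussian reduce/expand filters, which makes the round trip exactly invertible.

```python
import numpy as np

def down(img):
    # 2x2 block mean (stand-in for Gaussian blur + subsample)
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    # nearest-neighbour upsampling (stand-in for the expand filter)
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    """Laplacian pyramid: band-pass residuals plus a coarse base."""
    laplacians = []
    for _ in range(levels):
        coarse = down(img)
        laplacians.append(img - up(coarse))  # high-frequency residual
        img = coarse
    return laplacians, img                   # residuals + low-frequency base

def rebuild(laplacians, base):
    img = base
    for lap in reversed(laplacians):
        img = up(img) + lap
    return img

x = np.random.default_rng(1).random((8, 8))
laps, base = build_pyramid(x, levels=2)
y = rebuild(laps, base)
print(np.allclose(x, y))  # True: reconstruction is exact by construction
```

Methods like LLF-LUT++ exploit this split by applying a global operator (e.g. an image-adaptive 3D LUT) to the small low-frequency base and cheap local operators to the residual levels.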
arXiv Detail & Related papers (2025-10-13T16:52:32Z) - 2D Gaussian Splatting with Semantic Alignment for Image Inpainting [46.266955851252504]
We propose the first image inpainting framework based on 2D Gaussian Splatting. For global semantic consistency, we incorporate features from a pretrained DINO model. Our method achieves competitive performance in both quantitative metrics and perceptual quality.
arXiv Detail & Related papers (2025-09-02T05:12:52Z) - Double-Shot 3D Shape Measurement with a Dual-Branch Network [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
We show that our method can reduce fringe order ambiguity while producing high-accuracy results on a self-made dataset.
arXiv Detail & Related papers (2024-07-19T10:49:26Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object
Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z) - Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D
Reconstruction with Transformers [37.14235383028582]
We introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference.
Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation.
arXiv Detail & Related papers (2023-12-14T17:18:34Z) - In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimization to keep the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z) - TriPlaneNet: An Encoder for EG3D Inversion [1.9567015559455132]
NeRF-based GANs have introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads.
Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result to novel views.
We introduce a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation presented for the EG3D generative model.
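The tri-plane representation used by EG3D (and tapped into here) can be sketched as follows: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly at the projected 2D coordinates, and the three features are aggregated. Below is a minimal, hypothetical NumPy version; in EG3D the planes are learned feature maps and aggregation feeds a small MLP, neither of which is shown.

```python
import numpy as np

def bilinear(plane, u, v):
    """Bilinearly sample plane (R x R x C) at continuous coords u, v."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1 = min(u0 + 1, plane.shape[0] - 1)
    v1 = min(v0 + 1, plane.shape[1] - 1)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * plane[u0, v0] + du * (1 - dv) * plane[u1, v0]
            + (1 - du) * dv * plane[u0, v1] + du * dv * plane[u1, v1])

def triplane_feature(planes, xyz, res):
    """Project xyz in [-1, 1]^3 onto the XY, XZ and YZ planes and sum."""
    x, y, z = ((c + 1) * 0.5 * (res - 1) for c in xyz)  # to pixel coords
    p_xy, p_xz, p_yz = planes
    return bilinear(p_xy, x, y) + bilinear(p_xz, x, z) + bilinear(p_yz, y, z)

res, channels = 4, 2
rng = np.random.default_rng(2)
planes = [rng.random((res, res, channels)) for _ in range(3)]

f = triplane_feature(planes, xyz=(-1.0, -1.0, -1.0), res=res)
# at a grid corner, the sample is exactly the sum of the three corner texels
print(np.allclose(f, planes[0][0, 0] + planes[1][0, 0] + planes[2][0, 0]))
```

An encoder that predicts these planes directly, instead of optimizing a latent code per image, is what makes tri-plane-based inversion fast.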
arXiv Detail & Related papers (2023-03-23T17:56:20Z) - StraIT: Non-autoregressive Generation with Stratified Image Transformer [63.158996766036736]
Stratified Image Transformer (StraIT) is a pure non-autoregressive (NAR) generative model.
Our experiments demonstrate that StraIT significantly improves NAR generation and outperforms existing diffusion models (DMs) and autoregressive (AR) methods.
arXiv Detail & Related papers (2023-03-01T18:59:33Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z) - 3D-Aware Encoding for Style-based Neural Radiance Fields [50.118687869198716]
We learn an inversion function to project an input image to the latent space of a NeRF generator and then synthesize novel views of the original image based on the latent code.
Compared with GAN inversion for 2D generative models, NeRF inversion not only needs to 1) preserve the identity of the input image, but also 2) ensure 3D consistency in generated novel views.
We propose a two-stage encoder for style-based NeRF inversion.
arXiv Detail & Related papers (2022-11-12T06:14:12Z) - Deformably-Scaled Transposed Convolution [17.4596321623511]
We revisit transposed convolution and introduce a novel layer that allows us to place information in the image selectively.
Our novel layer can be used as a drop-in replacement for 2D and 3D upsampling operators and the code will be publicly available.
arXiv Detail & Related papers (2022-10-17T21:35:29Z) - Multi-initialization Optimization Network for Accurate 3D Human Pose and
Shape Estimation [75.44912541912252]
We propose a three-stage framework named Multi-Initialization Optimization Network (MION).
In the first stage, we strategically select different coarse 3D reconstruction candidates which are compatible with the 2D keypoints of the input sample.
In the second stage, we design a mesh refinement transformer (MRT) to respectively refine each coarse reconstruction result via a self-attention mechanism.
Finally, a Consistency Estimation Network (CEN) is proposed to find the best result from multiple candidates by evaluating whether the visual evidence in the RGB image matches a given 3D reconstruction.
arXiv Detail & Related papers (2021-12-24T02:43:58Z) - HyperInverter: Improving StyleGAN Inversion via Hypernetwork [12.173568611144628]
Current GAN inversion methods fail to meet at least one of three requirements: high reconstruction quality, editability, and fast inference.
We present a novel two-phase strategy in this research that fits all requirements at the same time.
Our method is entirely encoder-based, resulting in extremely fast inference.
arXiv Detail & Related papers (2021-12-01T18:56:05Z)
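A hypernetwork, in the sense used by HyperInverter and Hyper-VolTran above, is simply a network whose output parameterizes another network's weights, so a single encoder pass can specialize the generator to one input. A minimal, hypothetical NumPy sketch (a linear hypernetwork producing the weights of a linear "generator"; all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# hypernetwork: maps a per-image embedding to the flattened weights
# of a small "generator" layer (here a 2x3 linear map)
H = rng.normal(size=(6, 4)) * 0.1

def hyper_weights(embedding):
    # predict and reshape the generator's weight matrix
    return (H @ embedding).reshape(2, 3)

def generator(weights, z):
    return weights @ z

e = rng.normal(size=4)    # embedding of the input image (illustrative)
z = rng.normal(size=3)    # latent code
w = hyper_weights(e)      # per-image generator weights, one forward pass
out = generator(w, z)
print(out.shape)          # (2,)
```

Because adapting to a new image is a single feed-forward pass through the hypernetwork rather than an optimization loop, inference stays fast while the generator is still refined per image.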
This list is automatically generated from the titles and abstracts of the papers in this site.