Adaptable GAN Encoders for Image Reconstruction via Multi-type Latent
Vectors with Two-scale Attentions
- URL: http://arxiv.org/abs/2108.10201v1
- Date: Mon, 23 Aug 2021 14:37:58 GMT
- Title: Adaptable GAN Encoders for Image Reconstruction via Multi-type Latent
Vectors with Two-scale Attentions
- Authors: Cheng Yu, Wenmin Wang
- Abstract summary: We propose a novel method (named MTV-TSA) to handle such problems.
Creating multi-type latent vectors (MTV) from latent space and two-scale attentions (TSA) from images allows designing a set of encoders.
The designed encoders enable GANs to reconstruct higher-fidelity images from most synthesized HQ images.
- Score: 24.308432688431996
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although current deep generative adversarial networks (GANs) could synthesize
high-quality (HQ) images, discovering novel GAN encoders for image
reconstruction remains a worthwhile direction. When embedding images to latent space,
existing GAN encoders work well for aligned images (such as the human face),
but they do not adapt to more generalized GANs. To our knowledge, no current
state-of-the-art GAN encoder can reconstruct high-fidelity images from most
misaligned HQ images synthesized by different GANs; their performance is
limited, especially on non-aligned and real images. We propose a novel method
(named MTV-TSA) to handle such problems.
Creating multi-type latent vectors (MTV) from latent space and two-scale
attentions (TSA) from images allows designing a set of encoders that can be
adaptable to a variety of pre-trained GANs. We generalize two sets of loss
functions to optimize the encoders. The designed encoders enable GANs to
reconstruct higher-fidelity images from most synthesized HQ images. In
addition, the proposed method can reconstruct real images well and process them
based on learned attribute directions. The designed encoders have unified
convolutional blocks and fit well into current GAN architectures (such as
PGGAN, StyleGANs, and BigGAN) by fine-tuning the corresponding normalization
layers and the last block. Such well-designed encoders can also be trained to
converge more quickly.
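The abstract's idea of training an encoder against a frozen generator with "two sets of loss functions" can be illustrated on a toy linear model. This is our own minimal sketch, not the paper's code: `G` and `E` are plain matrices standing in for a pre-trained generator and the encoder, and the two losses are an image-reconstruction term and a latent-consistency term.

```python
# Toy sketch (assumption: linear generator/encoder; not the MTV-TSA code).
# Train encoder E for a frozen generator G with two losses:
#   latent consistency  ||E(G(z)) - z||^2   and
#   image reconstruction ||G(E(x)) - x||^2.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, image_dim = 2, 4
G = rng.normal(size=(image_dim, latent_dim))   # frozen "pre-trained generator"
E = np.zeros((latent_dim, image_dim))          # encoder weights to be learned

lr = 0.01
for _ in range(5000):
    z = rng.normal(size=latent_dim)
    x = G @ z                                  # a synthesized "image"
    grad_latent = np.outer(E @ x - z, x)               # d/dE ||E x - z||^2 / 2
    grad_image = np.outer(G.T @ (G @ (E @ x) - x), x)  # d/dE ||G E x - x||^2 / 2
    E -= lr * (grad_latent + grad_image)

z_test = rng.normal(size=latent_dim)
x_test = G @ z_test
err = np.linalg.norm(G @ (E @ x_test) - x_test)
print(err < 0.5 * np.linalg.norm(x_test))      # trained encoder reconstructs x
```

In the paper the generator is a deep pre-trained GAN and only the encoder's normalization layers and last block are fine-tuned per architecture; the sketch keeps only the two-loss training structure.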
Related papers
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimization to keep the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE), which encodes image regions into variable-length codes based on their information densities for accurate representation.
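The first stage of any VQ-based pipeline replaces each feature vector with the index of its nearest codebook entry. A minimal numpy sketch of that fixed-length quantization step (DQ-VAE's contribution, varying code length with information density, is not reproduced here):

```python
# Hypothetical sketch of plain vector quantization: map each feature
# vector to its nearest codebook entry (fixed-length codes).
import numpy as np

rng = np.random.default_rng(2)
codebook = rng.normal(size=(16, 4))    # 16 code vectors of dimension 4
features = rng.normal(size=(5, 4))     # 5 feature vectors to quantize

# pairwise squared distances, then argmin over the codebook axis
d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
codes = d2.argmin(axis=1)              # one discrete code per feature
recon = codebook[codes]                # quantized (reconstructed) features

print(codes.shape, recon.shape)
```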
arXiv Detail & Related papers (2023-05-19T14:56:05Z)
- JoIN: Joint GANs Inversion for Intrinsic Image Decomposition [16.02463667910604]
We propose to solve ill-posed inverse imaging problems using a bank of Generative Adversarial Networks (GANs).
Our method builds on the demonstrated success of GANs to capture complex image distributions.
arXiv Detail & Related papers (2023-05-18T22:09:32Z)
- LD-GAN: Low-Dimensional Generative Adversarial Network for Spectral Image Generation with Variance Regularization [72.4394510913927]
Deep learning methods are state-of-the-art for spectral image (SI) computational tasks.
GANs enable diverse augmentation by learning and sampling from the data distribution.
GAN-based SI generation is challenging because the high dimensionality of this kind of data hinders the convergence of GAN training, yielding suboptimal generation.
We propose a statistical regularization to control the low-dimensional representation variance for the autoencoder training and to achieve high diversity of samples generated with the GAN.
arXiv Detail & Related papers (2023-04-29T00:25:02Z)
- TriPlaneNet: An Encoder for EG3D Inversion [1.9567015559455132]
NeRF-based GANs have introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads.
Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result onto the novel view.
We introduce a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation presented for the EG3D generative model.
arXiv Detail & Related papers (2023-03-23T17:56:20Z)
- 3D-Aware Encoding for Style-based Neural Radiance Fields [50.118687869198716]
We learn an inversion function to project an input image to the latent space of a NeRF generator and then synthesize novel views of the original image based on the latent code.
Compared with GAN inversion for 2D generative models, NeRF inversion not only needs to 1) preserve the identity of the input image, but also 2) ensure 3D consistency in generated novel views.
We propose a two-stage encoder for style-based NeRF inversion.
arXiv Detail & Related papers (2022-11-12T06:14:12Z)
- Feature-Style Encoder for Style-Based GAN Inversion [1.9116784879310027]
We propose a novel architecture for GAN inversion, which we call Feature-Style encoder.
Our model achieves accurate inversion of real images from the latent space of a pre-trained style-based GAN model.
Thanks to its encoder structure, the model allows fast and accurate image editing.
arXiv Detail & Related papers (2022-02-04T15:19:34Z)
- Less is More: Pre-training a Strong Siamese Encoder Using a Weak Decoder [75.84152924972462]
Many real-world applications use Siamese networks to efficiently match text sequences at scale.
This paper pre-trains language models dedicated to sequence matching in Siamese architectures.
arXiv Detail & Related papers (2021-02-18T08:08:17Z)
- GAN Inversion: A Survey [125.62848237531945]
GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model.
GAN inversion plays an essential role in enabling the pretrained GAN models such as StyleGAN and BigGAN to be used for real image editing applications.
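The core of optimization-based GAN inversion, as surveyed above, can be shown on a toy generator. This is only a sketch of the idea (assumptions: a one-layer tanh "generator", least-squares image loss; real inversion uses a deep network and often an encoder to initialize z):

```python
# Toy optimization-based GAN inversion: given a target image produced by a
# frozen generator G, recover a latent z by gradient descent on ||G(z) - x*||^2.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 3))             # frozen toy "generator": G(z) = tanh(W z)
generate = lambda z: np.tanh(W @ z)

z_true = 0.5 * rng.normal(size=3)       # latent that produced the target image
x_target = generate(z_true)

z = np.zeros(3)                         # invert by gradient descent from the origin
lr = 0.1
for _ in range(2000):
    x = generate(z)
    z -= lr * W.T @ ((x - x_target) * (1.0 - x ** 2))   # chain rule through tanh

print(np.linalg.norm(generate(z) - x_target) < 1e-2)    # faithful reconstruction
```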
arXiv Detail & Related papers (2021-01-14T14:11:00Z)
- Guiding GANs: How to control non-conditional pre-trained GANs for conditional image generation [69.10717733870575]
We present a novel method for guiding generic non-conditional GANs to behave as conditional GANs.
Our approach adds an encoder network into the mix to generate the high-dimensional random inputs that are fed to the generator network of a non-conditional GAN.
arXiv Detail & Related papers (2021-01-04T14:03:32Z)
- GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution [85.53811497840725]
We show that Generative Adversarial Networks (GANs), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR).
Our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging rich and diverse priors encapsulated in a pre-trained GAN.
Images upscaled by GLEAN show clear improvements in terms of fidelity and texture faithfulness in comparison to existing methods.
arXiv Detail & Related papers (2020-12-01T18:56:14Z)
- Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation [42.62624182740679]
We present a generic image-to-image translation framework, pixel2style2pixel (pSp).
Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator.
arXiv Detail & Related papers (2020-08-03T15:30:38Z)
- Adversarial Latent Autoencoders [7.928094304325116]
We introduce an autoencoder that tackles these issues jointly, which we call Adversarial Latent Autoencoder (ALAE).
ALAE is the first autoencoder able to compare with, and go beyond the capabilities of a generator-only type of architecture.
arXiv Detail & Related papers (2020-04-09T10:33:44Z)
- In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
A common practice of feeding a real image to a trained GAN generator is to invert it back to a latent code.
Existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach, which faithfully reconstructs the input image and ensures the inverted code to be semantically meaningful for editing.
arXiv Detail & Related papers (2020-03-31T18:20:18Z)
- Reducing the Representation Error of GAN Image Priors Using the Deep Decoder [29.12824512060469]
We show a method for reducing the representation error of GAN priors by modeling images as the linear combination of a GAN prior and a Deep Decoder.
For compressive sensing and image superresolution, our hybrid model exhibits consistently higher PSNRs than both the GAN priors and Deep Decoder separately.
arXiv Detail & Related papers (2020-01-23T18:37:24Z)
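The hybrid idea in the last entry, modeling an image as the linear combination of a GAN prior and a Deep Decoder, can be sketched with linear stand-ins. This toy illustration is our own (assumptions: linear "GAN prior" G and "Deep Decoder" D, so the fit reduces to one least-squares solve over the stacked latents):

```python
# Toy sketch of a hybrid image model x ~ G(z) + D(w): with linear G and D,
# jointly fitting [z; w] is a single least-squares problem.
import numpy as np

rng = np.random.default_rng(3)
G = rng.normal(size=(32, 4))       # "GAN prior": spans a 4-dim subspace
D = rng.normal(size=(32, 6))       # "Deep Decoder": another 6-dim subspace
x = rng.normal(size=32)            # target image (32 pixels)

A = np.hstack([G, D])              # hybrid model: x ~ A @ [z; w]
zw, *_ = np.linalg.lstsq(A, x, rcond=None)
hybrid_err = np.linalg.norm(A @ zw - x)
gan_only_err = np.linalg.norm(G @ np.linalg.lstsq(G, x, rcond=None)[0] - x)
print(hybrid_err < gan_only_err)   # the richer hybrid model never fits worse
```

This mirrors the paper's reported behavior qualitatively: the hybrid representation has lower residual (higher PSNR) than the GAN prior alone, because its range strictly contains the prior's.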
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.