Near Perfect GAN Inversion
- URL: http://arxiv.org/abs/2202.11833v1
- Date: Wed, 23 Feb 2022 23:58:13 GMT
- Title: Near Perfect GAN Inversion
- Authors: Qianli Feng, Viraj Shah, Raghudeep Gadde, Pietro Perona, Aleix
Martinez
- Abstract summary: We derive an algorithm that achieves near perfect reconstructions of photos.
We show that this approach can not only produce synthetic images that are indistinguishable from the real photos we wish to replicate, but that these images are readily editable.
- Score: 17.745342857726925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To edit a real photo using Generative Adversarial Networks (GANs), we need a
GAN inversion algorithm to identify the latent vector that perfectly reproduces
it. Unfortunately, whereas existing inversion algorithms can synthesize images
similar to real photos, they cannot generate the identical clones needed in
most applications. Here, we derive an algorithm that achieves near perfect
reconstructions of photos. Rather than relying on encoder- or
optimization-based methods to find an inverse mapping on a fixed generator
$G(\cdot)$, we derive an approach to locally adjust $G(\cdot)$ to more
optimally represent the photos we wish to synthesize. This is done by locally
tweaking the learned mapping $G(\cdot)$ s.t. $\| {\bf x} - G({\bf z})
\|<\epsilon$, with ${\bf x}$ the photo we wish to reproduce, ${\bf z}$ the
latent vector, $\|\cdot\|$ an appropriate metric, and $\epsilon > 0$ a small
scalar. We show that this approach can not only produce synthetic images that
are indistinguishable from the real photos we wish to replicate, but that these
images are readily editable. We demonstrate the effectiveness of the derived
algorithm on a variety of datasets including human faces, animals, and cars,
and discuss its importance for diversity and inclusion.
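The core idea above, locally adjusting the generator $G(\cdot)$ until $\|{\bf x} - G({\bf z})\| < \epsilon$ while keeping the latent vector fixed, can be illustrated with a toy sketch. This is not the paper's actual procedure (which fine-tunes a pretrained StyleGAN-style generator); here a simple linear map stands in for $G$, and all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": a fixed linear map G(z) = W @ z standing in for a
# pretrained GAN generator. Dimensions and names are illustrative only.
d_latent, d_pixel = 8, 16
W = rng.normal(size=(d_pixel, d_latent))   # "pretrained" generator weights
x = rng.normal(size=d_pixel)               # the photo x we wish to reproduce
z = rng.normal(size=d_latent)              # latent code from a prior inversion step

def reconstruction_error(W, z, x):
    """The metric ||x - G(z)|| from the abstract (Euclidean norm here)."""
    return np.linalg.norm(x - W @ z)

# Locally tweak G (gradient descent on W, z held fixed) until
# ||x - G(z)|| < eps -- analogous in spirit to generator fine-tuning.
eps = 1e-3
lr = 0.5 / np.dot(z, z)                    # step size chosen so the residual
for _ in range(1000):                      # shrinks by a factor of 0.5 per step
    residual = x - W @ z
    if np.linalg.norm(residual) < eps:
        break
    W += lr * np.outer(residual, z)        # gradient step on 0.5 * ||x - W z||^2

print(f"final error: {reconstruction_error(W, z, x):.2e}")
```

With the latent code fixed, each update rescales the residual by a constant factor, so the loop drives the reconstruction error below any $\epsilon > 0$; the interesting part in the real setting is doing this locally, so that the adjusted generator still produces editable, in-distribution images.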
Related papers
- VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation [65.22452273252428]
VA-$\pi$ is a post-training framework to optimize autoregressive visual generation. It unifies pixel reconstruction and autoregressive modeling. It reduces FID from 14.36 to 7.65 and improves IS from 86.55 to 116.70 on LlamaGen-XXL.
arXiv Detail & Related papers (2025-12-22T18:54:30Z) - GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation [19.94399008500357]
GPSToken is a novel $\textbf{G}$aussian $\textbf{P}$arameterized $\textbf{S}$patially-adaptive $\textbf{Token}$ization framework. GPSToken disentangles spatial layout (Gaussian parameters) from texture features to enable efficient two-stage generation.
arXiv Detail & Related papers (2025-09-01T04:01:37Z) - GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting [64.84383010238908]
We propose an effective image tokenizer with 2D Gaussian Splatting as a solution.
In general, our framework integrates the local influence of 2D Gaussian distribution into the discrete space.
Competitive reconstruction performances on CIFAR, Mini-Net, and ImageNet-1K demonstrate the effectiveness of our framework.
arXiv Detail & Related papers (2025-01-26T17:56:11Z) - No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z) - RefineStyle: Dynamic Convolution Refinement for StyleGAN [15.230430037135017]
In StyleGAN, convolution kernels are shaped by both static parameters shared across images and dynamic modulation factors specific to each image.
$\mathcal{W}+$ space is often used for image inversion and editing.
This paper proposes an efficient refining strategy for dynamic kernels.
arXiv Detail & Related papers (2024-10-08T15:01:30Z) - In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and domain-regularized optimization to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z) - Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis [3.222802562733787]
We propose a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design.
Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images.
arXiv Detail & Related papers (2023-03-24T17:12:38Z) - Efficient Image Denoising by Low-Rank Singular Vector Approximations of Geodesics' Gramian Matrix [2.3499129784547654]
Noise contamination degrades image quality below what viewers expect.
Image denoising is an essential pre-processing step.
We present a manifold-based noise filtering method that mainly exploits a few prominent singular vectors of the geodesics' Gramian matrix.
arXiv Detail & Related papers (2022-09-27T01:03:36Z) - CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs [129.51129173514502]
We introduce Coordinate GAN (CoordGAN), a structure-texture disentangled GAN that learns a dense correspondence map for each generated image.
We show that the proposed generator achieves better structure and texture disentanglement compared to existing approaches.
arXiv Detail & Related papers (2022-03-30T17:55:09Z) - DeDUCE: Generating Counterfactual Explanations Efficiently [26.300599540027893]
We develop a new algorithm providing counterfactual explanations for large image classifiers trained with spectral normalisation at low computational cost.
We empirically compare this algorithm against baselines from the literature; our novel algorithm consistently finds counterfactuals that are much closer to the original inputs.
arXiv Detail & Related papers (2021-11-29T17:47:21Z) - Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z) - StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing [19.495153059077367]
Generative adversarial networks (GANs) synthesize realistic images from random latent vectors.
Editing real images with GANs suffers from i) time-consuming optimization for projecting real images to the latent vectors, or ii) inaccurate embedding through an encoder.
We propose StyleMapGAN: the intermediate latent space has spatial dimensions, and spatially variant modulation replaces AdaIN.
arXiv Detail & Related papers (2021-04-30T04:43:24Z) - Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z) - Swapping Autoencoder for Deep Image Manipulation [94.33114146172606]
We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation.
The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image.
Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
arXiv Detail & Related papers (2020-07-01T17:59:57Z) - In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
A common practice of feeding a real image to a trained GAN generator is to invert it back to a latent code.
Existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach, which faithfully reconstructs the input image and ensures the inverted code to be semantically meaningful for editing.
arXiv Detail & Related papers (2020-03-31T18:20:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.