LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN
Latent Space
- URL: http://arxiv.org/abs/2209.12746v1
- Date: Mon, 26 Sep 2022 14:55:21 GMT
- Title: LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN
Latent Space
- Authors: Cao Pu, Lu Yang, Dongxv Liu, Zhiwei Liu, Wenguan Wang, Shan Li, Qing
Song
- Abstract summary: We introduce Normalized Style Space ($\mathcal{S^N}$ space) and $\mathcal{S^N}$ Cosine Distance (SNCD) to measure the disalignment of inversion methods.
Since our proposed SNCD is differentiable, it can be optimized in both encoder-based and optimization-based embedding methods, providing a uniform solution.
- Score: 42.56147568941768
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the methods evolve, inversion is mainly divided into two steps. The first
step is Image Embedding, in which an encoder or optimization process embeds
images to get the corresponding latent codes. Afterward, the second step aims
to refine the inversion and editing results, which we name Result Refinement.
Although the second step significantly improves fidelity, perception and
editability remain almost unchanged, since they depend heavily on the latent
codes obtained in the first step. Therefore, a crucial problem is obtaining
latent codes with better perception and editability while retaining the
reconstruction fidelity. In this work, we first point out that these two characteristics are
related to the degree of alignment (or disalignment) of the inverse codes with
the synthetic distribution. Then, we propose Latent Space Alignment Inversion
Paradigm (LSAP), which consists of an evaluation metric and a solution for this
problem. Specifically, we introduce Normalized Style Space ($\mathcal{S^N}$
space) and $\mathcal{S^N}$ Cosine Distance (SNCD) to measure disalignment of
inversion methods. Since our proposed SNCD is differentiable, it can be
optimized in both encoder-based and optimization-based embedding methods,
providing a uniform solution. Extensive experiments in various domains
demonstrate that SNCD effectively reflects perception and editability, and our
alignment paradigm achieves state-of-the-art results in both steps. Code is
available on https://github.com/caopulan/GANInverter.
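The abstract does not spell out the exact form of the $\mathcal{S^N}$ normalization or of SNCD. As a rough illustration only, assuming style codes are normalized channel-wise by the mean and standard deviation of codes sampled from the generator's synthetic distribution, a cosine-distance disalignment measure in that normalized space might be sketched as follows (the function names, the normalization scheme, and the random stand-in codes are all hypothetical, not the paper's implementation):

```python
import numpy as np

def normalize_style(code, mean, std, eps=1e-8):
    # Map a raw style code into a hypothetical "normalized style space"
    # using per-channel statistics of synthetically sampled codes.
    return (code - mean) / (std + eps)

def sn_cosine_distance(code_a, code_b, mean, std, eps=1e-8):
    # Cosine distance between two codes after normalization:
    # 0 means perfectly aligned directions, 2 means opposite directions.
    a = normalize_style(code_a, mean, std)
    b = normalize_style(code_b, mean, std)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - cos

# Example: statistics estimated from codes that would normally be sampled
# through the generator's mapping network (random stand-ins here).
rng = np.random.default_rng(0)
synthetic_codes = rng.normal(size=(1000, 512))
mean, std = synthetic_codes.mean(axis=0), synthetic_codes.std(axis=0)

inverted_code = rng.normal(size=512)   # stand-in for an inverted latent code
reference_code = rng.normal(size=512)  # stand-in for a synthetic code
d = sn_cosine_distance(inverted_code, reference_code, mean, std)
```

Because such a distance is a smooth function of the code, it could be added as a differentiable regularization term either in an encoder's training loss or in a per-image optimization objective, which is what would make a single alignment criterion usable in both embedding settings.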
Related papers
- Finding Quantum Codes via Riemannian Optimization [0.0]
We propose a novel optimization scheme designed to find optimally correctable subspace codes for a known quantum noise channel.
To each candidate subspace code we first associate a universal recovery map, as if the code were correctable, and aim to maximize performance.
The set of codes of fixed dimension is parametrized with a complex-valued Stiefel manifold.
arXiv Detail & Related papers (2024-07-11T12:03:41Z) - Equivariant Deep Weight Space Alignment [54.65847470115314]
We propose a novel framework aimed at learning to solve the weight alignment problem.
We first prove that weight alignment adheres to two fundamental symmetries and then propose a deep architecture that respects these symmetries.
arXiv Detail & Related papers (2023-10-20T10:12:06Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - DRSOM: A Dimension Reduced Second-Order Method [13.778619250890406]
Under a trust-like framework, our method preserves the convergence of the second-order method while using only information in a few directions.
Theoretically, we show that the method has local convergence and a global convergence rate of $O(\epsilon^{-3/2})$ to satisfy the first-order and second-order conditions.
arXiv Detail & Related papers (2022-07-30T13:05:01Z) - Cycle Encoding of a StyleGAN Encoder for Improved Reconstruction and
Editability [76.6724135757723]
GAN inversion aims to invert an input image into the latent space of a pre-trained GAN.
Despite the recent advances in GAN inversion, there remain challenges to mitigate the tradeoff between distortion and editability.
We propose a two-step approach that first inverts the input image into a latent code, called pivot code, and then alters the generator so that the input image can be accurately mapped into the pivot code.
arXiv Detail & Related papers (2022-07-19T16:10:16Z) - HyperInverter: Improving StyleGAN Inversion via Hypernetwork [12.173568611144628]
Current GAN inversion methods fail to meet at least one of three requirements: high reconstruction quality, editability, and fast inference.
We present a novel two-phase strategy in this research that fits all requirements at the same time.
Our method is entirely encoder-based, resulting in extremely fast inference.
arXiv Detail & Related papers (2021-12-01T18:56:05Z) - Boosting Continuous Sign Language Recognition via Cross Modality
Augmentation [135.30357113518127]
Continuous sign language recognition deals with unaligned video-text pairs.
We propose a novel architecture with cross modality augmentation.
The proposed framework can be easily extended to other existing CTC based continuous SLR architectures.
arXiv Detail & Related papers (2020-10-11T15:07:50Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.