Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space
Viewpoint
- URL: http://arxiv.org/abs/2211.11448v3
- Date: Sun, 26 Mar 2023 18:25:15 GMT
- Title: Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint
- Authors: Hongyu Liu and Yibing Song and Qifeng Chen
- Abstract summary: GAN inversion and editing via StyleGAN maps an input image into the embedding spaces ($\mathcal{W}$, $\mathcal{W^+}$, and $\mathcal{F}$) to simultaneously maintain image fidelity and meaningful manipulation.
Recent GAN inversion methods typically explore $\mathcal{W^+}$ and $\mathcal{F}$ rather than $\mathcal{W}$ to improve reconstruction fidelity while maintaining editability.
We introduce contrastive learning to align $\mathcal{W}$ and the image space for precise latent code discovery.
- Score: 76.00222741383375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GAN inversion and editing via StyleGAN maps an input image into the embedding
spaces ($\mathcal{W}$, $\mathcal{W^+}$, and $\mathcal{F}$) to simultaneously
maintain image fidelity and meaningful manipulation. From latent space
$\mathcal{W}$ to extended latent space $\mathcal{W^+}$ to feature space
$\mathcal{F}$ in StyleGAN, the editability of GAN inversion decreases while its
reconstruction quality increases. Recent GAN inversion methods typically
explore $\mathcal{W^+}$ and $\mathcal{F}$ rather than $\mathcal{W}$ to improve
reconstruction fidelity while maintaining editability. As $\mathcal{W^+}$ and
$\mathcal{F}$ are derived from $\mathcal{W}$ that is essentially the foundation
latent space of StyleGAN, these GAN inversion methods focusing on
$\mathcal{W^+}$ and $\mathcal{F}$ spaces could be improved by stepping back to
$\mathcal{W}$. In this work, we propose to first obtain the precise latent code
in foundation latent space $\mathcal{W}$. We introduce contrastive learning to
align $\mathcal{W}$ and the image space for precise latent code discovery.
Then, we leverage a cross-attention encoder to transform the
obtained latent code in $\mathcal{W}$ into $\mathcal{W^+}$ and $\mathcal{F}$,
accordingly. Our experiments show that our exploration of the foundation latent
space $\mathcal{W}$ improves the representation ability of latent codes in
$\mathcal{W^+}$ and features in $\mathcal{F}$, which yields state-of-the-art
reconstruction fidelity and editability results on the standard benchmarks.
Project page: https://kumapowerliu.github.io/CLCAE.
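The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common choice: a symmetric InfoNCE loss (as popularized by CLIP-style training) over a batch of paired embeddings. It assumes each image and its $\mathcal{W}$ latent code have already been projected into a shared embedding space. The function names, the temperature value, and the pure-Python list representation are all illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(image_embs, latent_embs, temperature=0.07):
    """Symmetric InfoNCE loss for paired embeddings (hypothetical helper).

    image_embs[i] and latent_embs[i] are assumed to be a matching pair;
    every other combination in the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    lats = [l2_normalize(v) for v in latent_embs]
    n = len(imgs)

    # sim[i][j]: cosine similarity of image i and latent j, divided by temperature.
    sim = [
        [sum(a * b for a, b in zip(imgs[i], lats[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def mean_cross_entropy(rows):
        # Cross-entropy with the diagonal entry as the correct class,
        # using the log-sum-exp trick for numerical stability.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_denom = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_denom - row[i]
        return total / len(rows)

    # Average the image-to-latent and latent-to-image directions.
    transposed = [list(col) for col in zip(*sim)]
    return 0.5 * (mean_cross_entropy(sim) + mean_cross_entropy(transposed))


# Toy usage: correctly paired embeddings versus deliberately mismatched ones.
images = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = info_nce(images, [[1.0, 0.0], [0.0, 1.0]])
loss_mismatched = info_nce(images, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss is close to zero when each image is paired with its own latent code and large when the pairs are swapped, which is the behavior an alignment objective of this kind is meant to reward.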
Related papers
- Partially Unitary Learning [0.0]
An optimal mapping between Hilbert spaces $IN$ of $\left|\psi\right\rangle$ and $OUT$ of $\left|\phi\right\rangle$ is presented.
An iterative algorithm for finding the global maximum of this optimization problem is developed.
(2024-05-16T17:13:55Z)
- Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
(2024-02-06T15:39:09Z)
- Revisiting Latent Space of GAN Inversion for Real Image Editing [27.035594402482886]
In this study, we revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and combine it with highly capable latent spaces to build combined spaces that faithfully invert real images.
We show that $\mathcal{Z^+}$ can replace the most commonly-used $\mathcal{W}$, $\mathcal{W^+}$, and $\mathcal{S}$ spaces while preserving reconstruction quality, resulting in reduced distortion of edited images.
(2023-07-18T06:27:44Z)
- Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space [27.035594402482886]
We revisit StyleGANs' hyperspherical prior $\mathcal{Z}$ and $\mathcal{Z^+}$ and integrate them into seminal GAN inversion methods to improve editing quality.
Our extensions achieve sophisticated editing quality with the aid of the StyleGAN prior.
(2023-05-31T23:27:07Z)
- On Machine Learning Knowledge Representation In The Form Of Partially Unitary Operator. Knowledge Generalizing Operator [0.0]
A new form of ML knowledge representation with high generalization power is developed and implemented numerically.
$\mathcal{U}$ can be considered as a $\mathit{IN}$ to $\mathit{OUT}$ quantum channel.
(2022-12-22T06:29:27Z)
- On Optimal Learning Under Targeted Data Poisoning [48.907813854832206]
In this work we aim to characterize the smallest achievable error $\epsilon=\epsilon(\eta)$ by the learner in the presence of such an adversary.
Remarkably, we show that the upper bound can be attained by a deterministic learner.
(2022-10-06T06:49:48Z)
- SPAGHETTI: Editing Implicit Shapes Through Part Aware Generation [85.09014441196692]
We introduce a method for $\mathbf{E}$diting $\mathbf{I}$mplicit $\mathbf{S}$hapes $\mathbf{T}$hrough.
Our architecture allows for manipulation of implicit shapes by means of transforming, interpolating and combining shape segments together.
(2022-01-31T12:31:41Z)
- On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
(2021-12-03T21:42:33Z)
- Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns.
(2021-06-09T05:39:05Z)
- Phase Transitions in Rate Distortion Theory and Deep Learning [5.145741425164946]
We say that $\mathcal{S}$ can be compressed at rate $s$ if we can achieve an error of $\mathcal{O}(R^{-s})$ for encoding $\mathcal{S}$.
We show that for certain "nice" signal classes $\mathcal{S}$, a phase transition occurs: We construct a probability measure $\mathbb{P}$ on $\mathcal{S}$.
(2020-08-03T16:48:49Z)