Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
- URL: http://arxiv.org/abs/2008.00951v2
- Date: Wed, 21 Apr 2021 12:53:36 GMT
- Title: Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
- Authors: Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar,
Stav Shapiro, Daniel Cohen-Or
- Abstract summary: We present a generic image-to-image translation framework, pixel2style2pixel (pSp).
Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator.
- Score: 42.62624182740679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a generic image-to-image translation framework, pixel2style2pixel
(pSp). Our pSp framework is based on a novel encoder network that directly
generates a series of style vectors which are fed into a pretrained StyleGAN
generator, forming the extended W+ latent space. We first show that our encoder
can directly embed real images into W+, with no additional optimization. Next,
we propose utilizing our encoder to directly solve image-to-image translation
tasks, defining them as encoding problems from some input domain into the
latent domain. By deviating from the standard "invert first, edit later"
methodology used with previous StyleGAN encoders, our approach can handle a
variety of tasks even when the input image is not represented in the StyleGAN
domain. We show that solving translation tasks through StyleGAN significantly
simplifies the training process, as no adversary is required, has better
support for solving tasks without pixel-to-pixel correspondence, and inherently
supports multi-modal synthesis via the resampling of styles. Finally, we
demonstrate the potential of our framework on a variety of facial
image-to-image translation tasks, even when compared to state-of-the-art
solutions designed specifically for a single task, and further show that it can
be extended beyond the human facial domain.
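
To make the pipeline concrete, below is a minimal, hypothetical sketch of the encode-into-W+ idea described in the abstract; it is not the authors' implementation. The backbone is a stand-in for the paper's feature-pyramid encoder, and the `generator` referenced in the comments is an assumed frozen, pretrained StyleGAN that accepts W+ codes directly.

```python
import torch
import torch.nn as nn


class WPlusEncoder(nn.Module):
    """Maps an RGB image to a code in the extended W+ space of a
    1024x1024 StyleGAN (18 style vectors of dimension 512)."""

    def __init__(self, n_styles: int = 18, style_dim: int = 512):
        super().__init__()
        # Stand-in backbone; the paper uses a feature-pyramid encoder with
        # one small "map2style" head per style vector.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_styles = nn.Linear(128, n_styles * style_dim)
        self.n_styles, self.style_dim = n_styles, style_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_plus = self.to_styles(self.backbone(x))
        return w_plus.view(-1, self.n_styles, self.style_dim)  # (B, 18, 512)


encoder = WPlusEncoder()
images = torch.randn(2, 3, 256, 256)   # stand-in input batch
w_plus = encoder(images)               # direct embedding, no per-image optimization
print(w_plus.shape)                    # torch.Size([2, 18, 512])

# With a frozen, pretrained StyleGAN `generator` that accepts W+ codes
# (e.g. `input_is_latent=True` in common public ports), translation is
# simply `outputs = generator(w_plus)`. Multi-modal synthesis via style
# resampling can then be sketched by replacing the fine style vectors:
#   w_rand = generator.mapping(torch.randn(2, 512)).unsqueeze(1)
#   w_mix  = torch.cat([w_plus[:, :7], w_rand.expand(-1, 11, -1)], dim=1)
```

Because the encoder, rather than an adversarial game, carries the translation, training reduces to reconstruction-style losses on the generator's output, which is what the abstract means by "no adversary is required".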
Related papers
- Masked and Adaptive Transformer for Exemplar Based Image Translation [16.93344592811513]
Cross-domain semantic matching is challenging.
We propose a masked and adaptive transformer (MAT) for learning accurate cross-domain correspondence.
We devise a novel contrastive style learning method to acquire quality-discriminative style representations.
arXiv Detail & Related papers (2023-03-30T03:21:14Z)
- FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
arXiv Detail & Related papers (2022-03-09T13:34:38Z)
- ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation [55.47515538020578]
This work proposes an implicit style function (ISF) to straightforwardly achieve multi-modal and multi-domain image-to-image translation.
Our results in human face and animal manipulations show significantly improved results over the baselines.
Our model enables cost-effective multi-modal unsupervised image-to-image translations at high resolution using pre-trained unconditional GANs.
arXiv Detail & Related papers (2021-09-26T04:51:39Z)
- StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal Image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
arXiv Detail & Related papers (2021-04-14T19:58:24Z)
- Unpaired Image-to-Image Translation via Latent Energy Transport [61.62293304236371]
Image-to-image translation aims to preserve source contents while translating to discriminative target styles between two visual domains.
In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task.
Our model is the first to be applicable to 1024×1024-resolution unpaired image translation.
arXiv Detail & Related papers (2020-12-01T17:18:58Z)
- In-Domain GAN Inversion for Real Image Editing [56.924323432048304]
A common practice for feeding a real image to a trained GAN generator is to first invert it back to a latent code.
Existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space.
We propose an in-domain GAN inversion approach, which faithfully reconstructs the input image and ensures that the inverted code is semantically meaningful for editing (see the inversion sketch after this list).
arXiv Detail & Related papers (2020-03-31T18:20:18Z)
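
For contrast with pSp's direct encoding, the "invert first, edit later" route mentioned in the pSp abstract and in the In-Domain GAN Inversion entry above typically fits a latent code to each image by optimization. The sketch below is a generic, hypothetical version of that loop, not any specific paper's method; `generator` is an assumed frozen StyleGAN that decodes W+ codes, and perceptual and regularization losses are omitted for brevity.

```python
import torch
import torch.nn.functional as F


def invert_image(generator, target, n_styles=18, style_dim=512,
                 steps=500, lr=0.01):
    """Fit a W+ code so the frozen `generator` reproduces `target`.

    `generator` is assumed to map a (B, n_styles, style_dim) code to an
    image with the same shape as `target`; real pipelines add perceptual
    (LPIPS) and regularization terms on top of the pixel loss used here."""
    w_plus = torch.zeros(target.size(0), n_styles, style_dim,
                         requires_grad=True)
    opt = torch.optim.Adam([w_plus], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(generator(w_plus), target)
        loss.backward()
        opt.step()
    return w_plus.detach()  # the code is then edited, e.g. along semantic directions
```

An encoder such as pSp replaces this per-image optimization loop with a single forward pass, which is what lets it handle inputs (sketches, segmentation maps) that are not themselves images in the StyleGAN domain.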
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.