Swapping Autoencoder for Deep Image Manipulation
- URL: http://arxiv.org/abs/2007.00653v2
- Date: Mon, 14 Dec 2020 09:41:33 GMT
- Title: Swapping Autoencoder for Deep Image Manipulation
- Authors: Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman,
Alexei A. Efros, Richard Zhang
- Abstract summary: We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation.
The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image.
Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
- Score: 94.33114146172606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models have become increasingly effective at producing
realistic images from randomly sampled seeds, but using such models for
controllable manipulation of existing images remains challenging. We propose
the Swapping Autoencoder, a deep model designed specifically for image
manipulation, rather than random sampling. The key idea is to encode an image
with two independent components and enforce that any swapped combination maps
to a realistic image. In particular, we encourage the components to represent
structure and texture, by enforcing one component to encode co-occurrent patch
statistics across different parts of an image. As our method is trained with an
encoder, finding the latent codes for a new input image becomes trivial, rather
than cumbersome. As a result, it can be used to manipulate real input images in
various ways, including texture swapping, local and global editing, and latent
code vector arithmetic. Experiments on multiple datasets show that our model
produces better results and is substantially more efficient compared to recent
generative models.
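To make the encode/swap/decode idea concrete, here is a minimal PyTorch sketch of the structure/texture split described in the abstract. Every module name, shape, and layer choice below (SwappingAutoencoder, the 8-channel structure code, the pooled texture vector, and so on) is an illustrative assumption, not the authors' architecture, which is a much larger GAN-based model.

```python
# Minimal sketch of the swapping idea from the abstract. All shapes,
# names, and hyperparameters are illustrative assumptions, not the
# paper's actual implementation.
import torch
import torch.nn as nn

class SwappingAutoencoder(nn.Module):
    def __init__(self, ch=64, texture_dim=256):
        super().__init__()
        # Shared encoder trunk.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # Structure code: a spatial tensor that keeps layout information.
        self.to_structure = nn.Conv2d(ch * 2, 8, 1)
        # Texture code: a pooled global vector, meant to capture
        # co-occurrent patch statistics such as color and texture.
        self.to_texture = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch * 2, texture_dim),
        )
        # Decoder maps (structure, texture) back to an image; texture is
        # broadcast and concatenated onto the structure tensor for simplicity.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8 + texture_dim, ch, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def encode(self, img):
        h = self.trunk(img)
        return self.to_structure(h), self.to_texture(h)

    def decode(self, structure, texture):
        b, _, h, w = structure.shape
        t = texture[:, :, None, None].expand(b, -1, h, w)
        return self.decoder(torch.cat([structure, t], dim=1))

# Swapping: structure from image A, texture from image B. A GAN
# discriminator (plus a patch co-occurrence discriminator comparing
# crops of the output against crops of B) would enforce that the
# hybrid is realistic, as the abstract describes.
model = SwappingAutoencoder()
img_a, img_b = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
s_a, _ = model.encode(img_a)
_, t_b = model.encode(img_b)
hybrid = model.decode(s_a, t_b)  # layout of A, appearance of B
```

The latent code vector arithmetic mentioned in the abstract then reduces to adding or interpolating offsets on t_b (or s_a) before decoding.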
Related papers
- Closed-Loop Transcription via Convolutional Sparse Coding [29.75613581643052]
Autoencoders often use generic deep networks as the encoder or decoder, which are difficult to interpret.
In this work, we make the explicit assumption that the image distribution is generated from a multistage convolutional sparse coding (CSC) model.
Our method enjoys several side benefits, including more structured and interpretable representations, more stable convergence, and scalability to large datasets.
arXiv Detail & Related papers (2023-02-18T14:40:07Z)
- ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust cropping algorithms that reflect user intent.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z)
- SISL: Self-Supervised Image Signature Learning for Splicing Detection and Localization [11.437760125881049]
We propose a self-supervised approach for training splicing detection/localization models from frequency transforms of images.
Our proposed model can yield similar or better performance on standard datasets without relying on labels or metadata.
arXiv Detail & Related papers (2022-03-15T12:26:29Z)
- EdiBERT, a generative model for image editing [12.605607949417033]
EdiBERT is a bi-directional transformer trained in the discrete latent space built by a vector-quantized auto-encoder.
We show that the resulting model matches state-of-the-art performance on a wide variety of tasks.
arXiv Detail & Related papers (2021-11-30T10:23:06Z)
- StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing [19.495153059077367]
Generative adversarial networks (GANs) synthesize realistic images from random latent vectors.
Editing real images with GANs suffers from i) time-consuming optimization for projecting real images to the latent vectors, or ii) inaccurate embedding through an encoder.
We propose StyleMapGAN: the intermediate latent space has spatial dimensions, and a spatially variant modulation replaces AdaIN.
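As a rough, hedged sketch of what a spatially variant modulation can look like (the module name SpatialModulation and all shapes here are assumptions, not the paper's code): a stylemap with spatial dimensions is resized to each feature resolution, so every location gets its own per-channel scale and shift rather than AdaIN's single global pair.

```python
# Illustrative sketch of spatially variant modulation; assumption-level
# code, not the StyleMapGAN implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialModulation(nn.Module):
    def __init__(self, feat_ch, style_ch):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch, affine=False)
        self.to_gamma = nn.Conv2d(style_ch, feat_ch, 1)
        self.to_beta = nn.Conv2d(style_ch, feat_ch, 1)

    def forward(self, feat, stylemap):
        # Resize the stylemap to the feature map's spatial size, then
        # predict a per-pixel, per-channel scale and shift.
        s = F.interpolate(stylemap, size=feat.shape[-2:], mode='bilinear',
                          align_corners=False)
        return self.norm(feat) * (1 + self.to_gamma(s)) + self.to_beta(s)

mod = SpatialModulation(feat_ch=64, style_ch=32)
feat = torch.randn(1, 64, 32, 32)
stylemap = torch.randn(1, 32, 8, 8)   # latent with spatial dimensions
out = mod(feat, stylemap)
```

Because the latent itself is spatial, editing a region of the stylemap edits the corresponding region of the image, which is what enables local editing without per-image optimization.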
arXiv Detail & Related papers (2021-04-30T04:43:24Z)
- Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
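A hedged sketch of the ensembling recipe, under stated assumptions: given a generator and a classifier (stubbed with toy stand-ins below; the paper uses StyleGAN2, and obtaining w for a real image requires GAN inversion), perturb the latent, regenerate each view, and average the predictions.

```python
# Sketch of ensembling over generative "views"; the generator/classifier
# stand-ins are toys so the example runs, not the paper's models.
import torch

def ensemble_predict(generator, classifier, w, n_views=8, sigma=0.1):
    views = [generator(w)]  # the reconstruction itself
    for _ in range(n_views - 1):
        # Each perturbed latent yields a slightly different "view".
        views.append(generator(w + sigma * torch.randn_like(w)))
    probs = torch.stack([classifier(v).softmax(dim=-1) for v in views])
    return probs.mean(dim=0)  # prediction averaged over all views

# Toy stand-ins so the sketch runs end to end:
generator = lambda w: torch.tanh(w.view(1, 3, 4, 4))      # fake 4x4 "image"
classifier = lambda img: img.flatten(1) @ torch.ones(48, 10)
w = torch.randn(1, 48)
p = ensemble_predict(generator, classifier, w)             # (1, 10) probs
```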
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
- Free-Form Image Inpainting via Contrastive Attention Network [64.05544199212831]
In image inpainting tasks, masks of arbitrary shape can appear anywhere in an image, forming complex patterns.
It is difficult for encoders to learn powerful representations under such complex conditions.
We propose a self-supervised Siamese inference network to improve the robustness and generalization.
arXiv Detail & Related papers (2020-10-29T14:46:05Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
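A minimal sketch of the mechanism, assuming the mask is applied to the unfolded input patches, which is mathematically equivalent to masking the shared kernel weights independently at each output location; this is illustrative, not the paper's implementation.

```python
# Locally masked 2D convolution: the same kernel weights are used
# everywhere, but an arbitrary binary mask zeroes out kernel taps
# independently at each output location. Assumption-level sketch.
import torch
import torch.nn.functional as F

def locally_masked_conv2d(x, weight, mask):
    # x:      (B, C_in, H, W)
    # weight: (C_out, C_in, k, k)
    # mask:   (B, C_in * k * k, H * W) -- one mask column per location
    c_out, c_in, k, _ = weight.shape
    b, _, h, w = x.shape
    patches = F.unfold(x, k, padding=k // 2)   # (B, C_in*k*k, H*W)
    patches = patches * mask                   # per-location masking
    out = weight.view(c_out, -1) @ patches     # (B, C_out, H*W)
    return out.view(b, c_out, h, w)

x = torch.randn(2, 3, 8, 8)
weight = torch.randn(16, 3, 3, 3)
mask = (torch.rand(2, 3 * 9, 64) > 0.5).float()  # arbitrary per-pixel masks
y = locally_masked_conv2d(x, weight, mask)       # (2, 16, 8, 8)
```

Choosing a different mask per image realizes a different generation order, which is what lets the paper ensemble estimators that share parameters but differ in ordering.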
arXiv Detail & Related papers (2020-06-22T17:59:07Z)