Style Intervention: How to Achieve Spatial Disentanglement with
Style-based Generators?
- URL: http://arxiv.org/abs/2011.09699v1
- Date: Thu, 19 Nov 2020 07:37:31 GMT
- Title: Style Intervention: How to Achieve Spatial Disentanglement with
Style-based Generators?
- Authors: Yunfan Liu, Qi Li, Zhenan Sun, Tieniu Tan
- Abstract summary: We propose a lightweight optimization-based algorithm which could adapt to arbitrary input images and render natural translation effects under flexible objectives.
We verify the performance of the proposed framework in facial attribute editing on high-resolution images, where both photo-realism and consistency are required.
- Score: 100.60938767993088
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generative Adversarial Networks (GANs) with style-based generators (e.g.
StyleGAN) successfully enable semantic control over image synthesis, and recent
studies have also revealed that interpretable image translations could be
obtained by modifying the latent code. However, in terms of the low-level image
content, traveling in the latent space would lead to 'spatially entangled
changes' in corresponding images, which is undesirable in many real-world
applications where local editing is required. To solve this problem, we analyze
properties of the 'style space' and explore the possibility of controlling the
local translation with pre-trained style-based generators. Concretely, we
propose 'Style Intervention', a lightweight optimization-based algorithm which
could adapt to arbitrary input images and render natural translation effects
under flexible objectives. We verify the performance of the proposed framework
in facial attribute editing on high-resolution images, where both photo-realism
and consistency are required. Extensive qualitative results demonstrate the
effectiveness of our method, and quantitative measurements also show that the
proposed algorithm outperforms state-of-the-art benchmarks in various aspects.
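
The abstract stops short of implementation details, but the approach it describes (a lightweight, optimization-based edit in the style space of a pre-trained generator) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's actual algorithm: `G` stands for a pre-trained style-based generator, `s_src` for the style codes recovered for the input image, `edit_loss` for a user-chosen objective scoring the desired change, and `mask` for the image region allowed to change.

```python
# Minimal sketch of optimization-based local editing in a style space.
# All names (G, s_src, edit_loss, mask) and the loss weight are
# illustrative assumptions, not the paper's actual interface.
import torch

def style_intervention(G, s_src, edit_loss, mask, steps=200, lr=0.01):
    x_src = G(s_src).detach()                             # fixed reference image
    delta = torch.zeros_like(s_src, requires_grad=True)   # style-code offset
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_edit = G(s_src + delta)
        # Flexible objective: encourage the desired change inside the
        # mask while penalizing any change to pixels outside it.
        loss_edit = edit_loss(x_edit * mask)
        loss_keep = ((x_edit - x_src) * (1 - mask)).pow(2).mean()
        loss = loss_edit + 10.0 * loss_keep               # weight is a placeholder
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G(s_src + delta).detach()
```

Because only a low-dimensional style offset is optimized while the generator weights stay frozen, such a procedure stays lightweight and can adapt to arbitrary inverted inputs, consistent with the adaptability the abstract claims.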
Related papers
- Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting vIa Latent OpTimization) is an optimization approach grounded in novel semantic centralization and background preservation losses.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z)
- Spatially-Attentive Patch-Hierarchical Network with Adaptive Sampling for Motion Deblurring [34.751361664891235]
We propose a pixel adaptive and feature attentive design for handling large blur variations across different spatial locations.
We show that our approach performs favorably against state-of-the-art deblurring algorithms.
arXiv Detail & Related papers (2024-02-09T01:00:09Z)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- Conditional Score Guidance for Text-Driven Image-to-Image Translation [52.73564644268749]
We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model.
Our method aims to generate a target image by selectively editing the regions of interest in a source image.
arXiv Detail & Related papers (2023-05-29T10:48:34Z)
- Bridging CLIP and StyleGAN through Latent Alignment for Image Editing [33.86698044813281]
We bridge CLIP and StyleGAN to mine diverse manipulation directions without inference-time optimization.
With this mapping scheme, we can achieve GAN inversion, text-to-image generation and text-driven image manipulation.
arXiv Detail & Related papers (2022-10-10T09:17:35Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Optimizing Latent Space Directions For GAN-based Local Image Editing [15.118159513841874]
We present a novel objective function to evaluate the locality of an image edit; a minimal sketch of one possible locality score appears after this list.
Our framework, called Locally Effective Latent Space Direction (LELSD), is applicable to any dataset and GAN architecture.
Our method is also computationally fast and exhibits a high degree of disentanglement, which allows users to interactively perform a sequence of edits on an image.
arXiv Detail & Related papers (2021-11-24T16:02:46Z)
- Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation [56.55178339375146]
Image-to-Image (I2I) multi-domain translation models are usually also evaluated using the quality of their semantic results.
We propose a new training protocol based on three specific losses which help a translation network to learn a smooth and disentangled latent style space.
arXiv Detail & Related papers (2021-06-16T17:58:21Z)
- Content-Preserving Unpaired Translation from Simulated to Realistic Ultrasound Images [12.136874314973689]
We introduce a novel image translation framework to bridge the appearance gap between simulated images and real scans.
We achieve this goal by leveraging both simulated images with semantic segmentations and unpaired in-vivo ultrasound scans.
arXiv Detail & Related papers (2021-03-09T22:35:43Z)
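
The LELSD entry above mentions an objective that evaluates the locality of an image edit. As a sketch only, one plausible formulation (an assumption, not the paper's exact definition) scores an edit by the fraction of total pixel change that falls inside the target region, so values near 1 indicate a well-localized edit.

```python
# Hypothetical locality score (an assumption, not the exact LELSD
# objective): the share of pixel-change energy inside the region mask.
import torch

def locality_score(x_before, x_after, mask, eps=1e-8):
    change = (x_after - x_before).pow(2)   # per-pixel change energy
    inside = (change * mask).sum()         # change within the region
    total = change.sum() + eps             # all change, stabilized
    return (inside / total).item()
```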
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.