$S^2$-Flow: Joint Semantic and Style Editing of Facial Images
- URL: http://arxiv.org/abs/2211.12209v1
- Date: Tue, 22 Nov 2022 12:00:02 GMT
- Title: $S^2$-Flow: Joint Semantic and Style Editing of Facial Images
- Authors: Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth
- Abstract summary: The high-quality images yielded by generative adversarial networks (GANs) have motivated investigations into their application for image editing.
GANs are often limited in the control they provide for performing specific edits.
We propose a method to disentangle a GAN's latent space into semantic and style spaces.
- Score: 16.47093005910139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The high-quality images yielded by generative adversarial networks (GANs)
have motivated investigations into their application for image editing.
However, GANs are often limited in the control they provide for performing
specific edits. One of the principal challenges is the entangled latent space
of GANs, which is not directly suitable for performing independent and detailed
edits. Recent editing methods allow for either controlled style edits or
controlled semantic edits. In addition, methods that use semantic masks to edit
images have difficulty preserving the identity and are unable to perform
controlled style edits. We propose a method to disentangle a GAN's
latent space into semantic and style spaces, enabling controlled semantic and
style edits for face images independently within the same framework. To achieve
this, we design an encoder-decoder based network architecture ($S^2$-Flow),
which incorporates two proposed inductive biases. We show the suitability of
$S^2$-Flow quantitatively and qualitatively by performing various semantic and
style edits.
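
The abstract describes $S^2$-Flow only at a high level: an encoder-decoder that maps a GAN's entangled latent code into separate semantic and style spaces so that each can be edited independently. The sketch below is a minimal conceptual illustration of that pattern using a single invertible affine coupling layer; the coupling block, the dimensions, and the additive edit are assumptions made for illustration and do not reproduce the paper's architecture or its two inductive biases.

```python
# Conceptual sketch only: map a GAN latent through an invertible (flow-like)
# transform, split the result into a "semantic" part and a "style" part,
# edit one part independently, and invert back. Sizes, the single coupling
# block, and the additive edit are illustrative assumptions, not S^2-Flow.
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """One invertible affine coupling block over a flat latent vector."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        a, b = z[:, : self.half], z[:, self.half :]
        log_s, t = self.net(a).chunk(2, dim=1)
        return torch.cat([a, b * torch.exp(log_s) + t], dim=1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        a, b = y[:, : self.half], y[:, self.half :]
        log_s, t = self.net(a).chunk(2, dim=1)
        return torch.cat([a, (b - t) * torch.exp(-log_s)], dim=1)


def edit_latent(w: torch.Tensor, flow: AffineCoupling,
                semantic_dim: int, delta: torch.Tensor) -> torch.Tensor:
    """Map w into a disentangled code, shift only the semantic part, map back."""
    code = flow(w)                                   # w -> [semantic | style]
    semantic, style = code[:, :semantic_dim], code[:, semantic_dim:]
    semantic = semantic + delta                      # edit semantics only
    return flow.inverse(torch.cat([semantic, style], dim=1))  # back to a GAN latent


if __name__ == "__main__":
    dim, semantic_dim = 512, 256                     # illustrative sizes
    flow = AffineCoupling(dim)
    w = torch.randn(1, dim)                          # stand-in for a GAN latent
    w_edit = edit_latent(w, flow, semantic_dim,
                         delta=0.5 * torch.randn(1, semantic_dim))
    print(w_edit.shape)                              # torch.Size([1, 512])
```

In the actual method, the edited code would be decoded back through the pretrained GAN generator to produce the edited face image; this toy round trip only returns a latent vector of the original size.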
Related papers
- An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control [21.624984690721842]
D-Edit is a framework to disentangle the comprehensive image-prompt interaction into several item-prompt interactions.
It is based on pretrained diffusion models with cross-attention layers disentangled and adopts a two-step optimization to build item-prompt associations.
We demonstrate state-of-the-art results in four types of editing operations including image-based, text-based, mask-based editing, and item removal.
arXiv Detail & Related papers (2024-03-07T20:06:29Z)
- DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing [66.43179841884098]
Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.
We propose DiffEditor to rectify two weaknesses in existing diffusion-based image editing.
Our method can efficiently achieve state-of-the-art performance on various fine-grained image editing tasks.
arXiv Detail & Related papers (2024-02-04T18:50:29Z)
- ZONE: Zero-Shot Instruction-Guided Local Editing [56.56213730578504]
We propose a Zero-shot instructiON-guided local image Editing approach, termed ZONE.
We first convert the editing intent from the user-provided instruction into specific image editing regions through InstructPix2Pix.
We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segment model.
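
The Region-IoU step is described only in one sentence above; as a rough, hypothetical illustration of the underlying idea (selecting the segmentation mask that best overlaps the predicted edit region), a generic IoU-based selection might look like the following. The function names and the binary-mask representation are assumptions, not ZONE's actual formulation.

```python
# Hypothetical illustration of IoU-based mask selection, loosely inspired by
# the Region-IoU idea summarized above; it is not ZONE's actual scheme.
# Masks and the edit region are assumed to be binary numpy arrays of equal shape.
import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def best_mask(edit_region: np.ndarray, seg_masks: list[np.ndarray]) -> np.ndarray:
    """Pick the segmentation mask that overlaps the edit region most (highest IoU)."""
    scores = [iou(edit_region, m) for m in seg_masks]
    return seg_masks[int(np.argmax(scores))]
```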
arXiv Detail & Related papers (2023-12-28T02:54:34Z)
- Warping the Residuals for Image Editing with StyleGAN [5.733811543584874]
StyleGAN models show editing capabilities via their semantically interpretable latent organizations.
Many works have been proposed for inverting images into StyleGAN's latent space.
We present a novel image inversion architecture that extracts high-rate latent features and includes a flow estimation module.
arXiv Detail & Related papers (2023-12-18T18:24:18Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tailored to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
- Make It So: Steering StyleGAN for Any Image Inversion and Editing [16.337519991964367]
StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables.
Existing GAN inversion methods struggle to maintain editing directions and produce realistic results.
We propose Make It So, a novel GAN inversion method that operates in the $\mathcal{Z}$ (noise) space rather than the typical $\mathcal{W}$ (latent style) space.
arXiv Detail & Related papers (2023-04-27T17:59:24Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [86.92711729969488]
We exploit the capabilities of pretrained diffusion models for image editing.
Existing approaches either finetune the model or invert the image into the latent space of the pretrained model.
Both strategies suffer from two problems: unsatisfying results for selected regions, and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high-quality, high-precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)
- PIE: Portrait Image Embedding for Semantic Control [82.69061225574774]
We present the first approach for embedding real portrait images in the latent space of StyleGAN.
We use StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN.
An identity preservation energy term allows spatially coherent edits while maintaining facial integrity.
arXiv Detail & Related papers (2020-09-20T17:53:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.