HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in
Image Editing via Hypernetworks
- URL: http://arxiv.org/abs/2312.13537v1
- Date: Thu, 21 Dec 2023 02:39:53 GMT
- Title: HyperEditor: Achieving Both Authenticity and Cross-Domain Capability in
Image Editing via Hypernetworks
- Authors: Hai Zhang, Chunwei Wu, Guitao Cao, Hailing Wang, Wenming Cao
- Abstract summary: We propose an innovative image editing method called HyperEditor, which utilizes weight factors generated by hypernetworks to reassign the weights of the pre-trained StyleGAN2's generator.
Guided by CLIP's cross-modal image-text semantic alignment, this innovative approach enables us to simultaneously accomplish authentic attribute editing and cross-domain style transfer.
- Score: 5.9189325968909365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Editing real images authentically while also achieving cross-domain editing
remains a challenge. Recent studies have focused on converting real images into
latent codes and accomplishing image editing by manipulating these codes.
However, merely manipulating the latent codes would constrain the edited images
to the generator's image domain, hindering the attainment of diverse editing
goals. In response, we propose an innovative image editing method called
HyperEditor, which utilizes weight factors generated by hypernetworks to
reassign the weights of the pre-trained StyleGAN2's generator. Guided by CLIP's
cross-modal image-text semantic alignment, this innovative approach enables us
to simultaneously accomplish authentic attribute editing and cross-domain style
transfer, a capability not realized in previous methods. Additionally, we
ascertain that modifying only the weights of specific layers in the generator
can yield an equivalent editing result. Therefore, we introduce an adaptive
layer selector, enabling our hypernetworks to autonomously identify the layers
requiring output weight factors, which can further improve our hypernetworks'
efficiency. Extensive experiments on numerous challenging datasets demonstrate
the effectiveness of our method.
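The core mechanism can be pictured with a short sketch. The following is a minimal illustration, not the authors' implementation: the conditioning dimension, the 1 + 0.1*tanh bound on the weight factors, the hard gate threshold, and the stubbed-out CLIP alignment loss are all assumptions standing in for details the abstract does not specify.
```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Maps a conditioning vector (e.g. a CLIP text feature) to
    multiplicative weight factors for one generator conv layer."""
    def __init__(self, cond_dim: int, out_ch: int, in_ch: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim, 256),
            nn.ReLU(),
            nn.Linear(256, out_ch * in_ch),
        )
        self.out_shape = (out_ch, in_ch, 1, 1)

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        # Factors start near 1.0 so editing departs smoothly from the
        # pre-trained weights; the 0.1 * tanh bound is an assumption.
        delta = self.mlp(cond).view(self.out_shape)
        return 1.0 + 0.1 * torch.tanh(delta)

class AdaptiveLayerSelector(nn.Module):
    """Learned per-layer gates: only layers whose gate exceeds the
    threshold receive weight factors. A hard threshold like this would
    need a straight-through estimator or annealing to train end to end."""
    def __init__(self, num_layers: int, threshold: float = 0.5):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_layers))
        self.threshold = threshold

    def forward(self) -> torch.Tensor:
        return (torch.sigmoid(self.logits) > self.threshold).float()

def reassign_weights(conv: nn.Conv2d, factors: torch.Tensor) -> torch.Tensor:
    # W' = factors * W, broadcast over the spatial kernel dimensions.
    return conv.weight * factors

# One illustrative step. A real setup would freeze the StyleGAN2 generator,
# apply the reassigned weights in its forward pass, and backpropagate a
# CLIP image-text alignment loss into the hypernetwork (stubbed out here).
hyper = HyperNetwork(cond_dim=512, out_ch=64, in_ch=32)
gates = AdaptiveLayerSelector(num_layers=18)()
factors = hyper(torch.randn(512))
# edited_weight = reassign_weights(generator.convs[i], factors) if gates[i]
```
In this sketch only the hypernetwork and the gate logits carry gradients, which is consistent with the abstract's claim that it is weight reassignment, rather than latent-code manipulation alone, that frees the edits from the generator's original image domain.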
Related papers
- Latent Space Editing in Transformer-Based Flow Matching [53.75073756305241]
Flow Matching with a transformer backbone offers the potential for scalable and high-quality generative modeling.
We introduce an editing space, $u$-space, that can be manipulated in a controllable, accumulative, and composable manner.
Lastly, we put forth a straightforward yet powerful method for achieving fine-grained and nuanced editing using text prompts.
arXiv Detail & Related papers (2023-12-17T21:49:59Z)
- CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing [22.40686064568406]
We present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes.
Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds.
arXiv Detail & Related papers (2023-07-17T11:29:48Z)
- LayerDiffusion: Layered Controlled Image Editing with Diffusion Models [5.58892860792971]
LayerDiffusion is a semantic-based layered controlled image editing method.
We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy.
Experimental results demonstrate the effectiveness of our method in generating highly coherent images.
arXiv Detail & Related papers (2023-05-30T01:26:41Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Gradient Adjusting Networks for Domain Inversion [82.72289618025084]
StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing.
We present a per-image optimization method that tunes a StyleGAN2 generator through a local edit to the generator's weights.
Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
arXiv Detail & Related papers (2023-02-22T14:47:57Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited (a sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- Style Transformer for Image Inversion and Editing [35.45674653596084]
Existing GAN inversion methods fail to provide latent codes for reliable reconstruction and flexible editing simultaneously.
This paper presents a transformer-based image inversion and editing model for pretrained StyleGAN.
The proposed model employs a CNN encoder to provide multi-scale image features as keys and values.
arXiv Detail & Related papers (2022-03-15T14:16:57Z)
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high-quality, high-precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)
- Pivotal Tuning for Latent-based Editing of Real Images [40.22151052441958]
A surge of advanced facial editing techniques leveraging the generative power of a pre-trained StyleGAN has been proposed.
To successfully edit an image this way, one must first project (or invert) the image into the pre-trained generator's domain.
This means it is still challenging to apply ID-preserving facial latent-space editing to faces which are out of the generator's domain.
arXiv Detail & Related papers (2021-06-10T13:47:59Z)
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation [136.53288628437355]
Controllable semantic image editing enables a user to change entire image attributes with a few clicks.
Current approaches often suffer from attribute edits that are entangled, global image identity changes, and diminished photo-realism.
We propose quantitative evaluation strategies for measuring controllable editing performance, unlike prior work which primarily focuses on qualitative evaluation.
arXiv Detail & Related papers (2021-02-01T21:38:36Z)
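For the DiffEdit entry above, here is a rough sketch of how an edit mask can be derived by contrasting a diffusion model's noise predictions under two text conditionings; regions where the predictions disagree are taken as the regions to edit. The `denoiser` callable, the crude noising step, the sample count, and the 0.5 threshold are all illustrative assumptions, not DiffEdit's actual implementation.
```python
import torch

def estimate_edit_mask(denoiser, image, query_emb, ref_emb,
                       n_samples: int = 10, t: float = 0.5) -> torch.Tensor:
    """Contrast noise predictions under two text conditionings and
    binarize the averaged disagreement into an edit mask."""
    diffs = []
    for _ in range(n_samples):
        noise = torch.randn_like(image)
        noisy = (1.0 - t) * image + t * noise   # crude forward-noising stand-in
        eps_query = denoiser(noisy, query_emb)  # noise estimate, edit prompt
        eps_ref = denoiser(noisy, ref_emb)      # noise estimate, reference prompt
        diffs.append((eps_query - eps_ref).abs().mean(dim=1, keepdim=True))
    diff = torch.stack(diffs).mean(dim=0)
    diff = diff / (diff.max() + 1e-8)  # normalize disagreement to [0, 1]
    return (diff > 0.5).float()        # binarize into the edit mask
```
Averaging over several noise draws stabilizes the mask; the edit itself is then confined to the masked region while the rest of the image is preserved.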
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.