Zero-shot Text-driven Physically Interpretable Face Editing
- URL: http://arxiv.org/abs/2308.05976v1
- Date: Fri, 11 Aug 2023 07:20:24 GMT
- Title: Zero-shot Text-driven Physically Interpretable Face Editing
- Authors: Yapeng Meng, Songru Yang, Xu Hu, Rui Zhao, Lincheng Li, Zhenwei Shi,
Zhengxia Zou
- Abstract summary: This paper proposes a novel and physically interpretable method for face editing based on arbitrary text prompts.
Our method can generate physically interpretable face editing results with high identity consistency and image quality.
- Score: 29.32334174584623
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper proposes a novel and physically interpretable method for face
editing based on arbitrary text prompts. Different from previous
GAN-inversion-based face editing methods that manipulate the latent space of
GANs, or diffusion-based methods that model image manipulation as a reverse
diffusion process, we regard the face editing process as imposing vector flow
fields on face images, representing the offset of spatial coordinates and color
for each image pixel. Under the above-proposed paradigm, we represent the
vector flow field in two ways: 1) explicitly represent the flow vectors with
rasterized tensors, and 2) implicitly parameterize the flow vectors as
continuous, smooth, and resolution-agnostic neural fields, by leveraging the
recent advances in implicit neural representations. The flow vectors are
iteratively optimized under the guidance of the pre-trained Contrastive
Language-Image Pretraining (CLIP) model by maximizing the correlation between
the edited image and the text prompt. We also propose a learning-based one-shot
face editing framework, which is fast and adaptable to any text prompt input.
Our method can also be flexibly extended to real-time video face editing.
Compared with state-of-the-art text-driven face editing methods, our method can
generate physically interpretable face editing results with high identity
consistency and image quality. Our code will be made publicly available.
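As a concrete illustration of the paradigm described in the abstract, the snippet below sketches the explicit (rasterized-tensor) variant: a per-pixel spatial-offset field and a per-pixel color-offset field are optimized so that the warped and recolored image moves toward a text prompt under a CLIP similarity loss. This is a minimal sketch, not the authors' implementation; the prompt, loss weights, optimizer settings, and the simple L1 regularizer are assumptions, and the identity-preservation terms mentioned in the abstract are omitted.

```python
# Minimal sketch (not the authors' implementation) of the explicit,
# rasterized-tensor variant of the flow-field paradigm: per-pixel spatial and
# color offsets are optimized under a CLIP similarity loss.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize(["an elderly face"]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

face = torch.rand(1, 3, 256, 256, device=device)  # stand-in for a face image in [0, 1]
H, W = face.shape[-2:]

# Vector flow fields: per-pixel spatial-coordinate offsets and color offsets.
coord_offset = torch.zeros(1, H, W, 2, device=device, requires_grad=True)
color_offset = torch.zeros(1, 3, H, W, device=device, requires_grad=True)
opt = torch.optim.Adam([coord_offset, color_offset], lr=1e-3)

# Identity sampling grid in [-1, 1] for grid_sample.
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, H, device=device),
    torch.linspace(-1, 1, W, device=device),
    indexing="ij",
)
base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)

# CLIP's input normalization constants.
clip_mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
clip_std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(200):
    # Apply the flow fields: warp spatial coordinates, then shift colors.
    warped = F.grid_sample(face, base_grid + coord_offset, align_corners=True)
    edited = (warped + color_offset).clamp(0, 1)

    # CLIP guidance: maximize similarity between the edited image and the prompt.
    clip_in = F.interpolate(edited, size=224, mode="bilinear", align_corners=False)
    img_feat = model.encode_image((clip_in - clip_mean) / clip_std)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    clip_loss = 1 - (img_feat * text_feat).sum()

    # Simple magnitude regularizer keeps the offsets small and plausible
    # (assumed form; the paper's identity-preservation losses are omitted).
    reg = coord_offset.abs().mean() + color_offset.abs().mean()
    loss = clip_loss + 0.1 * reg

    opt.zero_grad()
    loss.backward()
    opt.step()
```

The implicit variant described in the abstract would replace the rasterized offset tensors with a small coordinate-based network (an implicit neural representation) mapping (x, y) to offsets, which makes the field continuous, smooth, and resolution-agnostic.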
Related papers
- Perceptual Similarity guidance and text guidance optimization for
Editing Real Images using Guided Diffusion Models [0.6345523830122168]
We apply a dual-guidance approach to maintain high fidelity to the original in areas that are not altered.
This method ensures realistic rendering of the edited elements while preserving the unedited parts of the original image.
arXiv Detail & Related papers (2023-12-09T02:55:35Z)
- Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches.
We show that this simple approach enables flexible editing that is compatible with current image generation models.
Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z)
- Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models [6.34777393532937]
We propose an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing.
Our proposed editing method consists of a reconstruction stage and an editing stage.
Experiments on ImageNet demonstrate the superior editing performance of our method compared to the state-of-the-art baselines.
arXiv Detail & Related papers (2023-05-08T03:34:33Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- Towards Arbitrary Text-driven Image Manipulation via Space Alignment [49.3370305074319]
We propose a new Text-driven image Manipulation framework via Space Alignment (TMSA).
TMSA aims to align the same semantic regions in CLIP and StyleGAN spaces.
The framework supports arbitrary image editing modes without additional cost.
arXiv Detail & Related papers (2023-01-25T16:20:01Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting the regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
arXiv Detail & Related papers (2022-03-09T13:34:38Z)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that uses a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt; a minimal sketch of this kind of CLIP-guided latent optimization appears after this list.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
- S2FGAN: Semantically Aware Interactive Sketch-to-Face Translation [11.724779328025589]
This paper proposes a sketch-to-image generation framework called S2FGAN.
We employ two latent spaces to control the face appearance and adjust the desired attributes of the generated face.
Our method outperforms state-of-the-art methods on attribute manipulation by offering finer control over attribute intensity.
arXiv Detail & Related papers (2020-11-30T13:42:39Z)
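The snippet below is a minimal sketch of the CLIP-guided latent optimization referenced in the StyleCLIP entry above: a latent code is updated under a CLIP-based loss so that the generated image matches a text prompt while staying close to the initial code. `ToyGenerator` is a self-contained stand-in for a pretrained StyleGAN generator (not the real architecture), and the prompt, learning rate, and loss weights are illustrative assumptions; in practice the initial latent would come from GAN inversion of the input face, and an identity loss would typically be added.

```python
# Sketch of CLIP-guided latent optimization (StyleCLIP-style), with a toy
# generator standing in for a pretrained StyleGAN.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize(["a smiling face"]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)


class ToyGenerator(torch.nn.Module):
    """Stand-in for a pretrained StyleGAN generator: maps a 512-d latent to an RGB image."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(512, 3 * 64 * 64)

    def forward(self, w):
        x = torch.tanh(self.fc(w)).view(-1, 3, 64, 64)  # values in [-1, 1]
        return F.interpolate(x, size=256, mode="bilinear", align_corners=False)


generator = ToyGenerator().to(device).eval()
w_init = torch.randn(1, 512, device=device)  # e.g. the inverted latent of the input face
w = w_init.clone().requires_grad_(True)
opt = torch.optim.Adam([w], lr=0.01)

# CLIP's input normalization constants.
clip_mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
clip_std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(300):
    img = (generator(w) + 1) / 2  # map generator output to [0, 1]
    clip_in = F.interpolate(img, size=224, mode="bilinear", align_corners=False)
    img_feat = model.encode_image((clip_in - clip_mean) / clip_std)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

    clip_loss = 1 - (img_feat * text_feat).sum()  # CLIP-based text-image similarity loss
    latent_loss = (w - w_init).pow(2).mean()      # keep the edit close to the original latent
    loss = clip_loss + 0.8 * latent_loss

    opt.zero_grad()
    loss.backward()
    opt.step()
```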
This list is automatically generated from the titles and abstracts of the papers on this site.