ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space
Manipulation
- URL: http://arxiv.org/abs/2305.14742v2
- Date: Mon, 5 Jun 2023 10:34:05 GMT
- Title: ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space
Manipulation
- Authors: Dongxu Yue, Qin Guo, Munan Ning, Jiaxi Cui, Yuesheng Zhu, Li Yuan
- Abstract summary: We propose a novel approach that conducts text-driven image editing in the semantic latent space of a diffusion model.
By aligning the temporal feature of the diffusion model with the semantic condition during the generative process, we introduce a stable manipulation strategy.
We develop an interactive system named ChatFace, which combines the zero-shot reasoning ability of large language models to perform efficient manipulations.
- Score: 22.724306705927095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Editing real facial images is a crucial task in computer vision with
significant demand in various real-world applications. While GAN-based methods
have shown potential in manipulating images, especially when combined with
CLIP, these methods are limited in their ability to reconstruct real images due
to challenging GAN inversion capability. Despite the successful image
reconstruction achieved by diffusion-based methods, there are still challenges
in effectively manipulating fine-grained facial attributes with textual
instructions. To address these issues and facilitate convenient manipulation of
real facial images, we propose a novel approach that conducts text-driven image
editing in the semantic latent space of a diffusion model. By aligning the
temporal feature of the diffusion model with the semantic condition during the
generative process, we introduce a stable manipulation strategy that performs
precise zero-shot manipulation effectively. Furthermore, we develop an
interactive system named ChatFace, which combines the zero-shot reasoning
ability of large language models to perform efficient manipulations in
diffusion semantic latent space. This system enables users to perform complex
multi-attribute manipulations through dialogue, opening up new possibilities
for interactive image editing. Extensive experiments confirmed that our
approach outperforms previous methods and enables precise editing of real
facial images, making it a promising candidate for real-world applications.
Project page: https://dongxuyue.github.io/chatface/
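The idea of editing in a semantic latent space can be illustrated with a minimal sketch. It is not the authors' implementation: the 512-dimensional latent, the attribute names, the random directions, and the helper `apply_edits` are all hypothetical, and the abstract's alignment of temporal features with the semantic condition is not modeled here. The sketch only shows how several attribute edits with user-chosen strengths (for example, as parsed from a dialogue turn) could be combined as linear shifts of one latent code.

```python
import numpy as np


def apply_edits(z_sem, directions, strengths):
    """Shift a semantic latent code along named attribute directions.

    Hypothetical helper: each direction is normalized to unit length so
    that a strength value corresponds to the size of the shift.
    """
    z_edit = z_sem.copy()
    for name, strength in strengths.items():
        d = directions[name]
        z_edit = z_edit + strength * d / np.linalg.norm(d)
    return z_edit


# Stand-in data: a real system would obtain z_sem from an encoder and
# the directions from training, not from a random generator.
rng = np.random.default_rng(0)
z_sem = rng.standard_normal(512)
directions = {
    "smile": rng.standard_normal(512),
    "age": rng.standard_normal(512),
}

# A multi-attribute request expressed as attribute -> signed strength.
edits = {"smile": 1.5, "age": -0.5}
z_edit = apply_edits(z_sem, directions, edits)
```

In this sketch, an empty request leaves the latent unchanged and the sign of a strength selects the direction of the change; a decoder would then generate the edited image from `z_edit`.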
Related papers
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z)
- Revealing Directions for Text-guided 3D Face Editing [52.85632020601518]
3D face editing is a significant task in multimedia, aimed at the manipulation of 3D face models across various control signals.
We present Face Clan, a text-general approach for generating and manipulating 3D faces based on arbitrary attribute descriptions.
Our method offers a precisely controllable manipulation method, allowing users to intuitively customize regions of interest with the text description.
arXiv Detail & Related papers (2024-10-07T12:04:39Z)
- Controllable Talking Face Generation by Implicit Facial Keypoints Editing [6.036277153327655]
We present ControlTalk, a talking face generation method to control face expression deformation based on driven audio.
Our experiments show that our method outperforms state-of-the-art methods on widely used benchmarks, including HDTF and MEAD.
arXiv Detail & Related papers (2024-06-05T02:54:46Z)
- ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based
Image Manipulation [49.07254928141495]
We propose a novel manipulation methodology, dubbed ImageBrush, that learns visual instructions for more accurate image editing.
Our key idea is to employ a pair of transformation images as visual instructions, which precisely captures human intention.
Our model exhibits robust generalization capabilities on various downstream tasks such as pose transfer, image translation and video inpainting.
arXiv Detail & Related papers (2023-08-02T01:57:11Z)
- FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural
Radiance Fields [39.57313951313061]
Existing manipulation methods require extensive human labor.
Our approach is designed to require a single text to manipulate a face reconstructed with NeRF.
Our approach is the first to address the text-driven manipulation of a face reconstructed with NeRF.
arXiv Detail & Related papers (2023-07-21T08:22:14Z)
- DreamIdentity: Improved Editability for Efficient Face-identity
Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z)
- Face Forgery Detection Based on Facial Region Displacement Trajectory
Series [10.338298543908339]
We develop a method for detecting manipulated videos based on the trajectory of the facial region displacement.
This information was used to construct a network for exposing multidimensional artifacts in the trajectory sequences of manipulated videos.
arXiv Detail & Related papers (2022-12-07T14:47:54Z)
- LDEdit: Towards Generalized Text Guided Image Manipulation via Latent
Diffusion Models [12.06277444740134]
Generic image manipulation using a single model with flexible text inputs is highly desirable.
Recent work addresses this task by guiding generative models trained on generic images using pretrained vision-language encoders.
We propose an optimization-free method for the task of generic image manipulation from text prompts.
arXiv Detail & Related papers (2022-10-05T13:26:15Z)
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [71.1862388442953]
We develop a text-based interface for StyleGAN image manipulation.
We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt.
Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation.
arXiv Detail & Related papers (2021-03-31T17:51:25Z)
- S2FGAN: Semantically Aware Interactive Sketch-to-Face Translation [11.724779328025589]
This paper proposes a sketch-to-image generation framework called S2FGAN.
We employ two latent spaces to control the face appearance and adjust the desired attributes of the generated face.
Our method successfully outperforms state-of-the-art methods on attribute manipulation by exploiting greater control of attribute intensity.
arXiv Detail & Related papers (2020-11-30T13:42:39Z)
- PIE: Portrait Image Embedding for Semantic Control [82.69061225574774]
We present the first approach for embedding real portrait images in the latent space of StyleGAN.
We use StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN.
An identity preservation energy term allows spatially coherent edits while maintaining facial integrity.
arXiv Detail & Related papers (2020-09-20T17:53:51Z)