Talk-to-Edit: Fine-Grained Facial Editing via Dialog
- URL: http://arxiv.org/abs/2109.04425v1
- Date: Thu, 9 Sep 2021 17:17:59 GMT
- Title: Talk-to-Edit: Fine-Grained Facial Editing via Dialog
- Authors: Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu
- Abstract summary: Talk-to-Edit is an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system.
Our key insight is to model a continual "semantic field" in the GAN latent space.
Our system generates language feedback by considering both the user request and the current state of the semantic field.
- Score: 79.8726256912376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial editing is an important task in vision and graphics with numerous
applications. However, existing works are incapable of delivering a continuous and
fine-grained editing mode (e.g., editing a slightly smiling face to a big
laughing one) with natural interactions with users. In this work, we propose
Talk-to-Edit, an interactive facial editing framework that performs
fine-grained attribute manipulation through dialog between the user and the
system. Our key insight is to model a continual "semantic field" in the GAN
latent space. 1) Unlike previous works that regard editing as traversing
straight lines in the latent space, here fine-grained editing is formulated
as finding a curved trajectory that respects the fine-grained attribute
landscape of the semantic field. 2) The curvature at each step is location-specific and
determined by the input image as well as the users' language requests. 3) To
engage the users in a meaningful dialog, our system generates language feedback
by considering both the user request and the current state of the semantic
field.
We also contribute CelebA-Dialog, a visual-language facial editing dataset to
facilitate large-scale study. Specifically, each image carries manually annotated
fine-grained attribute labels as well as template-based textual
descriptions in natural language. Extensive quantitative and qualitative
experiments demonstrate the superiority of our framework in terms of 1) the
smoothness of fine-grained editing, 2) the identity/attribute preservation, and
3) the visual photorealism and dialog fluency. Notably, a user study validates
that our overall system is consistently favored by around 80% of the
participants. Our project page is https://www.mmlab-ntu.com/project/talkedit/.
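To make points 1) to 3) concrete, the edit can be pictured as a small control loop: at every step the system queries a location-specific direction from the semantic field instead of reusing one global line, and it stops once the requested attribute degree is reached. Below is a minimal Python sketch of that loop; `semantic_field`, `attr_classifier`, and the scalar degree convention are hypothetical stand-ins, not the paper's actual interfaces.

```python
def edit_along_semantic_field(z, semantic_field, attr_classifier, attr_idx,
                              target_score, step_size=0.05, tol=0.25,
                              max_steps=100):
    """Move a latent code along location-specific directions until the
    requested fine-grained attribute degree is reached. The trajectory
    curves because the direction is re-queried at every new location."""
    for _ in range(max_steps):
        score = attr_classifier(z)[attr_idx]          # current attribute degree
        if abs(score - target_score) < tol:           # close enough: stop editing
            break
        direction = semantic_field(z, attr_idx)       # local direction at this z
        direction = direction / direction.norm()      # unit-length step
        sign = 1.0 if target_score > score else -1.0  # increase or decrease
        z = z + sign * step_size * direction          # one small curved step
    return z
```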
Related papers
- SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing [42.23117201457898]
We introduce a new framework that integrates a large language model (LLM) with a text-to-image (T2I) generative model for scene graph-based image editing.
Our framework significantly outperforms existing image editing methods in terms of editing precision and scene aesthetics.
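As an illustration of how such a pipeline could fit together, here is a minimal, hypothetical sketch in which the LLM translates a free-form request into scene-graph operations and a text-to-image model realizes each one; `call_llm` and `inpaint_region` are assumed helpers, not SGEdit's actual API.

```python
def scene_graph_edit(image, scene_graph, request, call_llm, inpaint_region):
    """Sketch of an LLM + T2I editing loop over a scene graph whose nodes
    map object names to {"bbox": ..., "label": ...} entries."""
    # 1) The LLM turns the request into edit operations,
    #    e.g. [{"node": "dog", "new_label": "cat"}].
    ops = call_llm(f"Scene graph: {scene_graph}\nRequest: {request}\n"
                   "Return a JSON list of node edits.")
    # 2) Realize each edit by regenerating the node's image region
    #    with the text-to-image model, conditioned on the new label.
    for op in ops:
        node = scene_graph[op["node"]]
        image = inpaint_region(image, node["bbox"], prompt=op["new_label"])
        node["label"] = op["new_label"]               # keep the graph in sync
    return image, scene_graph
```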
arXiv Detail & Related papers (2024-10-15T17:40:48Z)
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to modify a given synthetic or real image to meet specific user requirements.
Significant recent advances in this field build on the development of text-to-image (T2I) diffusion models.
T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z)
- Zero-shot Image Editing with Reference Imitation [50.75310094611476]
We present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently.
We propose a generative training framework, dubbed MimicBrush, which randomly selects two frames from a video clip, masks some regions of one frame, and learns to recover the masked regions using the information from the other frame.
We experimentally show the effectiveness of our method under various test cases as well as its superiority over existing alternatives.
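The training recipe above translates almost directly into a data-preparation routine. A minimal sketch, assuming array-like frames and a hypothetical `sample_region_mask` helper (an illustration of the described recipe, not MimicBrush's released code):

```python
import random

def make_imitative_training_pair(video_frames, sample_region_mask):
    """Build one self-supervised example: two frames of the same clip
    depict the same content, so the unmasked frame can supply the
    information that the masked frame is missing."""
    i, j = random.sample(range(len(video_frames)), 2)  # two distinct frames
    target, reference = video_frames[i], video_frames[j]
    mask = sample_region_mask(target.shape)            # 1 inside masked regions
    masked_target = target * (1 - mask)                # hide regions to recover
    # The model is trained to reconstruct `target` from
    # (masked_target, mask, reference), e.g. with a loss on the masked area.
    return masked_target, mask, reference, target
```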
arXiv Detail & Related papers (2024-06-11T17:59:51Z)
- Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches.
We show that this simple approach enables flexible editing that is compatible with current image generation models.
Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z)
- FASTER: A Font-Agnostic Scene Text Editing and Rendering Framework [19.564048493848272]
Scene Text Editing (STE) is a challenging research problem that primarily aims to modify existing text in an image.
Existing style-transfer-based approaches have shown sub-par editing performance due to complex image backgrounds, diverse font attributes, and varying word lengths within the text.
We propose a novel font-agnostic scene text editing and rendering framework, named FASTER, for simultaneously generating text in arbitrary styles and locations.
arXiv Detail & Related papers (2023-08-05T15:54:06Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
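A minimal sketch of one guided denoising step in this spirit, assuming a diffusers-style UNet and a hypothetical `get_attn_maps` hook collector (not the released pix2pix-zero code): the reference maps are recorded while reconstructing the input image, and the edited latent is nudged toward them.

```python
import torch

def guided_denoise_step(latent, t, unet, prompt_emb, ref_maps,
                        get_attn_maps, lam=0.1):
    """One denoising step with cross-attention guidance: penalize the
    distance between the current cross-attention maps and the reference
    maps of the input image, then step the latent down that gradient."""
    latent = latent.detach().requires_grad_(True)
    noise_pred = unet(latent, t, encoder_hidden_states=prompt_emb).sample
    cur_maps = get_attn_maps(unet)                     # collected via forward hooks
    loss = sum(((c - r) ** 2).mean() for c, r in zip(cur_maps, ref_maps))
    grad = torch.autograd.grad(loss, latent)[0]        # structure-preserving signal
    return (latent - lam * grad).detach(), noise_pred.detach()
```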
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- HairCLIP: Design Your Hair by Text and Reference Image [100.85116679883724]
This paper proposes a new hair editing interaction mode, which enables manipulating hair attributes individually or jointly.
We encode the image and text conditions in a shared embedding space and propose a unified hair editing framework.
With the carefully designed network structures and loss functions, our framework can perform high-quality hair editing.
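The shared embedding space in question is CLIP's joint image-text space. A minimal sketch of mapping either condition type into that space with the public CLIP package (an illustration of the idea, not HairCLIP's released code):

```python
import torch
import clip  # OpenAI CLIP provides the shared image-text embedding space

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def hair_condition_embedding(text=None, ref_image=None):
    """Encode a textual hairstyle description or a PIL reference image
    into one shared embedding, so a single editing network can consume
    both condition types interchangeably."""
    with torch.no_grad():
        if text is not None:
            emb = model.encode_text(clip.tokenize([text]).to(device))
        else:
            emb = model.encode_image(preprocess(ref_image).unsqueeze(0).to(device))
    return emb / emb.norm(dim=-1, keepdim=True)        # unit-normalized condition
```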
arXiv Detail & Related papers (2021-12-09T18:59:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.