CHATEDIT: Towards Multi-turn Interactive Facial Image Editing via
Dialogue
- URL: http://arxiv.org/abs/2303.11108v3
- Date: Mon, 16 Oct 2023 04:04:12 GMT
- Authors: Xing Cui, Zekun Li, Peipei Li, Yibo Hu, Hailin Shi, Zhaofeng He
- Abstract summary: This paper introduces the ChatEdit benchmark dataset for evaluating image editing and conversation abilities.
ChatEdit is constructed from the CelebA-HQ dataset, incorporating annotated multi-turn dialogues corresponding to user edit requests on the images.
We present a novel baseline framework that integrates a dialogue module for both tracking user requests and generating responses.
- Score: 17.503012018823902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores interactive facial image editing via dialogue and
introduces the ChatEdit benchmark dataset for evaluating image editing and
conversation abilities in this context. ChatEdit is constructed from the
CelebA-HQ dataset, incorporating annotated multi-turn dialogues corresponding
to user edit requests on the images. The dataset is challenging, as it requires
the system to dynamically track user requests, edit images, and generate
appropriate responses. Accordingly, we propose three benchmark tasks: (i) user
edit request tracking, (ii) image editing, and (iii) response generation. We
present a novel baseline framework that integrates a dialogue module for both
tracking user requests and generating responses and an image editing module for
image editing. Unlike previous approaches, our framework directly tracks user
edit requests from the entire dialogue history up to the current turn and
modifies the original image rather than adjusting the previous turn's output,
thereby reducing error accumulation and preventing attribute forgetting.
Extensive experiments on the ChatEdit dataset underline our framework's
superior performance against prior models, while also highlighting potential
room for further research. We will release the code and data publicly to
facilitate advancements in complex interactive facial image editing.
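The framework's key design choice, tracking the cumulative edit request from the entire dialogue history and always applying it to the original image rather than the previous turn's output, can be illustrated with a toy sketch (all names, data structures, and functions below are hypothetical, not from the paper's implementation):

```python
# Hedged sketch of the ChatEdit baseline's editing loop. The tracker
# folds ALL turns into one cumulative request state, and every edit is
# applied to the ORIGINAL image, so per-turn errors do not accumulate
# and attributes edited in earlier turns are not forgotten.

def track_requests(dialogue_history):
    """Fold all turns into one cumulative attribute-target state."""
    state = {}
    for turn in dialogue_history:
        state.update(turn)          # later turns override earlier ones
    return state

def edit_image(original_image, state):
    """Apply the cumulative request state to the original image."""
    edited = dict(original_image)   # always start from the original
    edited.update(state)
    return edited

# Three turns: add a smile, make hair blond, then change smile to neutral.
history = [{"smile": "big"}, {"hair": "blond"}, {"smile": "neutral"}]
original = {"smile": "none", "hair": "brown", "glasses": "no"}

state = track_requests(history)
result = edit_image(original, state)
```

Note that `result` keeps the hair edit from turn 2 even though turn 3 only mentions the smile; a system that re-edited the previous turn's output with only the latest request would risk dropping it.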
Related papers
- An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control [21.624984690721842]
D-Edit is a framework to disentangle the comprehensive image-prompt interaction into several item-prompt interactions.
It is based on pretrained diffusion models with cross-attention layers disentangled and adopts a two-step optimization to build item-prompt associations.
We demonstrate state-of-the-art results in four types of editing operations including image-based, text-based, mask-based editing, and item removal.
arXiv Detail & Related papers (2024-03-07T20:06:29Z)
- Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [61.984277261016146]
We propose a CustomNeRF model that unifies a text description or a reference image as the editing prompt.
To keep edits confined to the intended foreground regions, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground-region editing and full-image editing.
We also design a class-guided regularization that exploits class priors within the generation model to alleviate inconsistency in the edited scene.
arXiv Detail & Related papers (2023-12-04T06:25:06Z) - Visual Instruction Inversion: Image Editing via Visual Prompting [34.96778567507126]
We present a method for image editing via visual prompting.
We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions.
arXiv Detail & Related papers (2023-07-26T17:50:10Z) - IMAD: IMage-Augmented multi-modal Dialogue [0.043847653914745384]
This paper presents a novel perspective on multi-modal dialogue systems, which interprets the image in the context of the dialogue.
We propose a two-stage approach to automatically construct a multi-modal dialogue dataset.
In the first stage, we utilize text-to-image similarity and sentence similarity to identify which utterances could be replaced with an image.
In the second stage, we replace those utterances by selecting a subset of relevant images and filtering them with a visual question answering model.
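IMAD's two-stage construction can be sketched as a toy pipeline (thresholds, function names, and the similarity/VQA scorers below are illustrative assumptions, not the paper's actual models):

```python
# Toy sketch of an IMAD-style two-stage dataset-construction pipeline.
# Stage 1 keeps utterances that look replaceable by an image (via a
# text-to-image similarity score); stage 2 keeps only those that a
# VQA-style check confirms are actually depicted by a candidate image.

def stage1_candidates(utterances, image_score, threshold=0.5):
    """Stage 1: keep utterances whose best text-to-image similarity
    (supplied here as a precomputed score) exceeds a threshold."""
    return [u for u in utterances if image_score(u) > threshold]

def stage2_filter(candidates, vqa_confirms):
    """Stage 2: keep a candidate only if the VQA check confirms the
    selected image matches the utterance."""
    return [u for u in candidates if vqa_confirms(u)]

utterances = ["look at my new puppy", "ok thanks", "here is the beach photo"]
scores = {"look at my new puppy": 0.8, "ok thanks": 0.1,
          "here is the beach photo": 0.7}
confirmed = {"look at my new puppy": True, "here is the beach photo": False}

cands = stage1_candidates(utterances, lambda u: scores.get(u, 0.0))
kept = stage2_filter(cands, lambda u: confirmed.get(u, False))
```

Here only "look at my new puppy" survives both stages: "ok thanks" fails the similarity gate, and the beach utterance passes stage 1 but is rejected by the VQA filter.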
arXiv Detail & Related papers (2023-05-17T18:38:10Z) - iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z) - DialogPaint: A Dialog-based Image Editing Model [21.51417302677082]
DialogPaint is a novel framework that bridges conversational interactions with image editing.
By integrating a dialogue model with the Stable Diffusion image transformation technique, DialogPaint offers a more intuitive and interactive approach to image modifications.
arXiv Detail & Related papers (2023-03-17T15:54:30Z) - Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image
Inpainting [53.708523312636096]
We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting.
Its edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training.
To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting.
arXiv Detail & Related papers (2022-12-13T21:25:11Z) - HairCLIP: Design Your Hair by Text and Reference Image [100.85116679883724]
This paper proposes a new hair editing interaction mode, which enables manipulating hair attributes individually or jointly.
We encode the image and text conditions in a shared embedding space and propose a unified hair editing framework.
With the carefully designed network structures and loss functions, our framework can perform high-quality hair editing.
arXiv Detail & Related papers (2021-12-09T18:59:58Z) - Talk-to-Edit: Fine-Grained Facial Editing via Dialog [79.8726256912376]
Talk-to-Edit is an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system.
Our key insight is to model a continual "semantic field" in the GAN latent space.
Our system generates language feedback by considering both the user request and the current state of the semantic field.
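The "semantic field" idea, editing by repeatedly taking small, location-dependent steps in the GAN latent space until the attribute reaches the requested degree, can be sketched in one dimension (the predictor, field, and step sizes below are stand-ins, not Talk-to-Edit's actual models):

```python
# Toy 1-D sketch of a Talk-to-Edit-style semantic-field walk: instead
# of one large jump, the latent code moves in small steps along a
# direction that may change with its current location, stopping once a
# predictor reports the requested attribute degree.

def attribute_degree(latent):
    """Stand-in attribute predictor: degree grows with the latent."""
    return max(0.0, min(5.0, latent))

def field_direction(latent):
    """Stand-in semantic field: the step taken depends on where the
    latent currently sits in the space."""
    return 0.25 if latent < 2.0 else 0.1

def walk_to_degree(latent, target, max_steps=100):
    """Step along the field until the predicted degree hits target."""
    for _ in range(max_steps):
        if attribute_degree(latent) >= target:
            break
        latent += field_direction(latent)
    return latent

final = walk_to_degree(0.0, target=3.0)
```

The fine-grained steps are what allow the system to stop at intermediate attribute degrees (e.g. "a slightly bigger smile") and to report its current state back to the user in dialogue.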
arXiv Detail & Related papers (2021-09-09T17:17:59Z) - Text Editing by Command [82.50904226312451]
A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step.
We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text.
We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations.
arXiv Detail & Related papers (2020-10-24T08:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.