CHATEDIT: Towards Multi-turn Interactive Facial Image Editing via
Dialogue
- URL: http://arxiv.org/abs/2303.11108v3
- Date: Mon, 16 Oct 2023 04:04:12 GMT
- Authors: Xing Cui, Zekun Li, Peipei Li, Yibo Hu, Hailin Shi, Zhaofeng He
- Abstract summary: This paper introduces the ChatEdit benchmark dataset for evaluating image editing and conversation abilities.
ChatEdit is constructed from the CelebA-HQ dataset, incorporating annotated multi-turn dialogues corresponding to user edit requests on the images.
We present a novel baseline framework that integrates a dialogue module for both tracking user requests and generating responses.
- Score: 17.503012018823902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores interactive facial image editing via dialogue and
introduces the ChatEdit benchmark dataset for evaluating image editing and
conversation abilities in this context. ChatEdit is constructed from the
CelebA-HQ dataset, incorporating annotated multi-turn dialogues corresponding
to user edit requests on the images. The dataset is challenging, as it requires
the system to dynamically track user requests, edit images, and generate
appropriate responses. Accordingly, we propose three benchmark tasks: (i) user
edit request tracking, (ii) image editing, and (iii) response generation. We
present a novel baseline framework that integrates a dialogue module for both
tracking user requests and generating responses and an image editing module for
image editing. Unlike previous approaches, our framework directly tracks user
edit requests from the entire dialogue history up to the current turn and
modifies the original image rather than adjusting the previous turn's output,
thereby reducing error accumulation and preventing attribute forgetting.
Extensive experiments on the ChatEdit dataset underline our framework's
superior performance against prior models, while also highlighting potential
room for further research. We will release the code and data publicly to
facilitate advancements in complex interactive facial image editing.
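The framework's key design choice, tracking the cumulative edit request from the entire dialogue history and always applying it to the original image rather than the previous turn's output, can be illustrated with a toy sketch (all names, data structures, and functions below are hypothetical, not from the paper's implementation):

```python
# Hedged sketch of the ChatEdit baseline's editing loop. The tracker
# folds ALL turns into one cumulative request state, and every edit is
# applied to the ORIGINAL image, so per-turn errors do not accumulate
# and attributes edited in earlier turns are not forgotten.

def track_requests(dialogue_history):
    """Fold all turns into one cumulative attribute-target state."""
    state = {}
    for turn in dialogue_history:
        state.update(turn)          # later turns override earlier ones
    return state

def edit_image(original_image, state):
    """Apply the cumulative request state to the original image."""
    edited = dict(original_image)   # always start from the original
    edited.update(state)
    return edited

# Three turns: add a smile, make hair blond, then change smile to neutral.
history = [{"smile": "big"}, {"hair": "blond"}, {"smile": "neutral"}]
original = {"smile": "none", "hair": "brown", "glasses": "no"}

state = track_requests(history)
result = edit_image(original, state)
```

Note that `result` keeps the hair edit from turn 2 even though turn 3 only mentions the smile; a system that re-edited the previous turn's output with only the latest request would risk dropping it.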
Related papers
- An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control [21.624984690721842]
D-Edit is a framework to disentangle the comprehensive image-prompt interaction into several item-prompt interactions.
It is based on pretrained diffusion models with cross-attention layers disentangled and adopts a two-step optimization to build item-prompt associations.
We demonstrate state-of-the-art results in four types of editing operations including image-based, text-based, mask-based editing, and item removal.
arXiv Detail & Related papers (2024-03-07T20:06:29Z)
- Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training [61.984277261016146]
We propose a CustomNeRF model that unifies a text description or a reference image as the editing prompt.
To keep edits confined to the intended foreground regions, we propose a Local-Global Iterative Editing (LGIE) training scheme that alternates between foreground-region editing and full-image editing.
We also design a class-guided regularization that exploits class priors within the generation model to alleviate inconsistency in the edited scene.
arXiv Detail & Related papers (2023-12-04T06:25:06Z) - Visual Instruction Inversion: Image Editing via Visual Prompting [34.96778567507126]
We present a method for image editing via visual prompting.
We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions.
arXiv Detail & Related papers (2023-07-26T17:50:10Z) - IMAD: IMage-Augmented multi-modal Dialogue [0.043847653914745384]
This paper presents a novel perspective on multi-modal dialogue systems, which interprets the image in the context of the dialogue.
We propose a two-stage approach to automatically construct a multi-modal dialogue dataset.
In the first stage, we utilize text-to-image similarity and sentence similarity to identify which utterances could be replaced with an image.
In the second stage, we replace those utterances by selecting a subset of relevant images and filtering them with a visual question answering model.
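IMAD's two-stage construction can be sketched as a toy pipeline (thresholds, function names, and the similarity/VQA scorers below are illustrative assumptions, not the paper's actual models):

```python
# Toy sketch of an IMAD-style two-stage dataset-construction pipeline.
# Stage 1 keeps utterances that look replaceable by an image (via a
# text-to-image similarity score); stage 2 keeps only those that a
# VQA-style check confirms are actually depicted by a candidate image.

def stage1_candidates(utterances, image_score, threshold=0.5):
    """Stage 1: keep utterances whose best text-to-image similarity
    (supplied here as a precomputed score) exceeds a threshold."""
    return [u for u in utterances if image_score(u) > threshold]

def stage2_filter(candidates, vqa_confirms):
    """Stage 2: keep a candidate only if the VQA check confirms the
    selected image matches the utterance."""
    return [u for u in candidates if vqa_confirms(u)]

utterances = ["look at my new puppy", "ok thanks", "here is the beach photo"]
scores = {"look at my new puppy": 0.8, "ok thanks": 0.1,
          "here is the beach photo": 0.7}
confirmed = {"look at my new puppy": True, "here is the beach photo": False}

cands = stage1_candidates(utterances, lambda u: scores.get(u, 0.0))
kept = stage2_filter(cands, lambda u: confirmed.get(u, False))
```

Here only "look at my new puppy" survives both stages: "ok thanks" fails the similarity gate, and the beach utterance passes stage 1 but is rejected by the VQA filter.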
arXiv Detail & Related papers (2023-05-17T18:38:10Z) - iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z) - DialogPaint: A Dialog-based Image Editing Model [21.51417302677082]
DialogPaint is a novel framework that bridges conversational interactions with image editing.
By integrating a dialogue model with the Stable Diffusion image transformation technique, DialogPaint offers a more intuitive and interactive approach to image modifications.
arXiv Detail & Related papers (2023-03-17T15:54:30Z) - Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image
Inpainting [53.708523312636096]
We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting.
Its edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training.
To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting.
arXiv Detail & Related papers (2022-12-13T21:25:11Z) - HairCLIP: Design Your Hair by Text and Reference Image [100.85116679883724]
This paper proposes a new hair editing interaction mode, which enables manipulating hair attributes individually or jointly.
We encode the image and text conditions in a shared embedding space and propose a unified hair editing framework.
With the carefully designed network structures and loss functions, our framework can perform high-quality hair editing.
arXiv Detail & Related papers (2021-12-09T18:59:58Z) - Talk-to-Edit: Fine-Grained Facial Editing via Dialog [79.8726256912376]
Talk-to-Edit is an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system.
Our key insight is to model a continual "semantic field" in the GAN latent space.
Our system generates language feedback by considering both the user request and the current state of the semantic field.
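The "semantic field" idea, editing by repeatedly taking small, location-dependent steps in the GAN latent space until the attribute reaches the requested degree, can be sketched in one dimension (the predictor, field, and step sizes below are stand-ins, not Talk-to-Edit's actual models):

```python
# Toy 1-D sketch of a Talk-to-Edit-style semantic-field walk: instead
# of one large jump, the latent code moves in small steps along a
# direction that may change with its current location, stopping once a
# predictor reports the requested attribute degree.

def attribute_degree(latent):
    """Stand-in attribute predictor: degree grows with the latent."""
    return max(0.0, min(5.0, latent))

def field_direction(latent):
    """Stand-in semantic field: the step taken depends on where the
    latent currently sits in the space."""
    return 0.25 if latent < 2.0 else 0.1

def walk_to_degree(latent, target, max_steps=100):
    """Step along the field until the predicted degree hits target."""
    for _ in range(max_steps):
        if attribute_degree(latent) >= target:
            break
        latent += field_direction(latent)
    return latent

final = walk_to_degree(0.0, target=3.0)
```

The fine-grained steps are what allow the system to stop at intermediate attribute degrees (e.g. "a slightly bigger smile") and to report its current state back to the user in dialogue.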
arXiv Detail & Related papers (2021-09-09T17:17:59Z) - Text Editing by Command [82.50904226312451]
A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step.
We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text.
We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations.
arXiv Detail & Related papers (2020-10-24T08:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.