DialogPaint: A Dialog-based Image Editing Model
- URL: http://arxiv.org/abs/2303.10073v2
- Date: Wed, 18 Oct 2023 02:08:01 GMT
- Title: DialogPaint: A Dialog-based Image Editing Model
- Authors: Jingxuan Wei, Shiyu Wu, Xin Jiang, Yequan Wang
- Abstract summary: DialogPaint is a novel framework that bridges conversational interactions with image editing.
By integrating a dialogue model with the Stable Diffusion image transformation technique, DialogPaint offers a more intuitive and interactive approach to image modifications.
- Score: 21.51417302677082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce DialogPaint, a novel framework that bridges conversational
interactions with image editing, enabling users to modify images through
natural dialogue. By integrating a dialogue model with the Stable Diffusion
image transformation technique, DialogPaint offers a more intuitive and
interactive approach to image modifications. Our method stands out by
effectively interpreting and executing both explicit and ambiguous
instructions, handling tasks such as object replacement, style transfer, and
color modification. Notably, DialogPaint supports iterative, multi-round
editing, allowing users to refine image edits over successive interactions.
Comprehensive evaluations highlight the robustness and versatility of our
approach, marking a significant advancement in dialogue-driven image editing.
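The abstract describes a loop in which a dialogue model turns (possibly ambiguous) user utterances into explicit edit instructions, which an image editor then applies round by round. The sketch below is a minimal illustration of that multi-round structure only; the paper's actual dialogue model and Stable Diffusion editor are replaced with toy stand-ins, and all function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EditInstruction:
    operation: str   # e.g. "object_replacement", "style_transfer", "color_modification"
    target: str
    value: str

def interpret_utterance(utterance: str, history: List[str]) -> EditInstruction:
    """Toy stand-in for the dialogue model: map a user utterance (plus
    dialog history, unused here) to an explicit edit instruction."""
    if "replace" in utterance:                      # "replace X with Y"
        _, _, rest = utterance.partition("replace ")
        target, _, value = rest.partition(" with ")
        return EditInstruction("object_replacement", target.strip(), value.strip())
    if "make" in utterance:                         # "make the sky pink"
        _, _, rest = utterance.partition("make ")
        target, _, value = rest.rpartition(" ")
        return EditInstruction("color_modification", target.strip(), value.strip())
    return EditInstruction("style_transfer", "whole image", utterance.strip())

def apply_edit(image: str, instruction: EditInstruction) -> str:
    """Toy stand-in for the diffusion-based editor: record the edit
    in a textual 'image' description instead of modifying pixels."""
    return f"{image} | {instruction.operation}({instruction.target} -> {instruction.value})"

def dialog_edit_session(image: str, utterances: List[str]) -> str:
    """Iterative multi-round editing: each turn refines the previous result."""
    history: List[str] = []
    for utterance in utterances:
        instruction = interpret_utterance(utterance, history)
        image = apply_edit(image, instruction)
        history.append(utterance)
    return image
```

For example, `dialog_edit_session("photo_of_park", ["replace the bench with a fountain", "make the sky pink"])` threads the image state through two successive edits, mirroring the iterative refinement the abstract claims.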
Related papers
- BrushEdit: All-In-One Image Inpainting and Editing [79.55816192146762]
BrushEdit is a novel inpainting-based instruction-guided image editing paradigm.
We devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model.
Our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics.
arXiv Detail & Related papers (2024-12-13T17:58:06Z)
- Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches.
We show that this simple approach enables flexible editing that is compatible with current image generation models.
Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z)
- Teaching Text-to-Image Models to Communicate in Dialog [44.76942024105259]
In this paper, we focus on the innovative dialog-to-image generation task.
To tackle this problem, we design a tailored fine-tuning approach on top of state-of-the-art text-to-image generation models.
Our approach brings consistent and remarkable improvement with 3 state-of-the-art pre-trained text-to-image generation backbones.
arXiv Detail & Related papers (2023-09-27T09:33:16Z)
- IMAD: IMage-Augmented multi-modal Dialogue [0.043847653914745384]
This paper presents a novel perspective on multi-modal dialogue systems, which interprets the image in the context of the dialogue.
We propose a two-stage approach to automatically construct a multi-modal dialogue dataset.
In the first stage, we utilize text-to-image similarity and sentence similarity to identify which utterances could be replaced with an image.
In the second stage, we replace those utterances by selecting a subset of relevant images and filtering them with a visual question answering model.
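The two-stage construction above can be sketched as two functions: stage one scores each utterance with text-to-image and sentence similarity to find replaceable candidates, and stage two swaps each candidate for a retrieved image that a VQA model accepts. This is an illustrative outline only; the similarity scorers, image retriever, and VQA filter are passed in as callables, and all names and thresholds are assumptions, not the paper's actual components.

```python
from typing import Callable, List

def select_replaceable(
    utterances: List[str],
    image_similarity: Callable[[str], float],     # text-to-image similarity scorer
    sentence_similarity: Callable[[str], float],  # sentence-level similarity scorer
    threshold: float = 0.5,
) -> List[int]:
    """Stage 1: return indices of utterances that both scorers judge
    visual enough to be replaced with an image."""
    return [
        i for i, u in enumerate(utterances)
        if image_similarity(u) >= threshold and sentence_similarity(u) >= threshold
    ]

def replace_with_images(
    utterances: List[str],
    candidates: List[int],
    retrieve_images: Callable[[str], List[str]],   # candidate images per utterance
    vqa_consistent: Callable[[str, str], bool],    # VQA-based consistency filter
) -> List[object]:
    """Stage 2: for each candidate utterance, keep the first retrieved
    image the VQA filter accepts; otherwise leave the text unchanged."""
    turns: List[object] = list(utterances)
    for i in candidates:
        for img in retrieve_images(utterances[i]):
            if vqa_consistent(img, utterances[i]):
                turns[i] = ("IMAGE", img)
                break
    return turns
```

With mocked scorers and a mocked VQA filter, a turn like "look at my cat" gets replaced by an accepted image tuple while purely textual turns pass through unchanged.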
arXiv Detail & Related papers (2023-05-17T18:38:10Z)
- Dialog act guided contextual adapter for personalized speech recognition [9.672512327395435]
Personalization in multi-turn dialogs has been a long-standing challenge for end-to-end automatic speech recognition (E2E ASR) models.
Recent work on contextual adapters has tackled rare word recognition using user catalogs.
We propose a dialog act guided contextual adapter network.
arXiv Detail & Related papers (2023-03-31T05:13:44Z)
- ChatEdit: Towards Multi-turn Interactive Facial Image Editing via Dialogue [17.503012018823902]
This paper introduces the ChatEdit benchmark dataset for evaluating image editing and conversation abilities.
ChatEdit is constructed from the CelebA-HQ dataset, incorporating annotated multi-turn dialogues corresponding to user edit requests on the images.
We present a novel baseline framework that integrates a dialogue module for both tracking user requests and generating responses.
arXiv Detail & Related papers (2023-03-20T13:45:58Z)
- Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting [53.708523312636096]
We present Imagen Editor, a cascaded diffusion model built by fine-tuning on text-guided image inpainting.
Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training.
To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting.
arXiv Detail & Related papers (2022-12-13T21:25:11Z)
- VD-BERT: A Unified Vision and Dialog Transformer with BERT [161.0016161052714]
We propose VD-BERT, a simple yet effective framework of unified vision-dialog Transformer.
We adapt BERT for the effective fusion of vision and dialog contents via visually grounded training.
Our model yields new state of the art, achieving the top position in both single-model and ensemble settings.
arXiv Detail & Related papers (2020-04-28T04:08:46Z)
- Conversation Learner -- A Machine Teaching Tool for Building Dialog Managers for Task-Oriented Dialog Systems [57.082447660944965]
Conversation Learner is a machine teaching tool for building dialog managers.
It enables dialog authors to create a dialog flow using familiar tools, converting the dialog flow into a parametric model.
It allows dialog authors to improve the dialog manager over time by leveraging user-system dialog logs as training data.
arXiv Detail & Related papers (2020-04-09T00:10:54Z)
- Open Domain Dialogue Generation with Latent Images [43.78366219197779]
We propose learning a response generation model with both image-grounded dialogues and textual dialogues.
In the first scenario, image-grounded dialogues can be effectively augmented by textual dialogues with latent images.
In the second scenario, latent images can enrich the content of responses and at the same time keep them relevant to contexts.
arXiv Detail & Related papers (2020-04-04T17:32:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.