Adjusting Image Attributes of Localized Regions with Low-level Dialogue
- URL: http://arxiv.org/abs/2002.04678v1
- Date: Tue, 11 Feb 2020 20:59:34 GMT
- Title: Adjusting Image Attributes of Localized Regions with Low-level Dialogue
- Authors: Tzu-Hsiang Lin, Alexander Rudnicky, Trung Bui, Doo Soon Kim, Jean Oh
- Abstract summary: We develop a task-oriented dialogue system to investigate low-level instructions for NLIE.
Our system grounds language at the level of edit operations and suggests options for the user to choose from.
An analysis shows that users generally adapt to utilizing the proposed low-level language interface.
- Score: 83.06971746641686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Image Editing (NLIE) aims to use natural language
instructions to edit images. Since novices are inexperienced with image editing
techniques, their instructions are often ambiguous and contain high-level
abstractions that typically require complex editing steps to accomplish.
Motivated by this inexperience, we aim to smooth the learning curve by
teaching novices to edit images using low-level command terminology.
Towards this end, we develop a task-oriented dialogue system to investigate
low-level instructions for NLIE. Our system grounds language at the level of
edit operations and suggests options for the user to choose from. Although
users must express themselves in low-level terms, a user evaluation shows that
25% of users found our system easy to use, resonating with our motivation. An analysis
shows that users generally adapt to utilizing the proposed low-level language
interface. In this study, we identify object segmentation as the key factor
in user satisfaction. Our work demonstrates the advantages of the
low-level, direct language-action mapping approach that can be applied to other
problem domains beyond image editing, such as audio editing or industrial
design.
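To make the language-action mapping concrete, the following is a minimal sketch of grounding a low-level utterance to a single edit operation, falling back to suggested options when the request cannot be grounded. The operation vocabulary and utterance grammar are illustrative assumptions, not the paper's actual specification.

```python
import re

# Hypothetical operation vocabulary; the paper's actual operation set
# is not specified in this summary.
OPERATIONS = {"brightness", "contrast", "saturation", "hue"}

def parse_instruction(utterance: str):
    """Ground a low-level utterance to one edit operation, if possible.

    Assumed form (an illustrative grammar for this sketch):
        "<increase|decrease> <attribute> of region <k> by <amount>"
    Returns an operation dict, or suggested options when the attribute
    is missing or unrecognized.
    """
    m = re.search(
        r"(increase|decrease)\s+(\w+)\s+of\s+region\s+(\d+)\s+by\s+(\d+)",
        utterance.lower(),
    )
    if m is None or m.group(2) not in OPERATIONS:
        # Keep the dialogue grounded by offering concrete low-level
        # options, mirroring the system's suggestion behavior.
        return {"suggestions": sorted(OPERATIONS)}
    sign = 1 if m.group(1) == "increase" else -1
    return {
        "op": m.group(2),
        "region": int(m.group(3)),
        "delta": sign * int(m.group(4)),
    }

print(parse_instruction("Increase brightness of region 2 by 30"))
# {'op': 'brightness', 'region': 2, 'delta': 30}
print(parse_instruction("make it pop"))
# {'suggestions': ['brightness', 'contrast', 'hue', 'saturation']}
```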
Related papers
- Zero-shot Image Editing with Reference Imitation [50.75310094611476]
We present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently.
We propose a generative training framework, dubbed MimicBrush, which randomly selects two frames from a video clip, masks some regions of one frame, and learns to recover the masked regions using the information from the other frame.
We experimentally show the effectiveness of our method under various test cases as well as its superiority over existing alternatives.
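The described training recipe can be read as a data-construction step: sample two frames from a clip, mask parts of one, and treat the unmasked original as the reconstruction target. A minimal sketch, where patch size, mask count, and array shapes are illustrative assumptions:

```python
import numpy as np

def make_training_pair(clip: np.ndarray, patch: int = 32, rng=None):
    """Build one imitative-editing training example from a video clip.

    clip: array of shape (T, H, W, C). Mirrors the described recipe:
    sample two frames, mask random square regions of the source frame,
    and train the model to recover them from the reference frame.
    """
    rng = rng or np.random.default_rng()
    t1, t2 = rng.choice(len(clip), size=2, replace=False)
    source, reference = clip[t1].copy(), clip[t2]
    h, w = source.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for _ in range(4):  # mask a few random square patches
        y, x = rng.integers(0, h - patch), rng.integers(0, w - patch)
        mask[y:y + patch, x:x + patch] = True
    source[mask] = 0.0  # zero out the masked regions
    # A model would be trained to reconstruct the target from
    # (source, mask, reference).
    return source, mask, reference, clip[t1]

frames = np.random.rand(8, 128, 128, 3)
src, msk, ref, tgt = make_training_pair(frames)
```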
arXiv Detail & Related papers (2024-06-11T17:59:51Z)
- Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER).
IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose.
We introduce Creativity-Vision Language Assistant (Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
arXiv Detail & Related papers (2024-05-31T18:22:29Z)
- InstructGIE: Towards Generalizable Image Editing [34.83188723673297]
We introduce a novel image editing framework with enhanced generalization robustness.
This framework incorporates a module specifically optimized for image editing tasks, leveraging the VMamba Block.
We also unveil a selective area-matching technique specifically engineered to address and rectify corrupted details in generated images.
arXiv Detail & Related papers (2024-03-08T03:43:04Z)
- Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches.
We show that this simple approach enables flexible editing that is compatible with current image generation models.
Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z)
- CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing [22.40686064568406]
We present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes.
Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds.
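For background, text-driven GAN editing of this kind is commonly built around a CLIP similarity objective between the generated image and the target text. The sketch below shows generic CLIP-guided latent optimization with a placeholder generator; it is not CLIPInverter's actual architecture, which the summary does not detail.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

# CPU keeps CLIP in fp32; on CUDA the model loads in fp16 and inputs
# would need casting.
device = "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Placeholder generator; a pretrained GAN (e.g., StyleGAN2) would be
# substituted here.
generator = torch.nn.Sequential(
    torch.nn.Linear(512, 3 * 224 * 224), torch.nn.Tanh()
).to(device)

text = clip.tokenize(["a smiling face"]).to(device)
with torch.no_grad():
    text_emb = model.encode_text(text).float()
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

w = torch.randn(1, 512, device=device, requires_grad=True)
opt = torch.optim.Adam([w], lr=0.05)
for _ in range(100):
    # Sketch only: a real pipeline would apply CLIP's input normalization.
    img = generator(w).view(1, 3, 224, 224)
    img_emb = model.encode_image(img).float()
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    loss = 1.0 - (img_emb * text_emb).sum()  # CLIP cosine-similarity loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```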
arXiv Detail & Related papers (2023-07-17T11:29:48Z)
- User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques [32.82206298102458]
Text-driven image editing has shown remarkable success with diffusion models.
Existing methods assume that the user's description sufficiently grounds the context in the source image.
We propose simple yet effective methods by combining prompt generation frameworks.
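One plausible reading of "combining prompt generation frameworks" is that an automatically generated caption grounds the source image, and the user's minimal request is injected on top of it. A schematic sketch, with the prompt template as an assumption:

```python
def build_edit_prompts(caption: str, user_request: str):
    """Combine an auto-generated caption with a minimal user request into
    source/target prompts for a text-driven diffusion editor.

    The template below is illustrative; the paper's exact prompt
    construction is not given in this summary.
    """
    source_prompt = caption                       # grounds the source image
    target_prompt = f"{caption}, {user_request}"  # injects the edit intent
    return source_prompt, target_prompt

# e.g., caption from an off-the-shelf captioner such as BLIP
src, tgt = build_edit_prompts(
    "a photo of a brown dog sitting on grass", "wearing a red hat"
)
print(src)
print(tgt)  # "a photo of a brown dog sitting on grass, wearing a red hat"
```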
arXiv Detail & Related papers (2023-06-05T09:09:10Z)
- Language Guided Local Infiltration for Interactive Image Retrieval [12.324893780690918]
Interactive Image Retrieval (IIR) aims to retrieve images that are broadly similar to a reference image while incorporating a requested textual modification.
We propose a Language Guided Local Infiltration (LGLI) system, which fully utilizes the text information and injects text features into image features.
Our method outperforms most state-of-the-art IIR approaches.
arXiv Detail & Related papers (2023-04-16T10:33:08Z)
- Learning by Planning: Language-Guided Global Image Editing [53.72807421111136]
We develop a text-to-operation model that maps a vague editing request into a series of editing operations.
The only supervision in the task is the target image, which is insufficient for a stable training of sequential decisions.
We propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth.
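The planning idea can be illustrated with a toy greedy search: repeatedly pick the (operation, parameter) pair that brings the current image closest to the target, and record the sequence as pseudo ground truth. The operation set and parameter grid below are illustrative stand-ins, not the paper's.

```python
import numpy as np

# Toy global operations over images in [0, 1]; illustrative only.
OPS = {
    "brightness": lambda img, p: np.clip(img + p, 0, 1),
    "contrast":   lambda img, p: np.clip((img - 0.5) * (1 + p) + 0.5, 0, 1),
}
GRID = [-0.2, -0.1, 0.1, 0.2]

def plan_sequence(source, target, max_steps=5):
    """Greedily plan an edit sequence that approximates the target image."""
    current, plan = source.copy(), []
    for _ in range(max_steps):
        best = min(
            ((name, p, np.mean((fn(current, p) - target) ** 2))
             for name, fn in OPS.items() for p in GRID),
            key=lambda t: t[2],
        )
        if best[2] >= np.mean((current - target) ** 2):
            break  # no operation improves the match; stop planning
        name, p, _ = best
        current = OPS[name](current, p)
        plan.append((name, p))
    return plan  # pseudo ground truth for the text-to-operation model

src = np.random.rand(64, 64, 3)
tgt = np.clip(src + 0.15, 0, 1)   # target is a brightened source
print(plan_sequence(src, tgt))    # e.g., [('brightness', 0.1)]
```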
arXiv Detail & Related papers (2021-06-24T16:30:03Z)
- A Benchmark and Baseline for Language-Driven Image Editing [81.74863590492663]
We first present a new language-driven image editing dataset that supports both local and global editing.
Our new method treats each editing operation as a sub-module and can automatically predict operation parameters.
We believe our work, including both the benchmark and the baseline, will advance the image editing area towards a more general and free-form level.
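The sub-module design can be sketched as a small interface: each editing operation predicts its own parameters and applies itself to the image. The stub predictor below stands in for a learned parameter head and is purely illustrative.

```python
from abc import ABC, abstractmethod
import numpy as np

class EditOperation(ABC):
    """One editing operation as a pluggable sub-module."""

    @abstractmethod
    def predict_params(self, image: np.ndarray, request: str) -> dict:
        """Predict operation parameters from the image and the request."""

    @abstractmethod
    def apply(self, image: np.ndarray, **params) -> np.ndarray:
        """Apply the operation with the given parameters."""

class Brightness(EditOperation):
    def predict_params(self, image, request):
        # A learned regressor would map (image, request) -> delta;
        # a fixed value stands in for it here.
        return {"delta": 0.1}

    def apply(self, image, delta=0.0):
        return np.clip(image + delta, 0.0, 1.0)

op = Brightness()
img = np.random.rand(64, 64, 3)
edited = op.apply(img, **op.predict_params(img, "brighten the photo"))
```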
arXiv Detail & Related papers (2020-10-05T20:51:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.