A Benchmark and Baseline for Language-Driven Image Editing
- URL: http://arxiv.org/abs/2010.02330v1
- Date: Mon, 5 Oct 2020 20:51:16 GMT
- Title: A Benchmark and Baseline for Language-Driven Image Editing
- Authors: Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu
- Abstract summary: We first present a new language-driven image editing dataset that supports both local and global editing.
Our new method treats each editing operation as a sub-module and can automatically predict operation parameters.
We believe our work, including both the benchmark and the baseline, will advance the image editing area towards a more general and free-form level.
- Score: 81.74863590492663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language-driven image editing can significantly reduce laborious
manual editing work and is friendly to photography novices. However, most
existing work handles only a specific image domain or performs only global
retouching. To solve this new task, we first present a new language-driven
image editing dataset that supports both local and global editing, with
editing-operation and mask annotations. We also propose a baseline method
that fully utilizes these annotations. Our new method treats each editing
operation as a sub-module and can automatically predict operation parameters.
Such an approach not only performs well on challenging user data but is also
highly interpretable. We believe our work, including both the benchmark and
the baseline, will advance image editing towards a more general and free-form
level.
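To make the abstract's core idea concrete, below is a minimal sketch of treating each editing operation as a sub-module that applies a predicted parameter, optionally restricted to a mask for local edits. The operation set, parameter conventions, and all names here are hypothetical stand-ins under that reading of the abstract, not the paper's actual implementation.

```python
# Minimal sketch of the operation-as-sub-module idea. The operations,
# parameter ranges, and names are hypothetical stand-ins, not the
# paper's actual implementation.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple
import numpy as np

@dataclass
class EditOperation:
    """One editing sub-module: applies a parameterized edit to an image."""
    name: str
    apply: Callable[[np.ndarray, float], np.ndarray]

def brightness(img: np.ndarray, p: float) -> np.ndarray:
    # p in [-1, 1]: shift intensities, keeping pixel values in [0, 1]
    return np.clip(img + p, 0.0, 1.0)

def contrast(img: np.ndarray, p: float) -> np.ndarray:
    # p > 0: scale intensities around the image mean
    mean = img.mean()
    return np.clip(mean + (img - mean) * p, 0.0, 1.0)

OPERATIONS = {
    "brightness": EditOperation("brightness", brightness),
    "contrast": EditOperation("contrast", contrast),
}

def execute_edit(
    img: np.ndarray,                    # HxWx3 float image in [0, 1]
    steps: List[Tuple[str, float]],     # predicted (operation, parameter) steps
    mask: Optional[np.ndarray] = None,  # HxW mask in [0, 1]; None = global edit
) -> np.ndarray:
    """Apply a sequence of edits, optionally restricted to a mask region."""
    out = img
    for op_name, param in steps:
        edited = OPERATIONS[op_name].apply(out, param)
        if mask is None:
            out = edited
        else:
            m = mask[..., None]
            out = m * edited + (1.0 - m) * out
    return out
```

In the paper's setting, the (operation, parameter) steps would be predicted by the model from the language request, and the mask would realize the dataset's local-versus-global distinction; here both are simply passed in, e.g. `execute_edit(img, [("brightness", 0.2), ("contrast", 1.2)])`.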
Related papers
- EditWorld: Simulating World Dynamics for Instruction-Following Image Editing [68.6224340373457]
Diffusion models have significantly improved the performance of image editing.
We introduce world-instructed image editing, which defines and categorizes instructions grounded in various world scenarios.
Our method significantly outperforms existing editing methods in this new task.
arXiv Detail & Related papers (2024-05-23T16:54:17Z)
- InstructBrush: Learning Attention-based Instruction Optimization for Image Editing [54.07526261513434]
InstructBrush is an inversion method for instruction-based image editing.
It extracts editing effects from image pairs as editing instructions, which are further applied for image editing.
Our approach achieves superior performance in editing and is more semantically consistent with the target editing effects.
arXiv Detail & Related papers (2024-03-27T15:03:38Z)
- Edit One for All: Interactive Batch Image Editing [44.50631647670942]
This paper presents a novel method for interactive batch image editing using StyleGAN as the medium.
Given an edit specified by users in an example image (e.g., make the face frontal), our method can automatically transfer that edit to other test images.
Experiments demonstrate that edits performed using our method have similar visual quality to existing single-image-editing methods.
arXiv Detail & Related papers (2024-01-18T18:58:44Z)
- Optimisation-Based Multi-Modal Semantic Image Editing [58.496064583110694]
We propose an inference-time editing optimisation to accommodate multiple editing instruction types.
By allowing the influence of each loss function to be adjusted, we build a flexible editing solution that can be tuned to user preferences.
We evaluate our method using text, pose and scribble edit conditions, and highlight our ability to achieve complex edits.
arXiv Detail & Related papers (2023-11-28T15:31:11Z)
- Visual Instruction Inversion: Image Editing via Visual Prompting [34.96778567507126]
We present a method for image editing via visual prompting.
We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions.
arXiv Detail & Related papers (2023-07-26T17:50:10Z)
- EditGAN: High-Precision Semantic Image Editing [120.49401527771067]
EditGAN is a novel method for high-quality, high-precision semantic image editing.
We show that EditGAN can manipulate images with an unprecedented level of detail and freedom.
We can also easily combine multiple edits and perform plausible edits beyond EditGAN training data.
arXiv Detail & Related papers (2021-11-04T22:36:33Z)
- Learning by Planning: Language-Guided Global Image Editing [53.72807421111136]
We develop a text-to-operation model to map the vague editing language request into a series of editing operations.
The only supervision in the task is the target image, which is insufficient for stable training of sequential decisions.
We propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth.
arXiv Detail & Related papers (2021-06-24T16:30:03Z)
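The planning idea in the last entry, deriving pseudo ground-truth editing sequences from the target image alone, can be sketched as a small search over operation sequences. This reuses the hypothetical `execute_edit` from the sketch above; the parameter grid and brute-force search are illustrative assumptions rather than the paper's actual operation planning algorithm.

```python
# Hedged sketch of planning pseudo ground truth: enumerate short operation
# sequences, apply each to the input, and keep the one whose output best
# matches the target image. Illustrative only; the paper proposes a
# dedicated operation planning algorithm.
import itertools
import numpy as np

PARAM_GRID = {"brightness": [-0.2, 0.0, 0.2], "contrast": [0.8, 1.0, 1.2]}

def plan_pseudo_sequence(src: np.ndarray, target: np.ndarray, max_len: int = 2):
    """Search short edit sequences; return the best one as pseudo supervision."""
    candidates = [(name, p) for name, ps in PARAM_GRID.items() for p in ps]
    best_steps: list = []
    best_err = float(np.mean((src - target) ** 2))  # empty-sequence baseline
    for length in range(1, max_len + 1):
        for steps in itertools.product(candidates, repeat=length):
            out = execute_edit(src, list(steps))  # sketch defined above
            err = float(np.mean((out - target) ** 2))
            if err < best_err:
                best_steps, best_err = list(steps), err
    return best_steps  # pseudo ground truth for the text-to-operation model
```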
This list is automatically generated from the titles and abstracts of the papers on this site.