Learning by Planning: Language-Guided Global Image Editing
- URL: http://arxiv.org/abs/2106.13156v1
- Date: Thu, 24 Jun 2021 16:30:03 GMT
- Title: Learning by Planning: Language-Guided Global Image Editing
- Authors: Jing Shi, Ning Xu, Yihang Xu, Trung Bui, Franck Dernoncourt, Chenliang
Xu
- Abstract summary: We develop a text-to-operation model to map the vague editing language request into a series of editing operations.
The only supervision in the task is the target image, which is insufficient for stable training of sequential decisions.
We propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth.
- Score: 53.72807421111136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, language-guided global image editing has drawn increasing attention with growing application potential. However, previous GAN-based methods are not only confined to domain-specific, low-resolution data but also lack interpretability. To overcome these difficulties, we develop a text-to-operation model that maps a vague editing language request into a series of editing operations, e.g., changing contrast, brightness, and saturation. Each operation is interpretable and differentiable. Furthermore, the only supervision in the task is the target image, which is insufficient for stable training of sequential decisions. Hence, we propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth. Comparison experiments on the newly collected MA5k-Req dataset and the GIER dataset show the advantages of our method. Code is available at https://jshi31.github.io/T2ONet.
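To make the two ideas in the abstract concrete, here is a minimal sketch: a few differentiable global editing operations (brightness, contrast, saturation) and a simple greedy planner that searches (operation, parameter) pairs bringing the source image closer to the target, yielding an editing sequence usable as pseudo ground truth. The operation parameterisations, the greedy forward search, and all function names below are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

# Differentiable global editing operations (illustrative parameterisations;
# the paper's exact operation definitions may differ).
def adjust_brightness(img, p):
    # p > 0 brightens, p < 0 darkens
    return (img + p).clamp(0.0, 1.0)

def adjust_contrast(img, p):
    mean = img.mean(dim=(-1, -2, -3), keepdim=True)
    return ((img - mean) * (1.0 + p) + mean).clamp(0.0, 1.0)

def adjust_saturation(img, p):
    gray = img.mean(dim=-3, keepdim=True)  # crude luminance proxy
    return (gray + (img - gray) * (1.0 + p)).clamp(0.0, 1.0)

OPS = {
    "brightness": adjust_brightness,
    "contrast": adjust_contrast,
    "saturation": adjust_saturation,
}

def plan_operations(source, target, steps=3, candidates=21):
    """Greedy operation planning (a simplified stand-in for the paper's
    planning algorithm): at each step, pick the (operation, parameter) pair
    that minimises the L2 distance to the target image. The resulting
    sequence can serve as pseudo ground truth for a text-to-operation model."""
    current, sequence = source, []
    params = torch.linspace(-0.5, 0.5, candidates)
    for _ in range(steps):
        best = None
        for name, op in OPS.items():
            for p in params:
                loss = F.mse_loss(op(current, p), target)
                if best is None or loss < best[0]:
                    best = (loss, name, p)
        _, name, p = best
        current = OPS[name](current, p)
        sequence.append((name, float(p)))
    return sequence, current

if __name__ == "__main__":
    src = torch.rand(1, 3, 64, 64)
    tgt = adjust_contrast(adjust_brightness(src, 0.2), 0.3)  # synthetic target
    seq, result = plan_operations(src, tgt)
    print(seq)
```

Because each operation is differentiable, the planned sequences can supervise a sequence model end to end; the greedy search above is only one possible way to generate candidate sequences from the target image.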
Related papers
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to modify a given synthetic or real image to meet specific user requirements.
Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models.
T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z)
- EditWorld: Simulating World Dynamics for Instruction-Following Image Editing [68.6224340373457]
Diffusion models have significantly improved the performance of image editing.
We introduce world-instructed image editing, which defines and categorizes the instructions grounded by various world scenarios.
Our method significantly outperforms existing editing methods in this new task.
arXiv Detail & Related papers (2024-05-23T16:54:17Z)
- InstructGIE: Towards Generalizable Image Editing [34.83188723673297]
We introduce a novel image editing framework with enhanced generalization robustness.
This framework incorporates a module specifically optimized for image editing tasks, leveraging the VMamba Block.
We also unveil a selective area-matching technique specifically engineered to address and rectify corrupted details in generated images.
arXiv Detail & Related papers (2024-03-08T03:43:04Z)
- Variational Bayesian Framework for Advanced Image Generation with Domain-Related Variables [29.827191184889898]
We present a unified Bayesian framework for advanced conditional generative problems.
We propose a variational Bayesian image translation network (VBITN) that enables multiple image translation and editing tasks.
arXiv Detail & Related papers (2023-05-23T09:47:23Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- End-to-End Visual Editing with a Generatively Pre-Trained Artist [78.5922562526874]
We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change.
We propose a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain.
We show that different blending effects can be learned by an intuitive control of the augmentation process, with no other changes required to the model architecture.
arXiv Detail & Related papers (2022-05-03T17:59:30Z)
- RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network [19.017377597937617]
We study the compositional learning of images and texts for image retrieval.
We introduce a novel method that combines the graph convolutional network (GCN) with existing composition methods.
arXiv Detail & Related papers (2021-04-07T09:41:52Z)
- A Benchmark and Baseline for Language-Driven Image Editing [81.74863590492663]
We first present a new language-driven image editing dataset that supports both local and global editing.
Our new method treats each editing operation as a sub-module and can automatically predict operation parameters.
We believe our work, including both the benchmark and the baseline, will advance the image editing area towards a more general and free-form level.
arXiv Detail & Related papers (2020-10-05T20:51:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information (including all listed content) and is not responsible for any consequences.