ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
- URL: http://arxiv.org/abs/2405.11190v2
- Date: Fri, 31 May 2024 07:24:55 GMT
- Title: ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
- Authors: Ying Jin, Pengyang Ling, Xiaoyi Dong, Pan Zhang, Jiaqi Wang, Dahua Lin
- Abstract summary: We introduce ReasonPix2Pix, a comprehensive reasoning-attentive instruction editing dataset.
The dataset is characterized by 1) reasoning instructions, 2) more realistic images drawn from fine-grained categories, and 3) greater variance between input and edited images.
When fine-tuned on our dataset under supervised conditions, the model demonstrates superior performance on instruction-based editing tasks, whether or not the tasks require reasoning.
- Score: 77.12834553200632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction-based image editing aims to equip a generative model with the capacity to follow human-written instructions for editing images. Current approaches typically handle explicit, specific instructions, but they often lack the active reasoning needed to understand instructions that are implicit or insufficiently defined. To enhance active reasoning capabilities and impart intelligence to the editing model, we introduce ReasonPix2Pix, a comprehensive reasoning-attentive instruction-editing dataset. The dataset is characterized by 1) reasoning instructions, 2) more realistic images drawn from fine-grained categories, and 3) greater variance between input and edited images. When fine-tuned on our dataset under supervised conditions, the model demonstrates superior performance on instruction-based editing tasks, whether or not the tasks require reasoning. The code will be available at https://github.com/Jin-Ying/ReasonPix2Pix.
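To make the "reasoning instruction" idea concrete: an explicit instruction names the edit directly, while a reasoning instruction states only a goal from which the model must infer the edit. A minimal sketch of what one training pair might look like, assuming a simple record layout; the field names and file paths below are illustrative assumptions, not the dataset's released schema:

```python
# Hypothetical ReasonPix2Pix-style training record. Field names and paths
# are illustrative assumptions, not the dataset's actual schema.
record = {
    "input_image": "images/park_summer.jpg",   # source image
    "edited_image": "images/park_winter.jpg",  # target after the edit
    # Explicit instruction: names the edit directly (InstructPix2Pix-style).
    "explicit_instruction": "Add snow to the ground and remove the leaves.",
    # Reasoning instruction: states only the goal; the model must infer
    # the concrete edit (turn the scene into winter).
    "reasoning_instruction": "Make this photo fit the coldest season.",
}
```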
Related papers
- Multi-Reward as Condition for Instruction-based Image Editing [32.77114231615961]
We propose to address the training-data quality issue with multi-perspective reward data rather than by refining ground-truth image quality.
Experiments indicate that our multi-reward conditioned model outperforms its no-reward counterpart on two popular editing pipelines.
arXiv Detail & Related papers (2024-11-06T05:02:29Z)
- Achieving Complex Image Edits via Function Aggregation with Diffusion Models [15.509233098264513]
Diffusion models have demonstrated strong performance in generative tasks, making them ideal candidates for image editing.
We introduce FunEditor, an efficient diffusion model designed to learn atomic editing functions and perform complex edits by aggregating simpler functions.
FunEditor achieves 5 to 24 times faster inference than existing methods on complex tasks such as object movement.
arXiv Detail & Related papers (2024-08-16T02:33:55Z)
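The aggregation idea can be read as composing learned atomic edit functions into a single complex edit. A minimal sketch of that composition pattern, assuming each atomic edit maps an image to an image; the operation names are placeholders, not FunEditor's actual learned functions:

```python
from functools import reduce
from typing import Any, Callable

# An atomic edit maps an image to an edited image; the image type is left
# abstract here (in practice a tensor or PIL image).
Edit = Callable[[Any], Any]

def aggregate(*edits: Edit) -> Edit:
    """Compose atomic edits into one complex edit, applied left to right."""
    return lambda image: reduce(lambda img, edit: edit(img), edits, image)

# Placeholder atomic edits standing in for FunEditor's learned functions.
def move_object(image):        return image  # would reposition the target
def inpaint_background(image): return image  # would fill the vacated region
def harmonize(image):          return image  # would blend lighting and color

# A single aggregated call replaces several separate editing passes,
# which is where the claimed inference speedup comes from.
object_movement = aggregate(move_object, inpaint_background, harmonize)
```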
- InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning [31.799923647356458]
We propose a Reinforcement Learning Guided Image Editing method (InstructRL4Pix) that trains a diffusion model to generate images guided by the attention maps of the target object.
Experimental results show that InstructRL4Pix moves beyond the limitations of traditional datasets, using unsupervised learning to optimize editing goals and achieve accurate image editing from natural human commands.
arXiv Detail & Related papers (2024-06-14T12:31:48Z)
- SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models [91.22477798288003]
This paper introduces SmartEdit, a novel approach to instruction-based image editing.
It exploits Multimodal Large Language Models (MLLMs) to enhance its understanding and reasoning capabilities.
We show that a small amount of complex instruction editing data can effectively stimulate SmartEdit's editing capabilities for more complex instructions.
arXiv Detail & Related papers (2023-12-11T17:54:11Z)
- Emu Edit: Precise Image Editing via Recognition and Generation Tasks [62.95717180730946]
We present Emu Edit, a multi-task image editing model that sets state-of-the-art results in instruction-based image editing.
We train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and computer vision tasks.
We show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples.
arXiv Detail & Related papers (2023-11-16T18:55:58Z)
- Learning to Follow Object-Centric Image Editing Instructions Faithfully [26.69032113274608]
Current approaches focusing on image editing with natural language instructions rely on automatically generated paired data.
We significantly improve the quality of the paired data and enhance the supervision signal.
Our model is capable of performing fine-grained object-centric edits better than state-of-the-art baselines.
arXiv Detail & Related papers (2023-10-29T20:39:11Z)
- StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [86.92711729969488]
We exploit the capabilities of pretrained diffusion models for image editing.
Existing methods either finetune the model or invert the image into the latent space of the pretrained model.
These approaches suffer from two problems: unsatisfactory results in selected regions, and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z)
- InstructPix2Pix: Learning to Follow Image Editing Instructions [103.77092910685764]
We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, the model follows the instruction to edit the image.
We show compelling editing results for a diverse collection of input images and written instructions.
arXiv Detail & Related papers (2022-11-17T18:58:43Z)
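InstructPix2Pix has a publicly released checkpoint, so the instruction-following loop can be tried directly. A minimal sketch using the diffusers pipeline, assuming the authors' timbrooks/instruct-pix2pix checkpoint on the Hugging Face Hub and a CUDA device; the input URL is a placeholder for your own image:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

# Load the released InstructPix2Pix checkpoint from the Hugging Face Hub.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# Any RGB image works; this URL is a placeholder for your own input.
image = load_image("https://example.com/input.jpg").resize((512, 512))

# The written instruction drives the edit; image_guidance_scale controls
# how closely the output should stay faithful to the input image.
edited = pipe(
    "make it look like winter",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,
).images[0]
edited.save("edited.jpg")
```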
- Learning by Planning: Language-Guided Global Image Editing [53.72807421111136]
We develop a text-to-operation model that maps a vague editing language request into a series of editing operations.
The only supervision in the task is the target image, which is insufficient for a stable training of sequential decisions.
We propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth.
arXiv Detail & Related papers (2021-06-24T16:30:03Z)
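The text-to-operation idea can be pictured as emitting a sequence of global editing operations that are then executed in order. A minimal sketch using PIL's global enhancers; the plan shown is a hypothetical model output, not the paper's actual operation set:

```python
from PIL import Image, ImageEnhance

# Hypothetical plan a text-to-operation model might emit for a request
# like "make the photo warmer and a bit brighter" (values illustrative).
plan = [("brightness", 1.15), ("color", 1.25), ("contrast", 1.05)]

ENHANCERS = {
    "brightness": ImageEnhance.Brightness,
    "color": ImageEnhance.Color,
    "contrast": ImageEnhance.Contrast,
}

def execute(image: Image.Image, plan) -> Image.Image:
    """Apply a planned sequence of global editing operations in order."""
    for name, factor in plan:
        image = ENHANCERS[name](image).enhance(factor)
    return image

edited = execute(Image.open("photo.jpg"), plan)  # "photo.jpg" is a placeholder
```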