InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning
- URL: http://arxiv.org/abs/2406.09973v1
- Date: Fri, 14 Jun 2024 12:31:48 GMT
- Title: InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning
- Authors: Tiancheng Li, Jinxiu Liu, Huajun Chen, Qi Liu,
- Abstract summary: We propose Reinforcement Learning Guided Image Editing Method(InstructRL4Pix) to train a diffusion model to generate images that are guided by the attention maps of the target object.
Experimental results show that InstructRL4Pix breaks through the limitations of traditional datasets and uses unsupervised learning to optimize editing goals and achieve accurate image editing based on natural human commands.
- Score: 31.799923647356458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction-based image editing has made a great process in using natural human language to manipulate the visual content of images. However, existing models are limited by the quality of the dataset and cannot accurately localize editing regions in images with complex object relationships. In this paper, we propose Reinforcement Learning Guided Image Editing Method(InstructRL4Pix) to train a diffusion model to generate images that are guided by the attention maps of the target object. Our method maximizes the output of the reward model by calculating the distance between attention maps as a reward function and fine-tuning the diffusion model using proximal policy optimization (PPO). We evaluate our model in object insertion, removal, replacement, and transformation. Experimental results show that InstructRL4Pix breaks through the limitations of traditional datasets and uses unsupervised learning to optimize editing goals and achieve accurate image editing based on natural human commands.
Related papers
- PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models [80.98455219375862]
We present the first text-based image editing approach for object parts based on pre-trained diffusion models.
Our approach is preferred by users 77-90% of the time in conducted user studies.
arXiv Detail & Related papers (2025-02-06T13:08:43Z) - DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - Diffusion Model-Based Image Editing: A Survey [46.244266782108234]
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks.
We provide an exhaustive overview of existing methods using diffusion models for image editing.
To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval.
arXiv Detail & Related papers (2024-02-27T14:07:09Z) - VASE: Object-Centric Appearance and Shape Manipulation of Real Videos [108.60416277357712]
In this work, we introduce a framework that is object-centric and is designed to control both the object's appearance and, notably, to execute precise and explicit structural modifications on the object.
We build our framework on a pre-trained image-conditioned diffusion model, integrate layers to handle the temporal dimension, and propose training strategies and architectural modifications to enable shape control.
We evaluate our method on the image-driven video editing task showing similar performance to the state-of-the-art, and showcasing novel shape-editing capabilities.
arXiv Detail & Related papers (2024-01-04T18:59:24Z) - iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z) - Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models [4.855820180160146]
We propose a novel diffusion-based image editing framework with pixel-wise guidance.
We demonstrate that our proposal outperforms the GAN-based method for editing quality and speed.
arXiv Detail & Related papers (2022-12-05T04:39:08Z) - InstructPix2Pix: Learning to Follow Image Editing Instructions [103.77092910685764]
We propose a method for editing images from human instructions.
given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image.
We show compelling editing results for a diverse collection of input images and written instructions.
arXiv Detail & Related papers (2022-11-17T18:58:43Z) - Look here! A parametric learning based approach to redirect visual
attention [49.609412873346386]
We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
arXiv Detail & Related papers (2020-08-12T16:08:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.