Related papers: PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control

PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control

URL: http://arxiv.org/abs/2502.10258v1
Date: Fri, 14 Feb 2025 16:11:57 GMT
Title: PromptArtisan: Multi-instruction Image Editing in Single Pass with Complete Attention Control
Authors: Kunal Swami, Raghu Chittersu, Pranav Adlinge, Rajeev Irny, Shashavali Doodekula, Alok Shukla,
Abstract summary: PromptArtisan is a groundbreaking approach to multi-instruction image editing.<n>It achieves remarkable results in a single pass, eliminating the need for time-consuming iterative refinement.
Score: 1.0079049259808768
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present PromptArtisan, a groundbreaking approach to multi-instruction image editing that achieves remarkable results in a single pass, eliminating the need for time-consuming iterative refinement. Our method empowers users to provide multiple editing instructions, each associated with a specific mask within the image. This flexibility allows for complex edits involving mask intersections or overlaps, enabling the realization of intricate and nuanced image transformations. PromptArtisan leverages a pre-trained InstructPix2Pix model in conjunction with a novel Complete Attention Control Mechanism (CACM). This mechanism ensures precise adherence to user instructions, granting fine-grained control over the editing process. Furthermore, our approach is zero-shot, requiring no additional training, and boasts improved processing complexity compared to traditional iterative methods. By seamlessly integrating multi-instruction capabilities, single-pass efficiency, and complete attention control, PromptArtisan unlocks new possibilities for creative and efficient image editing workflows, catering to both novice and expert users alike.

Related papers

Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling [54.54513714247062]
Recent advancements in unified image generation models, such as OmniGen, have enabled the handling of diverse image generation and editing tasks within a single framework.<n>We found that it suffers from text instruction neglect, especially when the text instruction contains multiple sub-instructions.<n>We propose Self-Adaptive Attention Scaling to dynamically scale the attention activation for each sub-instruction.
arXiv Detail & Related papers (2025-07-22T05:25:38Z)
Multi-turn Consistent Image Editing [23.195620233753957]
We propose a multi-turn image editing framework that enables users to iteratively refine their edits.<n>Our approach leverages flow matching for accurate image inversion and a dual-objective Linear Quadratic Regulators (LQR) for stable sampling.<n>Our framework significantly improves edit success rates and visual fidelity compared to existing methods.
arXiv Detail & Related papers (2025-05-07T11:11:23Z)
Image-Editing Specialists: An RLAIF Approach for Diffusion Models [28.807572302899004]
We present a novel approach to training specialized instruction-based image-editing diffusion models. We introduce an online reinforcement learning framework that aligns the diffusion model with human preferences. Experimental results demonstrate that our models can perform intricate edits in complex scenes, after just 10 training steps.
arXiv Detail & Related papers (2025-04-17T10:46:39Z)
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency [69.33072075580483]
We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training.<n>Our method addresses these challenges by introducing a novel editing mechanism called Cycle Edit Consistency ( CEC)<n> CEC applies forward and backward edits in one training step and enforces consistency in image and attention spaces.
arXiv Detail & Related papers (2024-12-19T18:59:58Z)
BrushEdit: All-In-One Image Inpainting and Editing [79.55816192146762]
BrushEdit is a novel inpainting-based instruction-guided image editing paradigm.<n>We devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model.<n>Our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics.
arXiv Detail & Related papers (2024-12-13T17:58:06Z)
INRetouch: Context Aware Implicit Neural Representation for Photography Retouching [54.17599183365242]
We propose a novel retouch transfer approach that learns from professional edits through before-after image pairs.<n>We develop a context-aware Implicit Neural Representation that learns to apply edits adaptively based on image content and context.<n>Our approach not only surpasses existing methods in photo retouching but also enhances performance in related image reconstruction tasks.
arXiv Detail & Related papers (2024-12-05T03:31:48Z)
Zero-shot Image Editing with Reference Imitation [50.75310094611476]
We present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently. We propose a generative training framework, dubbed MimicBrush, which randomly selects two frames from a video clip, masks some regions of one frame, and learns to recover the masked regions using the information from the other frame. We experimentally show the effectiveness of our method under various test cases as well as its superiority over existing alternatives.
arXiv Detail & Related papers (2024-06-11T17:59:51Z)
Streamlining Image Editing with Layered Diffusion Brushes [8.738398948669609]
Our system renders a single edit on a 512x512 image within 140 ms using a high-end consumer GPU. Our approach demonstrates efficacy across a range of tasks, including object attribute adjustments, error correction, and sequential prompt-based object placement and manipulation.
arXiv Detail & Related papers (2024-05-01T04:30:03Z)
Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance [15.130419159003816]
We present a versatile image editing framework capable of executing both rigid and non-rigid edits. We leverage a dual-path injection scheme to handle diverse editing scenarios. We introduce an integrated self-attention mechanism for fusion of appearance and structural information.
arXiv Detail & Related papers (2024-01-04T08:21:30Z)
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models [91.22477798288003]
This paper introduces SmartEdit, a novel approach to instruction-based image editing. It exploits Multimodal Large Language Models (MLLMs) to enhance their understanding and reasoning capabilities. We show that a small amount of complex instruction editing data can effectively stimulate SmartEdit's editing capabilities for more complex instructions.
arXiv Detail & Related papers (2023-12-11T17:54:11Z)
Emu Edit: Precise Image Editing via Recognition and Generation Tasks [62.95717180730946]
We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing. We train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks. We show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples.
arXiv Detail & Related papers (2023-11-16T18:55:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.