Iterative Multi-granular Image Editing using Diffusion Models
- URL: http://arxiv.org/abs/2309.00613v2
- Date: Sat, 28 Oct 2023 11:16:53 GMT
- Title: Iterative Multi-granular Image Editing using Diffusion Models
- Authors: K J Joseph, Prateksha Udhayanan, Tripti Shukla, Aishwarya Agarwal,
Srikrishna Karanam, Koustava Goswami, Balaji Vasan Srinivasan
- Abstract summary: We propose EMILIE: Iterative Multi-granular Image Editor.
We introduce a new benchmark dataset to evaluate our newly proposed setting.
- Score: 20.21694969555533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in text-guided image synthesis have dramatically changed how
creative professionals generate artistic and aesthetically pleasing visual
assets. To fully support such creative endeavors, the process should possess
the ability to: 1) iteratively edit the generations and 2) control the spatial
reach of desired changes (global, local or anything in between). We formalize
this pragmatic problem setting as Iterative Multi-granular Editing. While there
has been substantial progress with diffusion-based models for image synthesis
and editing, they are all one-shot (i.e., no iterative editing capabilities)
and do not naturally yield multi-granular control (i.e., covering the full
spectrum of local-to-global edits). To overcome these drawbacks, we propose
EMILIE: Iterative Multi-granular Image Editor. EMILIE introduces a novel latent
iteration strategy, which re-purposes a pre-trained diffusion model to
facilitate iterative editing. This is complemented by a gradient control
operation for multi-granular control. We introduce a new benchmark dataset to
evaluate our newly proposed setting. We conduct an exhaustive quantitative and
qualitative evaluation against recent state-of-the-art approaches adapted to
our task, to bring out the mettle of EMILIE. We hope our work will attract
attention to this newly identified, pragmatic problem setting.
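The abstract names two ingredients without spelling out the implementation: latent iteration (feed the edited latent back in as the starting point for the next instruction) and a spatial control over edit granularity. Below is a minimal sketch of that loop under stated assumptions; every helper is hypothetical (`edit_with_diffusion` stands in for one pass of a pre-trained text-guided diffusion editor, and the mask gates where the update may land: all-ones for a global edit, localized for a local one). It is not EMILIE's actual latent-iteration or gradient-control operator.

```python
# Hedged sketch of iterative, multi-granular editing. All names are
# hypothetical stand-ins, not the paper's implementation.
import numpy as np

def edit_with_diffusion(latent: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for one pass of a pre-trained text-guided diffusion editor."""
    rng = np.random.default_rng(abs(hash(instruction)) % (2**32))
    return latent + 0.1 * rng.standard_normal(latent.shape)

def iterative_multigranular_edit(latent, instructions, masks):
    """Latent iteration: each edit starts from the previous edited latent.
    Multi-granular control: a spatial mask gates the update."""
    for text, mask in zip(instructions, masks):
        proposal = edit_with_diffusion(latent, text)
        latent = mask * proposal + (1.0 - mask) * latent  # masked update
    return latent

latent = np.zeros((4, 64, 64))                    # e.g. an SD-style latent
global_mask = np.ones_like(latent)                # global edit
local_mask = np.zeros_like(latent)
local_mask[:, 16:48, 16:48] = 1.0                 # local edit, center region
out = iterative_multigranular_edit(
    latent,
    ["make it sunset", "add a red boat on the left"],
    [global_mask, local_mask],
)
print(out.shape)
```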
Related papers
- PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery [10.594261300488546]
We introduce a novel framework for progressive exemplar-driven editing with off-the-shelf diffusion models, dubbed PIXELS.
PIXELS provides granular control over edits, allowing adjustments at the pixel or region level.
We demonstrate that PIXELS delivers high-quality edits efficiently, leading to a notable improvement in quantitative metrics as well as human evaluation.
arXiv Detail & Related papers (2025-01-16T20:26:30Z)
- EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks.
The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm.
We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance to various state-of-the-art task-specific methods.
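A hedged sketch of that next-token decoding loop, with a random stand-in for the model; the codebook size, token counts, and `next_token_logits` are illustrative inventions, not EditAR's actual components:

```python
# Toy autoregressive decode: condition on source-image tokens plus
# instruction tokens, then emit the edited image's tokens one at a time.
import numpy as np

VOCAB = 1024          # hypothetical image-token codebook size
NUM_IMG_TOKENS = 16   # tokens per image in this toy setting

def next_token_logits(context: list[int]) -> np.ndarray:
    rng = np.random.default_rng(len(context))    # deterministic stand-in model
    return rng.standard_normal(VOCAB)

def decode_edited_image(src_tokens, instr_tokens):
    context = list(src_tokens) + list(instr_tokens)
    out = []
    for _ in range(NUM_IMG_TOKENS):
        tok = int(np.argmax(next_token_logits(context)))  # greedy decoding
        out.append(tok)
        context.append(tok)                               # next-token paradigm
    return out

edited = decode_edited_image(src_tokens=range(NUM_IMG_TOKENS),
                             instr_tokens=[7, 42, 99])
print(edited)
```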
arXiv Detail & Related papers (2025-01-08T18:59:35Z)
- Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing [66.48853049746123]
We analyze reconstruction from a structural perspective and propose a novel approach that replaces traditional cross-attention with uniform attention maps.
Our method effectively minimizes distortions caused by varying text conditions during noise prediction.
Experimental results demonstrate that our approach not only excels in achieving high-fidelity image reconstruction but also performs robustly in real image composition and editing scenarios.
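As a rough illustration of the substitution described above, here is a toy attention function where the usual text-dependent softmax map can be swapped for a uniform map; shapes and names are illustrative, not the paper's implementation:

```python
# Standard softmax cross-attention vs. a uniform attention map that ignores
# the text condition during reconstruction.
import numpy as np

def cross_attention(q, k, v, uniform: bool = False):
    """q: (Nq, d); k, v: (Nk, d)."""
    if uniform:
        attn = np.full((q.shape[0], k.shape[0]), 1.0 / k.shape[0])
    else:
        scores = q @ k.T / np.sqrt(q.shape[1])
        scores -= scores.max(axis=-1, keepdims=True)   # stable softmax
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

q = np.random.randn(64, 32)     # image queries
k = np.random.randn(77, 32)     # text keys (e.g. 77 prompt tokens)
v = np.random.randn(77, 32)
recon_feats = cross_attention(q, k, v, uniform=True)   # text-independent map
print(recon_feats.shape)
```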
arXiv Detail & Related papers (2024-11-29T12:11:28Z)
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
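One plausible reading of identifying "vital layers" is an ablation scan: bypass each residual block in turn and score how much the output moves. The sketch below implements that generic idea with toy blocks; Stable Flow's actual criterion and the DiT internals are not reproduced:

```python
# Score each layer by how much the model output changes when it is bypassed.
import numpy as np

def run_model(x, blocks, skip=None):
    for i, block in enumerate(blocks):
        if i != skip:
            x = x + block(x)        # residual block, skippable as identity
    return x

rng = np.random.default_rng(0)
weights = [rng.standard_normal((16, 16)) * 0.1 for _ in range(8)]
blocks = [lambda x, W=W: np.tanh(x @ W) for W in weights]
x = rng.standard_normal((4, 16))

ref = run_model(x, blocks)
scores = [np.linalg.norm(ref - run_model(x, blocks, skip=i))
          for i in range(len(blocks))]
vital = int(np.argmax(scores))      # removal changes the output the most
print(vital, [round(s, 3) for s in scores])
```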
arXiv Detail & Related papers (2024-11-21T18:59:51Z)
- A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to edit the given synthetic or real image to meet the specific requirements from users.
Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models.
T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z)
- Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing [2.5602836891933074]
A commonly adopted strategy for editing real images involves inverting the diffusion process to obtain a noisy representation of the original image.
Current methods for diffusion inversion often struggle to produce edits that are both faithful to the specified text prompt and closely resemble the source image.
We introduce a novel and adaptable diffusion inversion technique for real image editing, which is grounded in a theoretical analysis of the role of $\eta$ in the DDIM sampling equation for enhanced editability.
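For context, here is a minimal implementation of the standard DDIM step in which this $\eta$ appears (following Song et al.'s formulation, where $\eta = 0$ gives deterministic DDIM and $\eta = 1$ recovers DDPM-like stochasticity); the paper's specific eta function is not reproduced:

```python
# One reverse DDIM step given the predicted noise eps at timestep t.
import numpy as np

def ddim_step(x_t, eps, alpha_t, alpha_prev, eta, rng):
    # Predicted clean sample from the noise estimate.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    # Eta controls the stochastic term's scale (DDIM paper, Eq. 16).
    sigma = (eta * np.sqrt((1 - alpha_prev) / (1 - alpha_t))
                 * np.sqrt(1 - alpha_t / alpha_prev))
    dir_xt = np.sqrt(1.0 - alpha_prev - sigma**2) * eps
    return (np.sqrt(alpha_prev) * x0_pred + dir_xt
            + sigma * rng.standard_normal(x_t.shape))

rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 64, 64))
eps = rng.standard_normal(x_t.shape)    # stand-in for the UNet's prediction
x_prev = ddim_step(x_t, eps, alpha_t=0.5, alpha_prev=0.7, eta=0.0, rng=rng)
print(x_prev.shape)
```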
arXiv Detail & Related papers (2024-03-14T15:07:36Z)
- Diffusion Model-Based Image Editing: A Survey [46.244266782108234]
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks.
We provide an exhaustive overview of existing methods using diffusion models for image editing.
To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval.
arXiv Detail & Related papers (2024-02-27T14:07:09Z)
- Differential Diffusion: Giving Each Pixel Its Strength [10.36919027402249]
This paper introduces a novel framework that enables customization of the amount of change per pixel or per image region.
Our framework can be integrated into any existing diffusion model, enhancing it with this capability.
We demonstrate our method with the current open state-of-the-art models, and validate it via both quantitative and qualitative comparisons.
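The per-pixel strength idea can be sketched as follows: a change map in [0, 1] decides, at each denoising step, which pixels follow the edit and which are re-anchored to a noised copy of the original. The denoiser and noise schedule below are toy stand-ins, not the paper's:

```python
# Pixels with higher change strength are released from the original earlier,
# so they accumulate more edits; low-strength pixels stay anchored.
import numpy as np

def noise_to_level(x0, t, T, rng):
    a = 1.0 - t / T                              # toy signal fraction
    return np.sqrt(a) * x0 + np.sqrt(1 - a) * rng.standard_normal(x0.shape)

def denoise_step(x, t):
    return 0.95 * x                              # stand-in for one model step

def differential_edit(x0, change_map, T=50):
    rng = np.random.default_rng(0)
    x = rng.standard_normal(x0.shape)            # start from pure noise
    for t in range(T, 0, -1):
        x = denoise_step(x, t)
        keep_original = change_map < (t / T)     # weak-change pixels re-anchored
        x = np.where(keep_original, noise_to_level(x0, t, T, rng), x)
    return x

x0 = np.zeros((64, 64))
change = np.zeros_like(x0)
change[20:44, 20:44] = 1.0                       # edit only the center region
out = differential_edit(x0, change)
print(out.shape)
```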
arXiv Detail & Related papers (2023-06-01T17:47:06Z)
- End-to-End Visual Editing with a Generatively Pre-Trained Artist [78.5922562526874]
We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change.
We propose a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain.
We show that different blending effects can be learned by an intuitive control of the augmentation process, with no other changes required to the model architecture.
arXiv Detail & Related papers (2022-05-03T17:59:30Z)
- Learning by Planning: Language-Guided Global Image Editing [53.72807421111136]
We develop a text-to-operation model to map the vague editing language request into a series of editing operations.
The only supervision in the task is the target image, which is insufficient for stable training of sequential decisions.
We propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth.
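A hedged sketch of that pseudo-ground-truth idea: greedily search a small space of global editing operations for a sequence that moves the input toward the target image; such sequences could then supervise a text-to-operation model. The operations and the search are toy stand-ins, not the paper's planner:

```python
# Greedy operation planning: pick, at each step, the op that best reduces
# the distance to the target; stop when no op improves the match.
import numpy as np

OPS = {
    "brighten":  lambda x: np.clip(x + 0.1, 0, 1),
    "darken":    lambda x: np.clip(x - 0.1, 0, 1),
    "contrast+": lambda x: np.clip((x - 0.5) * 1.2 + 0.5, 0, 1),
}

def plan_operations(src, target, max_steps=5):
    seq, cur = [], src
    for _ in range(max_steps):
        name, nxt = min(((n, f(cur)) for n, f in OPS.items()),
                        key=lambda p: np.abs(p[1] - target).mean())
        if np.abs(nxt - target).mean() >= np.abs(cur - target).mean():
            break                                # no operation improves further
        seq.append(name)
        cur = nxt
    return seq, cur

rng = np.random.default_rng(0)
src = rng.uniform(0, 1, (32, 32))
target = np.clip(src + 0.2, 0, 1)                # a globally brightened target
seq, _ = plan_operations(src, target)
print(seq)                                       # e.g. ['brighten', 'brighten']
```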
arXiv Detail & Related papers (2021-06-24T16:30:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.