LUSD: Localized Update Score Distillation for Text-Guided Image Editing
- URL: http://arxiv.org/abs/2503.11054v1
- Date: Fri, 14 Mar 2025 03:45:29 GMT
- Title: LUSD: Localized Update Score Distillation for Text-Guided Image Editing
- Authors: Worameth Chinchuthakun, Tossaporn Saengja, Nontawat Tritrong, Pitchaporn Rewatbowornwong, Pramook Khungurn, Supasorn Suwajanakorn,
- Abstract summary: Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models.<n>We propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization.<n> Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, improving successful edits while preserving the background.
- Score: 11.293199854940772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, improving successful edits while preserving the background. Users also preferred our method over state-of-the-art techniques across three metrics, and by 58-64% overall.
Related papers
- PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing [63.38854614997581]
We introduce PostEdit, a method that incorporates a posterior scheme to govern the diffusion sampling process.<n>The proposed PostEdit achieves state-of-the-art editing performance while accurately preserving unedited regions.<n>The method is both inversion- and training-free, necessitating approximately 1.5 seconds and 18 GB of GPU memory to generate high-quality results.
arXiv Detail & Related papers (2024-10-07T09:04:50Z) - TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models [53.757752110493215]
We focus on a popular line of text-based editing frameworks - the edit-friendly'' DDPM-noise inversion approach.
We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength.
We propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts.
arXiv Detail & Related papers (2024-08-01T17:27:28Z) - Preserving Identity with Variational Score for General-purpose 3D Editing [48.314327790451856]
Piva is a novel optimization-based method for editing images and 3D models based on diffusion models.
We pinpoint the limitations in 2D and 3D editing, which causes detail loss and oversaturation.
We propose an additional score distillation term that enforces identity preservation.
arXiv Detail & Related papers (2024-06-13T09:32:40Z) - DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
arXiv Detail & Related papers (2024-03-21T17:52:08Z) - Improving Diffusion-based Image Translation using Asymmetric Gradient
Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-dif models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.