Related papers: DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models

URL: http://arxiv.org/abs/2410.11208v2
Date: Wed, 30 Oct 2024 01:16:45 GMT
Title: DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models
Authors: Zhengyang Yu, Zhaoyuan Yang, Jing Zhang
Abstract summary: Recent text-to-image personalization methods have shown great promise in teaching a diffusion model user-specified concepts. A promising extension is personalized editing, namely to edit an image using personalized concepts. We propose DreamSteerer, a plug-in method for augmenting existing T2I personalization methods.
Score: 7.418186319496487
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent text-to-image personalization methods have shown great promise in teaching a diffusion model user-specified concepts given a few images for reusing the acquired concepts in a novel context. With massive efforts being dedicated to personalized generation, a promising extension is personalized editing, namely to edit an image using personalized concepts, which can provide a more precise guidance signal than traditional textual guidance. To address this, a straightforward solution is to incorporate a personalized diffusion model with a text-driven editing framework. However, such a solution often shows unsatisfactory editability on the source image. To address this, we propose DreamSteerer, a plug-in method for augmenting existing T2I personalization methods. Specifically, we enhance the source image conditioned editability of a personalized diffusion model via a novel Editability Driven Score Distillation (EDSD) objective. Moreover, we identify a mode trapping issue with EDSD, and propose a mode shifting regularization with spatial feature guided sampling to avoid such an issue. We further employ two key modifications to the Delta Denoising Score framework that enable high-fidelity local editing with personalized concepts. Extensive experiments validate that DreamSteerer can significantly improve the editability of several T2I personalization baselines while being computationally efficient.

Related papers

Image-Editing Specialists: An RLAIF Approach for Diffusion Models [28.807572302899004]
We present a novel approach to training specialized instruction-based image-editing diffusion models. We introduce an online reinforcement learning framework that aligns the diffusion model with human preferences. Experimental results demonstrate that our models can perform intricate edits in complex scenes, after just 10 training steps.
arXiv Detail & Related papers (2025-04-17T10:46:39Z)
Personalize Anything for Free with Diffusion Transformer [20.385520869825413]
Recent training-free approaches struggle with identity preservation, applicability, and compatibility with diffusion transformers (DiTs). We uncover the untapped potential of DiT, where simply replacing denoising tokens with those of a reference subject achieves zero-shot subject reconstruction. We propose Personalize Anything, a training-free framework that achieves personalized image generation in DiT through: 1) timestep-adaptive token replacement that enforces subject consistency via early-stage injection and enhances flexibility through late-stage regularization, and 2) patch perturbation strategies to boost structural diversity.
arXiv Detail & Related papers (2025-03-16T17:51:16Z)
PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models [80.98455219375862]
We present the first text-based image editing approach for object parts based on pre-trained diffusion models. Our approach is preferred by users 77-90% of the time in conducted user studies.
arXiv Detail & Related papers (2025-02-06T13:08:43Z)
PIXELS: Progressive Image Xemplar-based Editing with Latent Surgery [10.594261300488546]
We introduce a novel framework for progressive exemplar-driven editing with off-the-shelf diffusion models, dubbed PIXELS. PIXELS provides granular control over edits, allowing adjustments at the pixel or region level. We demonstrate that PIXELS delivers high-quality edits efficiently, leading to a notable improvement in quantitative metrics as well as human evaluation.
arXiv Detail & Related papers (2025-01-16T20:26:30Z)
Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning [40.06403155373455]
We propose a novel reinforcement learning framework for personalized text-to-image generation. Our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text-alignment.
arXiv Detail & Related papers (2024-07-09T08:11:53Z)
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset. We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model. Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
Image editing aims to edit the given synthetic or real image to meet the specific requirements from users. Recent significant advancement in this field is based on the development of text-to-image (T2I) diffusion models. T2I-based image editing methods significantly enhance editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
arXiv Detail & Related papers (2024-06-20T17:58:52Z)
Preserving Identity with Variational Score for General-purpose 3D Editing [48.314327790451856]
Piva is a novel optimization-based method for editing images and 3D models based on diffusion models. We pinpoint the limitations in 2D and 3D editing that cause detail loss and oversaturation. We propose an additional score distillation term that enforces identity preservation.
arXiv Detail & Related papers (2024-06-13T09:32:40Z)
Editing Massive Concepts in Text-to-Image Diffusion Models [58.620118104364174]
We propose a two-stage method, Editing Massive Concepts In Diffusion Models (EMCID). The first stage performs memory optimization for each individual concept with dual self-distillation from text alignment loss and diffusion noise prediction loss. The second stage conducts massive concept editing with multi-layer, closed-form model editing.
arXiv Detail & Related papers (2024-03-20T17:59:57Z)
DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models [53.17454737232668]
We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts. These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions. We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D.
arXiv Detail & Related papers (2023-12-21T12:11:00Z)
AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing [24.9487669818162]
We propose a spatio-temporal guided adaptive editing algorithm, AdapEdit, which realizes adaptive image editing. Our approach has a significant advantage in preserving model priors and does not require model training, fine-tuning, extra data, or optimization. We present our results over a wide variety of raw images and editing instructions, demonstrating competitive performance and showing that it significantly outperforms previous approaches.
arXiv Detail & Related papers (2023-12-13T09:45:58Z)
Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models [26.92450293675906]
Text-to-image diffusion models can generate diverse, high-fidelity images based on user-provided text prompts. We propose Custom-Edit, in which we (i) customize a diffusion model with a few reference images and then (ii) perform text-guided editing.
arXiv Detail & Related papers (2023-05-25T06:46:28Z)
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation [8.803251014279502]
Large-scale text-to-image models have demonstrated amazing ability to synthesize diverse and high-fidelity images. Current models can impose significant changes to the original image content during the editing process. We propose ReGeneration learning in an image-to-image Diffusion model (ReDiffuser).
arXiv Detail & Related papers (2023-05-08T12:08:12Z)
Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting. We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.