Related papers: Stable Score Distillation

Stable Score Distillation

URL: http://arxiv.org/abs/2507.09168v1
Date: Sat, 12 Jul 2025 07:14:00 GMT
Title: Stable Score Distillation
Authors: Haiming Zhu, Yangyang Xu, Chenshu Xu, Tingrui Shen, Wenxi Liu, Yong Du, Jun Yu, Shengfeng He,
Abstract summary: We introduce Stable Score Distillation (SSD), a streamlined framework that enhances stability and alignment in the editing process.<n>Our method achieves state-of-the-art results in 2D and 3D editing tasks, including NeRF and text-driven style edits, with faster convergence and reduced complexity.
Score: 45.48460025487433
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-guided image and 3D editing have advanced with diffusion-based models, yet methods like Delta Denoising Score often struggle with stability, spatial control, and editing strength. These limitations stem from reliance on complex auxiliary structures, which introduce conflicting optimization signals and restrict precise, localized edits. We introduce Stable Score Distillation (SSD), a streamlined framework that enhances stability and alignment in the editing process by anchoring a single classifier to the source prompt. Specifically, SSD utilizes Classifier-Free Guidance (CFG) equation to achieves cross-prompt alignment, and introduces a constant term null-text branch to stabilize the optimization process. This approach preserves the original content's structure and ensures that editing trajectories are closely aligned with the source prompt, enabling smooth, prompt-specific modifications while maintaining coherence in surrounding regions. Additionally, SSD incorporates a prompt enhancement branch to boost editing strength, particularly for style transformations. Our method achieves state-of-the-art results in 2D and 3D editing tasks, including NeRF and text-driven style edits, with faster convergence and reduced complexity, providing a robust and efficient solution for text-guided editing.

Related papers

FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing [2.7248421583285265]
FlowDirector is a novel inversion-free video editing framework.<n>Our framework models the editing process as a direct evolution in data space.<n>To achieve localized and controllable edits, we introduce an attention-guided masking mechanism.
arXiv Detail & Related papers (2025-06-05T13:54:40Z)
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model [60.82962950960996]
We introduce UnifyEdit, a tuning-free method that performs diffusion latent optimization.<n>We develop two attention-based constraints: a self-attention (SA) preservation constraint for structural fidelity, and a cross-attention (CA) alignment constraint to enhance text alignment.<n>Our approach achieves a robust balance between structure preservation and text alignment across various editing tasks, outperforming other state-of-the-art methods.
arXiv Detail & Related papers (2025-04-08T01:02:50Z)
DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics [71.78350994830885]
We present a novel approach to improving text-guided image editing using diffusion-based models.<n>Our method uses visual and textual self-attention to enhance the cross-attention map, which can serve as a regional cues to improve editing performance.<n>To fully compare our methods with other DiT-based approaches, we construct the RW-800 benchmark, featuring high resolution images, long descriptive texts, real-world images, and a new text editing task.
arXiv Detail & Related papers (2025-03-21T02:14:03Z)
Lost in Edits? A $λ$-Compass for AIGC Provenance [119.95562081325552]
We propose a novel latent-space attribution method that robustly identifies and differentiates authentic outputs from manipulated ones.<n>LambdaTracer is effective across diverse iterative editing processes, whether automated through text-guided editing tools such as InstructPix2Pix or performed manually with editing software such as Adobe Photoshop.
arXiv Detail & Related papers (2025-02-05T06:24:25Z)
Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [60.730661748555214]
We introduce textbfTask-textbfOriented textbfDiffusion textbfInversion (textbfTODInv), a novel framework that inverts and edits real images tailored to specific editing tasks. ToDInv seamlessly integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability.
arXiv Detail & Related papers (2024-08-23T22:16:34Z)
FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing [22.308638156328968]
DDIM latent, crucial for retaining the original image's key features and layout, significantly contribute to limitations. We introduce FlexiEdit, which enhances fidelity to input text prompts by refining DDIM latent. Our approach represents notable progress in image editing, particularly in performing complex non-rigid edits.
arXiv Detail & Related papers (2024-07-25T08:07:40Z)
E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance [13.535394339438428]
Diffusion-based image editing is a composite process of preserving the source image content and generating new content or applying modifications. We propose a zero-shot image editing method, named textbfEnhance textbfEditability for text-based image textbfEditing via textbfCLIP guidance.
arXiv Detail & Related papers (2024-03-15T09:26:48Z)
Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing [56.536695050042546]
We propose a training-free approach for non-rigid editing with Stable Diffusion. Our approach comprises three stages: text optimization, latent inversion, and timestep-aware text injection sampling. We demonstrate the effectiveness of our method in terms of identity preservation, editability, and aesthetic quality.
arXiv Detail & Related papers (2024-02-13T17:08:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.