LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning
- URL: http://arxiv.org/abs/2511.08251v1
- Date: Wed, 12 Nov 2025 01:48:40 GMT
- Title: LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning
- Authors: Fengyi Fu, Mengqi Huang, Lei Zhang, Zhendong Mao
- Abstract summary: We propose LayerEdit, a training-free multi-layer disentangled editing framework. It enables conflict-free object-layered editing through precise object-layered decomposition and coherent fusion. Experiments verify the superiority of LayerEdit over existing methods, showing unprecedented intra-object controllability and inter-object coherence.
- Score: 34.08955594341648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven multi-object image editing, which aims to precisely modify multiple objects within an image based on text descriptions, has recently attracted considerable interest. Existing works primarily follow the localize-and-edit paradigm, focusing on independent object localization and editing while neglecting critical inter-object interactions. However, this work points out that the neglected attention entanglements in inter-object conflict regions inherently hinder disentangled multi-object editing, leading to either inter-object editing leakage or intra-object editing constraints. We thereby propose LayerEdit, a novel training-free multi-layer disentangled editing framework which, for the first time, enables conflict-free object-layered editing through precise object-layered decomposition and coherent fusion. Specifically, LayerEdit introduces a novel "decompose-editing-fusion" framework, consisting of: (1) a Conflict-aware Layer Decomposition module, which uses an attention-aware IoU scheme and time-dependent region removal to enhance conflict awareness and suppression during layer decomposition; (2) an Object-layered Editing module, which establishes coordinated intra-layer text guidance and cross-layer geometric mapping to achieve disentangled semantic and structural modifications; and (3) a Transparency-guided Layer Fusion module, which facilitates structure-coherent inter-object layer fusion through precise transparency-guidance learning. Extensive experiments verify the superiority of LayerEdit over existing methods, showing unprecedented intra-object controllability and inter-object coherence in complex multi-object scenarios. Code is available at: https://github.com/fufy1024/LayerEdit.
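The attention-aware IoU idea behind the Conflict-aware Layer Decomposition module can be illustrated with a minimal sketch: binarize each object's cross-attention map, then measure pairwise overlap to flag conflict regions between objects. This is an illustrative approximation, not the paper's implementation; the function names (`attention_iou`, `find_conflicts`) and threshold values are hypothetical.

```python
import numpy as np

def attention_iou(attn_a, attn_b, thresh=0.5):
    """IoU between two binarized cross-attention maps.

    attn_a, attn_b: 2-D arrays of per-pixel attention weights in [0, 1].
    thresh: binarization cutoff for turning soft attention into masks.
    """
    mask_a = attn_a >= thresh
    mask_b = attn_b >= thresh
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

def find_conflicts(attn_maps, iou_thresh=0.1):
    """Return index pairs of objects whose attention regions overlap
    enough to count as a conflict region."""
    conflicts = []
    n = len(attn_maps)
    for i in range(n):
        for j in range(i + 1, n):
            if attention_iou(attn_maps[i], attn_maps[j]) > iou_thresh:
                conflicts.append((i, j))
    return conflicts
```

Flagged pairs would then be candidates for the paper's time-dependent region removal, while non-conflicting objects can be decomposed into independent layers directly.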
Related papers
- InterCoG: Towards Spatially Precise Image Editing with Interleaved Chain-of-Grounding Reasoning [60.799998743918955]
We propose a novel text-vision Interleaved Chain-of-Grounding reasoning framework for fine-grained image editing in complex real-world scenes. The key insight of InterCoG is to first perform object position reasoning solely within text. We also propose two auxiliary training modules: multimodal grounding reconstruction supervision and multimodal grounding reasoning alignment.
arXiv Detail & Related papers (2026-03-02T08:13:16Z) - FlowDC: Flow-Based Decoupling-Decay for Complex Image Editing [52.54102743380658]
We propose FlowDC, which decouples complex editing into multiple sub-editing effects and superposes them in parallel during the editing process. FlowDC shows superior results compared with existing methods.
arXiv Detail & Related papers (2025-12-12T09:08:39Z) - MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues [106.02577891104079]
We propose MagicQuill V2, a novel system that introduces a layered composition paradigm to generative image editing. Our method deconstructs creative intent into a stack of controllable visual cues.
arXiv Detail & Related papers (2025-12-02T18:59:58Z) - O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing [88.93410369258203]
O-DisCo-Edit is a unified framework that incorporates a novel object distortion control (O-DisCo). This signal, based on random and adaptive noise, flexibly encapsulates a wide range of editing cues within a single representation. O-DisCo-Edit enables efficient, high-fidelity editing through an effective training paradigm.
arXiv Detail & Related papers (2025-09-01T16:29:39Z) - Image Editing As Programs with Diffusion Models [69.05164729625052]
We introduce Image Editing As Programs (IEAP), a unified image editing framework built upon the Diffusion Transformer (DiT) architecture. IEAP approaches instructional editing through a reductionist lens, decomposing complex editing instructions into sequences of atomic operations. Our framework delivers superior accuracy and semantic fidelity, particularly for complex, multi-step instructions.
arXiv Detail & Related papers (2025-06-04T16:57:24Z) - MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models [10.798205956644317]
We propose MDE-Edit, a training-free, inference-stage optimization approach that enables precise localized image manipulation in complex multi-object scenes. Extensive experiments demonstrate that MDE-Edit outperforms state-of-the-art methods in editing accuracy and visual quality, offering a robust solution for complex multi-object image manipulation tasks.
arXiv Detail & Related papers (2025-05-08T10:01:14Z) - InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images [42.8116807595149]
We present InteractEdit, a novel framework for zero-shot Human-Object Interaction (HOI) editing. It transforms an existing interaction in an image into a new, desired interaction while preserving the identities of the subject and object. Our experiments show that InteractEdit significantly outperforms existing methods.
arXiv Detail & Related papers (2025-03-12T07:40:45Z) - BrushEdit: All-In-One Image Inpainting and Editing [76.93556996538398]
BrushEdit is a novel inpainting-based, instruction-guided image editing paradigm. We devise a system enabling free-form instruction editing by integrating MLLMs and a dual-branch image inpainting model. Our framework effectively combines MLLMs and inpainting models, achieving superior performance across seven metrics.
arXiv Detail & Related papers (2024-12-13T17:58:06Z) - DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing [22.855660721387167]
We transform the spatial-aware image editing task into a combination of two sub-tasks: multi-layered latent decomposition and multi-layered latent fusion.
We show that our approach consistently surpasses the latest spatial editing methods, including Self-Guidance and DiffEditor.
arXiv Detail & Related papers (2024-03-21T15:35:42Z) - LoMOE: Localized Multi-Object Editing via Multi-Diffusion [8.90467024388923]
We introduce a novel framework for zero-shot localized multi-object editing through a multi-diffusion process.
Our approach leverages foreground masks and corresponding simple text prompts that exert localized influences on the target regions.
A combination of cross-attention and background losses within the latent space ensures that the characteristics of the object being edited are preserved.
arXiv Detail & Related papers (2024-03-01T10:46:47Z) - LayerDiffusion: Layered Controlled Image Editing with Diffusion Models [5.58892860792971]
LayerDiffusion is a semantic-based layered controlled image editing method.
We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy.
Experimental results demonstrate the effectiveness of our method in generating highly coherent images.
arXiv Detail & Related papers (2023-05-30T01:26:41Z) - PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor [135.17302411419834]
PAIR Diffusion is a generic framework that enables a diffusion model to control the structure and appearance of each object in the image.
We show that having control over the properties of each object in an image leads to comprehensive editing capabilities.
Our framework allows for various object-level editing operations on real images such as reference image-based appearance editing, free-form shape editing, adding objects, and variations.
arXiv Detail & Related papers (2023-03-30T17:13:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.