MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues
- URL: http://arxiv.org/abs/2512.03046v1
- Date: Tue, 02 Dec 2025 18:59:58 GMT
- Title: MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues
- Authors: Zichen Liu, Yue Yu, Hao Ouyang, Qiuyu Wang, Shuailei Ma, Ka Leong Cheng, Wen Wang, Qingyan Bai, Yuxuan Zhang, Yanhong Zeng, Yixuan Li, Xing Zhu, Yujun Shen, Qifeng Chen
- Abstract summary: We propose MagicQuill V2, a novel system that introduces a layered composition paradigm to generative image editing. Our method deconstructs creative intent into a stack of controllable visual cues.
- Score: 106.02577891104079
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose MagicQuill V2, a novel system that introduces a layered composition paradigm to generative image editing, bridging the gap between the semantic power of diffusion models and the granular control of traditional graphics software. While diffusion transformers excel at holistic generation, their use of singular, monolithic prompts fails to disentangle distinct user intentions for content, position, and appearance. To overcome this, our method deconstructs creative intent into a stack of controllable visual cues: a content layer for what to create, a spatial layer for where to place it, a structural layer for how it is shaped, and a color layer for its palette. Our technical contributions include a specialized data generation pipeline for context-aware content integration, a unified control module to process all visual cues, and a fine-tuned spatial branch for precise local editing, including object removal. Extensive experiments validate that this layered approach effectively resolves the user intention gap, granting creators direct, intuitive control over the generative process.
Related papers
- ReMix: Towards a Unified View of Consistent Character Generation and Editing [22.04681457337335]
ReMix is a unified framework for character-consistent generation and editing. It consists of two core components: the ReMix Module and IP-ControlNet. ReMix supports a wide range of tasks, including personalized generation, image editing, style transfer, and multi-condition synthesis.
arXiv Detail & Related papers (2025-10-11T10:31:56Z) - IntrinsicEdit: Precise generative image manipulation in intrinsic space [53.404235331886255]
We introduce a versatile, generative workflow that operates in an intrinsic-image latent space. We address key challenges of identity preservation and intrinsic-channel entanglement, and enable precise, efficient editing with automatic resolution of global illumination effects.
arXiv Detail & Related papers (2025-05-13T18:24:15Z) - ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation [108.69315278353932]
We introduce the Anonymous Region Transformer (ART), which facilitates the direct generation of variable multi-layer transparent images. By enabling precise control and scalable layer generation, ART establishes a new paradigm for interactive content creation.
arXiv Detail & Related papers (2025-02-25T16:57:04Z) - Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition. It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects. Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z) - Move Anything with Layered Scene Diffusion [77.45870343845492]
We propose SceneDiffusion to optimize a layered scene representation during the diffusion sampling process.
Our key insight is that spatial disentanglement can be obtained by jointly denoising scene renderings at different spatial layouts.
Our generated scenes support a wide range of spatial editing operations, including moving, resizing, cloning, and layer-wise appearance editing operations.
arXiv Detail & Related papers (2024-04-10T17:28:16Z) - DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing [22.855660721387167]
We transform the spatial-aware image editing task into a combination of two sub-tasks: multi-layered latent decomposition and multi-layered latent fusion.
We show that our approach consistently surpasses the latest spatial editing methods, including Self-Guidance and DiffEditor.
arXiv Detail & Related papers (2024-03-21T15:35:42Z) - Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis [15.76266032768078]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries. We first introduce vision guidance as a foundational spatial cue within the perturbed distribution. We propose a universal framework, Layered Rendering Diffusion (LRDiff), which constructs an image-rendering process with multiple layers.
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - LayerDiffusion: Layered Controlled Image Editing with Diffusion Models [5.58892860792971]
LayerDiffusion is a semantic-based layered controlled image editing method.
We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy.
Experimental results demonstrate the effectiveness of our method in generating highly coherent images.
arXiv Detail & Related papers (2023-05-30T01:26:41Z) - Localizing Object-level Shape Variations with Text-to-Image Diffusion Models [60.422435066544814]
We present a technique to generate a collection of images that depicts variations in the shape of a specific object.
A particular challenge when generating object variations is accurately localizing the manipulation applied over the object's shape.
To localize the image-space operation, we present two techniques that use the self-attention layers in conjunction with the cross-attention layers.
arXiv Detail & Related papers (2023-03-20T17:45:08Z) - Collage Diffusion [17.660410448312717]
Collage Diffusion harmonizes the input layers to make objects fit together.
We preserve key visual attributes of input layers by learning specialized text representations per layer.
Collage Diffusion generates globally harmonized images that maintain desired object characteristics better than prior approaches.
arXiv Detail & Related papers (2023-03-01T06:35:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.