DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model
- URL: http://arxiv.org/abs/2503.12838v1
- Date: Mon, 17 Mar 2025 05:34:11 GMT
- Title: DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model
- Authors: Junjia Huang, Pengxiang Yan, Jinhang Cai, Jiyang Liu, Zhao Wang, Yitong Wang, Xinglong Wu, Guanbin Li
- Abstract summary: We introduce DreamLayer, a framework that enables coherent text-driven generation of multiple image layers. By explicitly modeling the relationship between transparent foreground and background layers, DreamLayer builds inter-layer connections. Experiments and user studies demonstrate that DreamLayer generates more coherent and well-aligned layers.
- Score: 47.32061459437175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven image generation using diffusion models has recently gained significant attention. To enable more flexible image manipulation and editing, recent research has expanded from single image generation to transparent layer generation and multi-layer compositions. However, existing approaches often fail to provide a thorough exploration of multi-layer structures, leading to inconsistent inter-layer interactions, such as occlusion relationships, spatial layout, and shadowing. In this paper, we introduce DreamLayer, a novel framework that enables coherent text-driven generation of multiple image layers by explicitly modeling the relationship between transparent foreground and background layers. DreamLayer incorporates three key components, i.e., Context-Aware Cross-Attention (CACA) for global-local information exchange, Layer-Shared Self-Attention (LSSA) for establishing robust inter-layer connections, and Information Retained Harmonization (IRH) for refining fusion details at the latent level. By leveraging a coherent full-image context, DreamLayer builds inter-layer connections through attention mechanisms and applies a harmonization step to achieve seamless layer fusion. To facilitate research in multi-layer generation, we construct a high-quality, diverse multi-layer dataset of 400k samples. Extensive experiments and user studies demonstrate that DreamLayer generates more coherent and well-aligned layers, with broad applicability, including latent-space image editing and image-to-layer decomposition.
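The abstract names three attention-based components but, as a summary, gives no implementation detail. As a rough illustration of the general idea behind layer-shared self-attention, the sketch below concatenates token sequences from all layers into one joint sequence so a single attention pass can exchange information across layers; the class name, tensor shapes, and single-block design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LayerSharedSelfAttention(nn.Module):
    """Illustrative sketch (hypothetical, not the paper's code): tokens from
    all layers are concatenated along the sequence axis so one self-attention
    pass lets every layer attend to every other layer."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, layer_tokens: list[torch.Tensor]) -> list[torch.Tensor]:
        # layer_tokens: one (batch, seq_len, dim) tensor per layer.
        lengths = [t.shape[1] for t in layer_tokens]
        shared = torch.cat(layer_tokens, dim=1)           # joint sequence over all layers
        shared, _ = self.attn(shared, shared, shared)     # cross-layer information exchange
        return list(torch.split(shared, lengths, dim=1))  # split back to per-layer tokens

# Usage: two 32x32 latent layers with 320 channels, flattened to token sequences.
fg = torch.randn(1, 32 * 32, 320)  # transparent foreground latents (assumed shape)
bg = torch.randn(1, 32 * 32, 320)  # background latents (assumed shape)
fg_out, bg_out = LayerSharedSelfAttention(dim=320)([fg, bg])
print(fg_out.shape, bg_out.shape)  # torch.Size([1, 1024, 320]) each
```

In a full diffusion backbone, a block like this would stand in for the per-layer self-attention so that foreground and background latents stay mutually consistent at every denoising step.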
Related papers
- ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation [108.69315278353932]
We introduce the Anonymous Region Transformer (ART), which facilitates the direct generation of variable multi-layer transparent images. By enabling precise control and scalable layer generation, ART establishes a new paradigm for interactive content creation.
arXiv Detail & Related papers (2025-02-25T16:57:04Z)
- LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge [14.481577976493236]
LayeringDiff is a novel pipeline for the synthesis of layered images. By extracting layers from a composite image rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training. For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate foreground and background layers.
arXiv Detail & Related papers (2025-01-02T11:18:25Z)
- LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors [38.47462111828742]
Layered content generation is crucial for creative fields like graphic design, animation, and digital art. We propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers. We show significant improvements in visual coherence, image quality, and layer consistency compared to baseline methods.
arXiv Detail & Related papers (2024-12-05T18:59:18Z)
- Hierarchical Multi-modal Transformer for Cross-modal Long Document Classification [74.45521856327001]
Classifying long documents that combine hierarchically structured text with embedded images is a new problem. We propose a novel approach called the Hierarchical Multi-modal Transformer (HMT) for cross-modal long document classification. Our approach uses a multi-modal transformer and a dynamic multi-scale multi-modal transformer to model the complex relationships between image features and the section and sentence features.
arXiv Detail & Related papers (2024-07-14T07:12:25Z)
- LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [70.14953942532621]
A layer-collaborative diffusion model, named LayerDiff, is designed for text-guided, multi-layered, composable image synthesis. Our model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods. LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.
arXiv Detail & Related papers (2024-03-18T16:28:28Z)
- Consolidating Attention Features for Multi-view Image Editing [126.19731971010475]
We focus on spatial control-based geometric manipulations and introduce a method to consolidate the editing process across various views. We introduce QNeRF, a neural radiance field trained on the internal query features of the edited images. We refine the process through a progressive, iterative method that better consolidates queries across the diffusion timesteps.
arXiv Detail & Related papers (2024-02-22T18:50:18Z)
- SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting [54.419266357283966]
Single image 3D photography enables viewers to view a still image from novel viewpoints. Recent approaches combine monocular depth networks with inpainting networks to achieve compelling results. We present SLIDE, a modular and unified system for single image 3D photography.
arXiv Detail & Related papers (2021-09-02T16:37:20Z)
- Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs [8.528384027684192]
We propose a class- and layer-wise extension to the variational autoencoder framework that allows flexible control over each object class at the local to global levels. We demonstrate that our method generates images that are both plausible and more diverse compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-06-25T04:12:05Z)