PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment
- URL: http://arxiv.org/abs/2505.11468v2
- Date: Sat, 08 Nov 2025 20:54:08 GMT
- Title: PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment
- Authors: Dingbang Huang, Wenbo Li, Yifei Zhao, Xinyu Pan, Chun Wang, Yanhong Zeng, Bo Dai
- Abstract summary: Transparent image layer generation plays a significant role in digital art and design. Existing methods typically decompose transparent layers from a single RGB image using a set of tools or generate multiple transparent layers sequentially. We propose PSDiffusion, a unified diffusion framework that leverages image composition priors from a pre-trained image diffusion model for simultaneous multi-layer text-to-image generation.
- Score: 23.67447416568964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transparent image layer generation plays a significant role in digital art and design workflows. Existing methods typically decompose transparent layers from a single RGB image using a set of tools or generate multiple transparent layers sequentially. Despite some promising results, these methods are limited in modeling global layout, physically plausible interactions, and visual effects such as shadows and reflections with high alpha quality, because the layers share little global context. To address this issue, we propose PSDiffusion, a unified diffusion framework that leverages image composition priors from a pre-trained image diffusion model for simultaneous multi-layer text-to-image generation. Specifically, our method introduces a global layer interaction mechanism to generate layered images collaboratively, ensuring both individual layer quality and coherent spatial and visual relationships across layers. Extensive experiments on benchmark datasets demonstrate that PSDiffusion outperforms existing methods in generating multi-layer images with plausible structure and enhanced visual fidelity.
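The abstract does not spell out how the global layer interaction mechanism is implemented. A common way to share context across layers that are denoised simultaneously is joint attention over the tokens of all layers at once. The following is a minimal illustrative sketch in PyTorch, not the paper's method: the class name `GlobalLayerAttention` and all shapes are assumptions made for the example.

```python
# Illustrative sketch only: joint attention across all layers, in the spirit
# of a "global layer interaction" mechanism. Names and shapes are hypothetical.
import torch
import torch.nn as nn


class GlobalLayerAttention(nn.Module):
    """Self-attention over the tokens of all layers jointly, so each
    transparent layer can attend to every other layer's content while
    all layers are denoised in parallel."""

    def __init__(self, dim: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, layer_tokens: torch.Tensor) -> torch.Tensor:
        # layer_tokens: (batch, n_layers, seq_len, dim)
        b, n, s, d = layer_tokens.shape
        # Flatten all layers into one token sequence so global context is
        # shared, rather than denoising each layer in isolation.
        tokens = layer_tokens.reshape(b, n * s, d)
        h = self.norm(tokens)
        out, _ = self.attn(h, h, h)
        tokens = tokens + out  # residual update
        return tokens.reshape(b, n, s, d)


# Usage: four layers, 256 tokens each, 320-dim features.
x = torch.randn(2, 4, 256, 320)
block = GlobalLayerAttention(dim=320)
print(block(x).shape)  # torch.Size([2, 4, 256, 320])
```

Flattening the layer axis into the sequence axis is the simplest design choice here; a real system might instead interleave per-layer and cross-layer attention to control cost.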
Related papers
- Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition [73.43121650616804]
We propose Qwen-Image-Layered, an end-to-end diffusion model that decomposes a single RGB image into multiple semantically disentangled RGBA layers. Our method significantly surpasses existing approaches in decomposition quality and establishes a new paradigm for consistent image editing.
arXiv Detail & Related papers (2025-12-17T17:12:42Z) - OmniPSD: Layered PSD Generation with Diffusion Transformer [59.20320950128599]
We propose OmniPSD, a unified diffusion framework built upon the Flux ecosystem. It enables text-to-PSD generation and image-to-PSD decomposition through in-context learning. Experiments on our new RGBA-layered dataset demonstrate that OmniPSD achieves high-fidelity generation.
arXiv Detail & Related papers (2025-12-10T02:09:59Z) - DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model [47.32061459437175]
We introduce DreamLayer, a framework that enables coherent text-driven generation of multiple image layers. By explicitly modeling the relationship between transparent foreground and background layers, DreamLayer builds inter-layer connections. Experiments and user studies demonstrate that DreamLayer generates more coherent and well-aligned layers.
arXiv Detail & Related papers (2025-03-17T05:34:11Z) - ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation [108.69315278353932]
We introduce the Anonymous Region Transformer (ART), which facilitates the direct generation of variable multi-layer transparent images. By enabling precise control and scalable layer generation, ART establishes a new paradigm for interactive content creation.
arXiv Detail & Related papers (2025-02-25T16:57:04Z) - LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors [38.47462111828742]
Layered content generation is crucial for creative fields like graphic design, animation, and digital art. We propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers. We show significant improvements in visual coherence, image quality, and layer consistency compared to baseline methods.
arXiv Detail & Related papers (2024-12-05T18:59:18Z) - Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition. It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects. Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z) - Generating Compositional Scenes via Text-to-image RGBA Instance Generation [82.63805151691024]
Text-to-image diffusion generative models can generate high-quality images at the cost of tedious prompt engineering.
We propose a novel multi-stage generation paradigm that is designed for fine-grained control, flexibility and interactivity.
Our experiments show that our RGBA diffusion model is capable of generating diverse and high quality instances with precise control over object attributes.
arXiv Detail & Related papers (2024-11-16T23:44:14Z) - LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [70.14953942532621]
A layer-collaborative diffusion model, named LayerDiff, is designed for text-guided, multi-layered, composable image synthesis.
Our model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods.
LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.
arXiv Detail & Related papers (2024-03-18T16:28:28Z) - Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs [77.86214400258473]
We propose a new training-free text-to-image generation/editing framework, namely Recaption, Plan and Generate (RPG).
RPG harnesses the powerful chain-of-thought reasoning ability of multimodal LLMs to enhance the compositionality of text-to-image diffusion models.
Our framework exhibits wide compatibility with various MLLM architectures.
arXiv Detail & Related papers (2024-01-22T06:16:29Z) - Text2Layer: Layered Image Generation using Latent Diffusion Model [12.902259486204898]
We approach image synthesis from a layered image generation perspective, producing layered images rather than a single flat image.
To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images.
Experimental results show that the proposed method is able to generate high-quality layered images.
arXiv Detail & Related papers (2023-07-19T06:56:07Z) - Deep Image Compositing [93.75358242750752]
We propose a new method which can automatically generate high-quality image composites without any user input.
Inspired by Laplacian pyramid blending, a densely connected multi-stream fusion network is proposed to effectively fuse information from the foreground and background images.
Experiments show that the proposed method can automatically generate high-quality composites and outperforms existing methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-11-04T06:12:24Z)
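For context on what "layers" means across these works: a layered image flattens to a single RGB image by standard back-to-front alpha compositing (the Porter-Duff "over" operator), which is the operation these generation and decomposition methods build on. Below is a minimal NumPy sketch of that operator; the function name and array shapes are assumptions made for the example.

```python
# Minimal sketch of back-to-front alpha compositing (Porter-Duff "over").
# Assumes straight (non-premultiplied) alpha and an opaque bottom layer.
import numpy as np


def composite_layers(layers: np.ndarray) -> np.ndarray:
    """layers: (n_layers, H, W, 4) RGBA floats in [0, 1], ordered back to front.
    Returns the flattened (H, W, 3) RGB image."""
    canvas = np.zeros(layers.shape[1:3] + (3,))
    for layer in layers:
        rgb, a = layer[..., :3], layer[..., 3:]
        # "over": the new layer covers the canvas in proportion to its alpha.
        canvas = rgb * a + canvas * (1.0 - a)
    return canvas


# Example: an opaque red background under a half-transparent green square.
layers = np.zeros((2, 64, 64, 4))
layers[0, ..., 0] = 1.0                    # red channel of the background
layers[0, ..., 3] = 1.0                    # background is fully opaque
layers[1, 16:48, 16:48, 1] = 1.0           # green square in the foreground...
layers[1, 16:48, 16:48, 3] = 0.5           # ...at 50% alpha
print(composite_layers(layers)[32, 32])    # [0.5, 0.5, 0.0]
```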