OmniPSD: Layered PSD Generation with Diffusion Transformer
- URL: http://arxiv.org/abs/2512.09247v1
- Date: Wed, 10 Dec 2025 02:09:59 GMT
- Title: OmniPSD: Layered PSD Generation with Diffusion Transformer
- Authors: Cheng Liu, Yiren Song, Haofan Wang, Mike Zheng Shou
- Abstract summary: We propose OmniPSD, a unified diffusion framework built upon the Flux ecosystem. It enables text-to-PSD generation and image-to-PSD decomposition through in-context learning. Experiments on our new RGBA-layered dataset demonstrate that OmniPSD achieves high-fidelity generation.
- Score: 59.20320950128599
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in diffusion models have greatly improved image generation and editing, yet generating or reconstructing layered PSD files with transparent alpha channels remains highly challenging. We propose OmniPSD, a unified diffusion framework built upon the Flux ecosystem that enables both text-to-PSD generation and image-to-PSD decomposition through in-context learning. For text-to-PSD generation, OmniPSD arranges multiple target layers spatially into a single canvas and learns their compositional relationships through spatial attention, producing semantically coherent and hierarchically structured layers. For image-to-PSD decomposition, it performs iterative in-context editing, progressively extracting and erasing textual and foreground components to reconstruct editable PSD layers from a single flattened image. An RGBA-VAE is employed as an auxiliary representation module to preserve transparency without affecting structure learning. Extensive experiments on our new RGBA-layered dataset demonstrate that OmniPSD achieves high-fidelity generation, structural consistency, and transparency awareness, offering a new paradigm for layered design generation and decomposition with diffusion transformers.
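The abstract names two concrete mechanisms: tiling the target RGBA layers onto a single canvas so the diffusion transformer's spatial attention can model inter-layer relationships, and flattening layers back into one RGB image (which decomposition must invert). Since the paper's code is not reproduced here, the following is only a minimal sketch of those two ideas; `tile_layers`, `composite_over`, and the grid layout are hypothetical stand-ins, not OmniPSD's actual pipeline, and the RGBA-VAE is not modeled.

```python
import numpy as np

def tile_layers(layers, cols=2):
    """Arrange N RGBA layers side by side on one canvas.

    Mimics the spatial in-context layout the abstract describes: the
    diffusion transformer sees all layers in a single image, so its
    spatial attention can relate them. `layers` is a list of
    (H, W, 4) float arrays in [0, 1].
    """
    h, w, _ = layers[0].shape
    rows = -(-len(layers) // cols)          # ceil division
    canvas = np.zeros((rows * h, cols * w, 4), dtype=np.float32)
    for i, layer in enumerate(layers):
        r, c = divmod(i, cols)
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = layer
    return canvas

def composite_over(layers, background=None):
    """Flatten RGBA layers bottom-to-top with the standard alpha 'over'
    operator: out = fg.rgb * fg.a + bg.rgb * (1 - fg.a)."""
    h, w, _ = layers[0].shape
    out = background if background is not None else np.zeros((h, w, 3), np.float32)
    for layer in layers:                     # bottom layer first
        rgb, a = layer[..., :3], layer[..., 3:4]
        out = rgb * a + out * (1.0 - a)
    return out

# Consistency check any layered generator should satisfy: the flattened
# composite of its predicted layers should match the target RGB image.
layers = [np.random.rand(64, 64, 4).astype(np.float32) for _ in range(4)]
canvas = tile_layers(layers)                 # what the model would see in-context
flat = composite_over(layers)                # what a PSD viewer would display
print(canvas.shape, flat.shape)              # (128, 128, 4) (64, 64, 3)
```

The image-to-PSD direction described in the abstract would run roughly the inverse loop: repeatedly extract the topmost component (e.g., text or foreground) as an RGBA layer and erase it from the flattened image, which this sketch does not implement.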
Related papers
- Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss [32.26030534230571]
We propose a novel Structure Preservation Loss (SPL) to quantify structural differences between input and edited images. Our training-free approach integrates SPL directly into the diffusion model's generative process to ensure structural fidelity. Experiments confirm SPL enhances structural fidelity, delivering state-of-the-art performance in latent-diffusion-based image editing. (A generic, illustrative sketch of an edge-based structure loss appears after this list.)
arXiv Detail & Related papers (2026-01-23T11:06:51Z)
- Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition [73.43121650616804]
We propose Qwen-Image-Layered, an end-to-end diffusion model that decomposes a single RGB image into multiple semantically disentangled RGBA layers. Our method significantly surpasses existing approaches in decomposition quality and establishes a new paradigm for consistent image editing.
arXiv Detail & Related papers (2025-12-17T17:12:42Z)
- MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues [106.02577891104079]
We propose MagicQuill V2, a novel system that introduces a layered composition paradigm to generative image editing. Our method deconstructs creative intent into a stack of controllable visual cues.
arXiv Detail & Related papers (2025-12-02T18:59:58Z)
- DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers [85.1185656296496]
We present DiffDecompose, a diffusion Transformer-based framework that learns the posterior over possible layer decompositions conditioned on the input image. The code and dataset will be available upon paper acceptance.
arXiv Detail & Related papers (2025-05-24T16:08:04Z)
- PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment [23.67447416568964]
Transparent image layer generation plays a significant role in digital art and design. Existing methods typically decompose transparent layers from a single RGB image using a set of tools or generate multiple transparent layers sequentially. We propose PSDiffusion, a unified diffusion framework that leverages image composition priors from a pre-trained image diffusion model for simultaneous multi-layer text-to-image generation.
arXiv Detail & Related papers (2025-05-16T17:23:35Z)
- DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model [47.32061459437175]
We introduce DreamLayer, a framework that enables coherent text-driven generation of multiple image layers. By explicitly modeling the relationship between transparent foreground and background layers, DreamLayer builds inter-layer connections. Experiments and user studies demonstrate that DreamLayer generates more coherent and well-aligned layers.
arXiv Detail & Related papers (2025-03-17T05:34:11Z)
- ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation [108.69315278353932]
We introduce the Anonymous Region Transformer (ART), which facilitates the direct generation of variable multi-layer transparent images. By enabling precise control and scalable layer generation, ART establishes a new paradigm for interactive content creation.
arXiv Detail & Related papers (2025-02-25T16:57:04Z)
- LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge [14.481577976493236]
LayeringDiff is a novel pipeline for the synthesis of layered images. By extracting layers from a composite image, rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training. For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate foreground and background layers.
arXiv Detail & Related papers (2025-01-02T11:18:25Z)
- LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors [38.47462111828742]
Layered content generation is crucial for creative fields like graphic design, animation, and digital art. We propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers. We show significant improvements in visual coherence, image quality, and layer consistency compared to baseline methods.
arXiv Detail & Related papers (2024-12-05T18:59:18Z)
- Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition. It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects. Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z)
- LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [70.14953942532621]
A layer-collaborative diffusion model, named LayerDiff, is designed for text-guided, multi-layered, composable image synthesis.
Our model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods.
LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.
arXiv Detail & Related papers (2024-03-18T16:28:28Z)
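The Structure Preservation Loss entry above describes penalizing structural differences between the input and the edited image during diffusion sampling. Its exact formulation is not given in this list, so the following is only a generic, minimal sketch of the idea using an L1 distance between Sobel edge maps; `sobel_edges` and `structure_loss` are hypothetical names, not the authors' API.

```python
import numpy as np

def sobel_edges(gray):
    """Gradient magnitude via 3x3 Sobel filters (valid convolution)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], np.float32)
    ky = kx.T
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2), np.float32)
    gy = np.zeros((h - 2, w - 2), np.float32)
    for i in range(3):
        for j in range(3):
            patch = gray[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

def structure_loss(src_rgb, edit_rgb):
    """L1 distance between edge maps of the source and edited images.

    A term in this spirit can be added to a diffusion sampler's guidance
    objective to discourage structural drift during editing.
    """
    to_gray = lambda x: x @ np.array([0.299, 0.587, 0.114], np.float32)
    return np.abs(sobel_edges(to_gray(src_rgb)) -
                  sobel_edges(to_gray(edit_rgb))).mean()

src = np.random.rand(64, 64, 3).astype(np.float32)
edit = src + 0.05 * np.random.randn(64, 64, 3).astype(np.float32)
print(float(structure_loss(src, edit)))   # small for structure-preserving edits
```

In a training-free setup like the one that abstract describes, such a term would typically be differentiated through the decoded latent to steer sampling; this NumPy version only illustrates the measurement itself.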