OmniPSD: Layered PSD Generation with Diffusion Transformer
- URL: http://arxiv.org/abs/2512.09247v1
- Date: Wed, 10 Dec 2025 02:09:59 GMT
- Title: OmniPSD: Layered PSD Generation with Diffusion Transformer
- Authors: Cheng Liu, Yiren Song, Haofan Wang, Mike Zheng Shou
- Abstract summary: We propose OmniPSD, a unified diffusion framework built upon the Flux ecosystem. It enables text-to-PSD generation and image-to-PSD decomposition through in-context learning. Experiments on our new RGBA-layered dataset demonstrate that OmniPSD achieves high-fidelity generation.
- Score: 59.20320950128599
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in diffusion models have greatly improved image generation and editing, yet generating or reconstructing layered PSD files with transparent alpha channels remains highly challenging. We propose OmniPSD, a unified diffusion framework built upon the Flux ecosystem that enables both text-to-PSD generation and image-to-PSD decomposition through in-context learning. For text-to-PSD generation, OmniPSD arranges multiple target layers spatially into a single canvas and learns their compositional relationships through spatial attention, producing semantically coherent and hierarchically structured layers. For image-to-PSD decomposition, it performs iterative in-context editing, progressively extracting and erasing textual and foreground components to reconstruct editable PSD layers from a single flattened image. An RGBA-VAE is employed as an auxiliary representation module to preserve transparency without affecting structure learning. Extensive experiments on our new RGBA-layered dataset demonstrate that OmniPSD achieves high-fidelity generation, structural consistency, and transparency awareness, offering a new paradigm for layered design generation and decomposition with diffusion transformers.
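The abstract names two concrete mechanisms: tiling the target RGBA layers onto a single canvas so the diffusion transformer's spatial attention can model inter-layer relationships, and flattening layers back into one RGB image (which decomposition must invert). Since the paper's code is not reproduced here, the following is only a minimal sketch of those two ideas; `tile_layers`, `composite_over`, and the grid layout are hypothetical stand-ins, not OmniPSD's actual pipeline, and the RGBA-VAE is not modeled.

```python
import numpy as np

def tile_layers(layers, cols=2):
    """Arrange N RGBA layers side by side on one canvas.

    Mimics the spatial in-context layout the abstract describes: the
    diffusion transformer sees all layers in a single image, so its
    spatial attention can relate them. `layers` is a list of
    (H, W, 4) float arrays in [0, 1].
    """
    h, w, _ = layers[0].shape
    rows = -(-len(layers) // cols)          # ceil division
    canvas = np.zeros((rows * h, cols * w, 4), dtype=np.float32)
    for i, layer in enumerate(layers):
        r, c = divmod(i, cols)
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = layer
    return canvas

def composite_over(layers, background=None):
    """Flatten RGBA layers bottom-to-top with the standard alpha 'over'
    operator: out = fg.rgb * fg.a + bg.rgb * (1 - fg.a)."""
    h, w, _ = layers[0].shape
    out = background if background is not None else np.zeros((h, w, 3), np.float32)
    for layer in layers:                     # bottom layer first
        rgb, a = layer[..., :3], layer[..., 3:4]
        out = rgb * a + out * (1.0 - a)
    return out

# Consistency check any layered generator should satisfy: the flattened
# composite of its predicted layers should match the target RGB image.
layers = [np.random.rand(64, 64, 4).astype(np.float32) for _ in range(4)]
canvas = tile_layers(layers)                 # what the model would see in-context
flat = composite_over(layers)                # what a PSD viewer would display
print(canvas.shape, flat.shape)              # (128, 128, 4) (64, 64, 3)
```

The image-to-PSD direction described in the abstract would run roughly the inverse loop: repeatedly extract the topmost component (e.g., text or foreground) as an RGBA layer and erase it from the flattened image, which this sketch does not implement.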
Related papers
- Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss [32.26030534230571]
We propose a novel Structure Preservation Loss (SPL) to quantify structural differences between input and edited images. Our training-free approach integrates SPL directly into the diffusion model's generative process to ensure structural fidelity. Experiments confirm SPL enhances structural fidelity, delivering state-of-the-art performance in latent-diffusion-based image editing. (A generic, illustrative sketch of an edge-based structure loss appears after this list.)
arXiv Detail & Related papers (2026-01-23T11:06:51Z)
- Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition [73.43121650616804]
We propose Qwen-Image-Layered, an end-to-end diffusion model that decomposes a single RGB image into multiple semantically disentangled RGBA layers. Our method significantly surpasses existing approaches in decomposition quality and establishes a new paradigm for consistent image editing.
arXiv Detail & Related papers (2025-12-17T17:12:42Z)
- MagicQuillV2: Precise and Interactive Image Editing with Layered Visual Cues [106.02577891104079]
We propose MagicQuill V2, a novel system that introduces a layered composition paradigm to generative image editing. Our method deconstructs creative intent into a stack of controllable visual cues.
arXiv Detail & Related papers (2025-12-02T18:59:58Z)
- DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers [85.1185656296496]
We present DiffDecompose, a diffusion Transformer-based framework that learns the posterior over possible layer decompositions conditioned on the input image. The code and dataset will be available upon paper acceptance.
arXiv Detail & Related papers (2025-05-24T16:08:04Z)
- PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment [23.67447416568964]
Transparent image layer generation plays a significant role in digital art and design. Existing methods typically decompose transparent layers from a single RGB image using a set of tools or generate multiple transparent layers sequentially. We propose PSDiffusion, a unified diffusion framework that leverages image composition priors from a pre-trained image diffusion model for simultaneous multi-layer text-to-image generation.
arXiv Detail & Related papers (2025-05-16T17:23:35Z)
- DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model [47.32061459437175]
We introduce DreamLayer, a framework that enables coherent text-driven generation of multiple image layers. By explicitly modeling the relationship between transparent foreground and background layers, DreamLayer builds inter-layer connections. Experiments and user studies demonstrate that DreamLayer generates more coherent and well-aligned layers.
arXiv Detail & Related papers (2025-03-17T05:34:11Z)
- ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation [108.69315278353932]
We introduce the Anonymous Region Transformer (ART), which facilitates the direct generation of variable multi-layer transparent images. By enabling precise control and scalable layer generation, ART establishes a new paradigm for interactive content creation.
arXiv Detail & Related papers (2025-02-25T16:57:04Z)
- LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge [14.481577976493236]
LayeringDiff is a novel pipeline for the synthesis of layered images. By extracting layers from a composite image, rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training. For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate foreground and background layers.
arXiv Detail & Related papers (2025-01-02T11:18:25Z)
- LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors [38.47462111828742]
Layered content generation is crucial for creative fields like graphic design, animation, and digital art. We propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers. We show significant improvements in visual coherence, image quality, and layer consistency compared to baseline methods.
arXiv Detail & Related papers (2024-12-05T18:59:18Z)
- Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition. It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects. Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z)
- LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [70.14953942532621]
A layer-collaborative diffusion model, named LayerDiff, is designed for text-guided, multi-layered, composable image synthesis.
Our model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods.
LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.
arXiv Detail & Related papers (2024-03-18T16:28:28Z)
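The Structure Preservation Loss entry above describes penalizing structural differences between the input and the edited image during diffusion sampling. Its exact formulation is not given in this list, so the following is only a generic, minimal sketch of the idea using an L1 distance between Sobel edge maps; `sobel_edges` and `structure_loss` are hypothetical names, not the authors' API.

```python
import numpy as np

def sobel_edges(gray):
    """Gradient magnitude via 3x3 Sobel filters (valid convolution)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], np.float32)
    ky = kx.T
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2), np.float32)
    gy = np.zeros((h - 2, w - 2), np.float32)
    for i in range(3):
        for j in range(3):
            patch = gray[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

def structure_loss(src_rgb, edit_rgb):
    """L1 distance between edge maps of the source and edited images.

    A term in this spirit can be added to a diffusion sampler's guidance
    objective to discourage structural drift during editing.
    """
    to_gray = lambda x: x @ np.array([0.299, 0.587, 0.114], np.float32)
    return np.abs(sobel_edges(to_gray(src_rgb)) -
                  sobel_edges(to_gray(edit_rgb))).mean()

src = np.random.rand(64, 64, 3).astype(np.float32)
edit = src + 0.05 * np.random.randn(64, 64, 3).astype(np.float32)
print(float(structure_loss(src, edit)))   # small for structure-preserving edits
```

In a training-free setup like the one that abstract describes, such a term would typically be differentiated through the decoded latent to steer sampling; this NumPy version only illustrates the measurement itself.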