PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models
- URL: http://arxiv.org/abs/2505.22523v1
- Date: Wed, 28 May 2025 16:09:33 GMT
- Title: PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models
- Authors: Junwen Chen, Heyang Jiang, Yanbin Wang, Keming Wu, Ji Li, Chao Zhang, Keiji Yanai, Dong Chen, Yuhui Yuan
- Abstract summary: We release the first open, ultra-high-fidelity PrismLayers dataset of 200K multi-layer transparent images with accurate alpha mattes (20K in the PrismLayersPro version). We also deliver a strong, open-source multi-layer generation model, ART+, which matches the aesthetics of modern text-to-image generation models.
- Score: 25.859278092788237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating high-quality, multi-layer transparent images from text prompts can unlock a new level of creative control, allowing users to edit each layer as effortlessly as editing text outputs from LLMs. However, the development of multi-layer generative models lags behind that of conventional text-to-image models due to the absence of a large, high-quality corpus of multi-layer transparent data. In this paper, we address this fundamental challenge by: (i) releasing the first open, ultra-high-fidelity PrismLayers dataset of 200K multi-layer transparent images with accurate alpha mattes, along with its 20K PrismLayersPro version, (ii) introducing a training-free synthesis pipeline that generates such data on demand using off-the-shelf diffusion models, and (iii) delivering a strong, open-source multi-layer generation model, ART+, which matches the aesthetics of modern text-to-image generation models. The key technical contributions include: LayerFLUX, which excels at generating high-quality single transparent layers with accurate alpha mattes, and MultiLayerFLUX, which composes multiple LayerFLUX outputs into complete images, guided by human-annotated semantic layouts. To ensure higher quality, we apply a rigorous filtering stage to remove artifacts and semantic mismatches, followed by human selection. Fine-tuning the state-of-the-art ART model on our synthetic PrismLayersPro yields ART+, which outperforms the original ART in 60% of head-to-head user study comparisons and even matches the visual quality of images generated by the FLUX.1-[dev] model. We anticipate that our work will establish a solid dataset foundation for the multi-layer transparent image generation task, enabling research and applications that require precise, editable, and visually compelling layered imagery.
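To make the composition step concrete, below is a minimal sketch of how MultiLayerFLUX-style assembly could look: individually generated RGBA layers are alpha-composited, back to front, onto a canvas at layout-assigned positions. The function name, the layout format, and the use of PIL are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: paste generated RGBA layers onto a canvas according
# to a layout, back to front, using standard alpha compositing (PIL).
from PIL import Image

def compose_layers(size, layers):
    """Alpha-composite RGBA layers onto a transparent canvas.

    layers: list of (rgba_image, (x, y)) tuples, ordered background first.
    """
    canvas = Image.new("RGBA", size, (0, 0, 0, 0))
    for layer, (x, y) in layers:
        # Place the layer on a full-size transparent sheet so that
        # alpha_composite (which requires equal sizes) handles blending.
        sheet = Image.new("RGBA", size, (0, 0, 0, 0))
        sheet.paste(layer, (x, y))
        canvas = Image.alpha_composite(canvas, sheet)
    return canvas

# Usage: solid-color stand-ins for LayerFLUX outputs and an annotated layout.
background = Image.new("RGBA", (1024, 1024), (180, 200, 230, 255))
foreground = Image.new("RGBA", (256, 256), (255, 80, 80, 200))
final = compose_layers((1024, 1024), [(background, (0, 0)), (foreground, (384, 384))])
final.save("composite.png")
```

In the actual pipeline the layers would come from LayerFLUX and the boxes from human-annotated semantic layouts; the per-layer alpha mattes are what keep each layer independently editable after composition.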
Related papers
- Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling [80.30976039119236]
Lumina-mGPT 2.0 is a stand-alone, decoder-only autoregressive model. It is trained entirely from scratch, enabling unrestricted architectural design and licensing freedom. It achieves generation quality on par with state-of-the-art diffusion models.
arXiv Detail & Related papers (2025-07-23T17:42:13Z)
- PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment [24.964578950380947]
PSDiffusion is a unified diffusion framework for simultaneous multi-layer text-to-image generation. The model can automatically generate multi-layer images with one RGB background and multiple RGBA foregrounds, and it introduces a global-layer interactive mechanism that generates layered images concurrently and collaboratively.
arXiv Detail & Related papers (2025-05-16T17:23:35Z)
- ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement [68.05833403672274]
Existing unified models have struggled to handle three fundamental capabilities in a single model: understanding, generation, and editing. ILLUME+ introduces a unified dual visual tokenizer, DualViTok, which preserves fine-grained textures and text-aligned semantics. It also employs a diffusion model as the image detokenizer for enhanced generation quality and efficient super-resolution.
arXiv Detail & Related papers (2025-04-02T17:45:00Z)
- ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation [108.69315278353932]
We introduce the Anonymous Region Transformer (ART), which facilitates the direct generation of variable multi-layer transparent images. By enabling precise control and scalable layer generation, ART establishes a new paradigm for interactive content creation.
arXiv Detail & Related papers (2025-02-25T16:57:04Z)
- LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors [38.47462111828742]
Layered content generation is crucial for creative fields like graphic design, animation, and digital art. We propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers. We show significant improvements in visual coherence, image quality, and layer consistency compared to baseline methods.
arXiv Detail & Related papers (2024-12-05T18:59:18Z)
- Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition. It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects. Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z)
- LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [70.14953942532621]
LayerDiff is a layer-collaborative diffusion model designed for text-guided, multi-layered, composable image synthesis.
Our model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods.
LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.
arXiv Detail & Related papers (2024-03-18T16:28:28Z)
- Transparent Image Layer Diffusion using Latent Transparency [30.77316047044662]
We present LayerDiffuse, an approach enabling large-scale pretrained latent diffusion models to generate transparent images.
The method learns a "latent transparency" that encodes alpha channel transparency into the latent manifold of a pretrained latent diffusion model.
It preserves the production-ready quality of the large diffusion model by regulating the added transparency as a latent offset.
arXiv Detail & Related papers (2024-02-27T01:19:53Z)
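To clarify the "latent transparency" mechanism summarized in the LayerDiffuse entry above, here is a minimal PyTorch sketch: a small encoder maps an RGBA image to an additive offset on the base model's latent, and a norm penalty keeps that offset small so the pretrained latent distribution is preserved. The module, its architecture, and all shapes are assumptions for illustration, not the paper's actual code.

```python
import torch
import torch.nn as nn

class LatentTransparencyOffset(nn.Module):
    """Toy stand-in for a latent-transparency encoder (assumed design)."""

    def __init__(self, latent_channels: int = 4):
        super().__init__()
        # Patchify-style x8 downsampling from RGBA (4 channels) to the
        # latent resolution of a typical latent diffusion VAE.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=8, stride=8),
            nn.SiLU(),
            nn.Conv2d(64, latent_channels, kernel_size=3, padding=1),
        )

    def forward(self, base_latent: torch.Tensor, rgba: torch.Tensor):
        offset = self.encoder(rgba)       # transparency encoded as a latent offset
        adjusted = base_latent + offset   # adjusted latent stays near the manifold
        penalty = offset.pow(2).mean()    # regularizer that "regulates" the offset
        return adjusted, penalty

# Usage: a 512x512 RGBA image paired with its 64x64 base latent.
model = LatentTransparencyOffset()
adjusted, penalty = model(torch.randn(1, 4, 64, 64), torch.rand(1, 4, 512, 512))
```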
- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
The proposed model is trained separately to map text embeddings to image embeddings of CLIP.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z)
- Text2Layer: Layered Image Generation using Latent Diffusion Model [12.902259486204898]
We propose to generate layered images rather than single flat images. To achieve layered image generation, we train an autoencoder that reconstructs layered images. Experimental results show that the proposed method generates high-quality layered images.
arXiv Detail & Related papers (2023-07-19T06:56:07Z)
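As a final illustration, the Text2Layer entry above trains an autoencoder that reconstructs layered images. Below is a toy sketch of that idea, assuming a two-layer format in which background RGB, foreground RGB, and the foreground alpha are stacked into seven channels; the channel layout and architecture are illustrative assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn

class LayeredAutoencoder(nn.Module):
    """Toy autoencoder over stacked layers: [bg RGB | fg RGB | fg alpha]."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(7, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 8, 4, stride=2, padding=1),          # latent: 8 x H/4 x W/4
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(32, 7, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, layers: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(layers))

# Reconstruction objective on a random 256x256 layered image.
x = torch.rand(1, 7, 256, 256)
recon = LayeredAutoencoder()(x)
loss = nn.functional.mse_loss(recon, x)
```

Per the paper's title, a latent diffusion model is then trained on this autoencoder's latent space, so text-conditioned sampling yields the layers jointly.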
This list is automatically generated from the titles and abstracts of the papers on this site.