DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers
- URL: http://arxiv.org/abs/2505.21541v3
- Date: Mon, 01 Sep 2025 09:14:35 GMT
- Title: DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers
- Authors: Zitong Wang, Hang Zhao, Qianyu Zhou, Xuequan Lu, Xiangtai Li, Yiren Song
- Abstract summary: We present DiffDecompose, a diffusion Transformer-based framework that learns the posterior over possible layer decompositions conditioned on the input image. The code and dataset will be available upon paper acceptance.
- Score: 85.1185656296496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have recently achieved great success in many generation tasks such as object removal. Nevertheless, existing image decomposition methods struggle to disentangle semi-transparent or transparent layer occlusions due to mask prior dependencies, static object assumptions, and the lack of datasets. In this paper, we delve into a novel task: Layer-Wise Decomposition of Alpha-Composited Images, aiming to recover the constituent layers of a single overlapped image under non-linear occlusion by semi-transparent or transparent alpha layers. To address challenges in layer ambiguity, generalization, and data scarcity, we first introduce AlphaBlend, the first large-scale and high-quality dataset for transparent and semi-transparent layer decomposition, supporting six real-world subtasks (e.g., translucent flare removal, semi-transparent cell decomposition, glassware decomposition). Building on this dataset, we present DiffDecompose, a diffusion Transformer-based framework that learns the posterior over possible layer decompositions conditioned on the input image, semantic prompts, and blending type. Rather than regressing alpha mattes directly, DiffDecompose performs In-Context Decomposition, enabling the model to predict one or multiple layers without per-layer supervision, and introduces Layer Position Encoding Cloning to maintain pixel-level correspondence across layers. Extensive experiments on the proposed AlphaBlend dataset and the public LOGO dataset verify the effectiveness of DiffDecompose. The code and dataset will be released upon paper acceptance at: https://github.com/Wangzt1121/DiffDecompose.
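For context, the forward operation that this decomposition task inverts is standard alpha compositing. The sketch below is a minimal, hypothetical illustration of that blending formula, not code from the paper; the layer contents, image size, and uniform 40% opacity are placeholder assumptions.

```python
import numpy as np

def alpha_composite(foreground, background, alpha):
    """Blend a foreground layer over a background using per-pixel alpha.

    foreground, background: float arrays in [0, 1] with shape (H, W, 3)
    alpha: float array in [0, 1] with shape (H, W, 1), the foreground opacity
    """
    return alpha * foreground + (1.0 - alpha) * background

# Toy example: a semi-transparent flare-like layer over a scene.
H, W = 256, 256
background = np.random.rand(H, W, 3)   # stand-in for the clean scene layer
foreground = np.ones((H, W, 3))        # stand-in for a white flare layer
alpha = np.full((H, W, 1), 0.4)        # uniform 40% opacity, for illustration only

composite = alpha_composite(foreground, background, alpha)
# Layer-wise decomposition is the inverse problem: recovering the
# constituent layers (and implicitly alpha) from `composite` alone.
```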
Related papers
- Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition [73.43121650616804]
We propose Qwen-Image-Layered, an end-to-end diffusion model that decomposes a single RGB image into multiple semantically disentangled RGBA layers. Our method significantly surpasses existing approaches in decomposition quality and establishes a new paradigm for consistent image editing.
arXiv Detail & Related papers (2025-12-17T17:12:42Z) - OmniPSD: Layered PSD Generation with Diffusion Transformer [59.20320950128599]
We propose OmniPSD, a unified diffusion framework built upon the Flux ecosystem. It enables text-to-PSD generation and image-to-PSD decomposition through in-context learning. Experiments on our new RGBA-layered dataset demonstrate that OmniPSD achieves high-fidelity generation.
arXiv Detail & Related papers (2025-12-10T02:09:59Z) - From Inpainting to Layer Decomposition: Repurposing Generative Inpainting Models for Image Layer Decomposition [16.7393689710179]
A layered representation enables independent editing of elements, offering greater flexibility for content creation. We observe a strong connection between layer decomposition and in/outpainting tasks, and propose adapting a diffusion-based inpainting model for layer decomposition using lightweight finetuning. To further preserve detail in the latent space, we introduce a novel multi-modal context fusion module with linear attention complexity.
arXiv Detail & Related papers (2025-11-26T02:50:07Z) - Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model [92.61216319417208]
We propose a novel diffusion model (DM)-based framework for image deblurring. The framework performs DM to generate prior knowledge that aids in recovering the textures of blurry images. To fully exploit the generated texture priors, we present the Texture Transfer Transformer layer (TTformer).
arXiv Detail & Related papers (2025-07-18T01:50:31Z) - PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models [25.859278092788237]
We release the first open, ultra-high-fidelity PrismLayers dataset of 200K (20K) multi-layer transparent images with accurate alpha mattes. We also deliver a strong, open-source multi-layer generation model, ART+, which matches the aesthetics of modern text-to-image generation models.
arXiv Detail & Related papers (2025-05-28T16:09:33Z) - PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment [24.964578950380947]
PSDiffusion is a unified diffusion framework for simultaneous multi-layer text-to-image generation. Our model can automatically generate multi-layer images with one RGB background and multiple RGBA foregrounds. Our method introduces a global-layer interactive mechanism that generates layered images concurrently and collaboratively.
arXiv Detail & Related papers (2025-05-16T17:23:35Z) - LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge [14.481577976493236]
LayeringDiff is a novel pipeline for the synthesis of layered images. By extracting layers from a composite image, rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training. For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate foreground and background layers.
arXiv Detail & Related papers (2025-01-02T11:18:25Z) - Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition. It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects. Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z) - MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors [67.74705555889336]
We introduce MaterialFusion, an enhanced conventional 3D inverse rendering pipeline that incorporates a 2D prior on texture and material properties. We present StableMaterial, a 2D diffusion model prior that refines multi-lit data to estimate the most likely albedo and material from given input appearances. We validate MaterialFusion's relighting performance on 4 datasets of synthetic and real objects under diverse illumination conditions.
arXiv Detail & Related papers (2024-09-23T17:59:06Z) - Neural Spline Fields for Burst Image Fusion and Layer Separation [40.9442467471977]
We propose a versatile intermediate representation: a two-layer alpha-composited image plus flow model constructed with neural spline fields.
Our method is able to jointly fuse a burst image capture into one high-resolution reconstruction and decompose it into transmission and obstruction layers.
We find that, with no post-processing steps or learned priors, our generalizable model is able to outperform existing dedicated single-image and multi-view obstruction removal approaches.
arXiv Detail & Related papers (2023-12-21T18:54:19Z) - DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection.
It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor.
Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z) - WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction [82.79642869586587]
WALDO is a novel approach to the prediction of future video frames from past ones.
Individual images are decomposed into multiple layers combining object masks and a small set of control points.
The layer structure is shared across all frames in each video to build dense inter-frame connections.
arXiv Detail & Related papers (2022-11-25T18:59:46Z) - Occlusion-Aware Instance Segmentation via BiLayer Network Architectures [73.45922226843435]
We propose Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees).
We investigate the efficacy of the bilayer structure using two popular convolutional network designs, namely, Fully Convolutional Network (FCN) and Graph Convolutional Network (GCN).
arXiv Detail & Related papers (2022-08-08T21:39:26Z)