Text2Layer: Layered Image Generation using Latent Diffusion Model
- URL: http://arxiv.org/abs/2307.09781v1
- Date: Wed, 19 Jul 2023 06:56:07 GMT
- Title: Text2Layer: Layered Image Generation using Latent Diffusion Model
- Authors: Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien
- Abstract summary: We explore layer compositing from a layered image generation perspective, generating background, foreground, layer mask, and the composed image simultaneously.
To achieve layered image generation, we train an autoencoder that reconstructs layered images and train diffusion models on its latent representation.
Experimental results show that the proposed method is able to generate high-quality layered images.
- Score: 12.902259486204898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Layer compositing is one of the most popular image editing workflows among
both amateurs and professionals. Motivated by the success of diffusion models,
we explore layer compositing from a layered image generation perspective.
Instead of generating an image, we propose to generate background, foreground,
layer mask, and the composed image simultaneously. To achieve layered image
generation, we train an autoencoder that is able to reconstruct layered images
and train diffusion models on the latent representation. One benefit of the
proposed problem is to enable better compositing workflows in addition to the
high-quality image output. Another benefit is producing higher-quality layer
masks compared to masks produced by a separate step of image segmentation.
Experimental results show that the proposed method is able to generate
high-quality layered images and initiates a benchmark for future work.
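
The abstract outlines a two-stage design: an autoencoder that reconstructs layered images (foreground, background, layer mask, and the implied composite), and a diffusion model trained on that autoencoder's latent representation. The following is a minimal sketch of that structure under stated assumptions; the module shapes, channel layout, and compositing rule are illustrative guesses, not the paper's actual implementation.

```python
# Hypothetical sketch of the two-stage idea described in the abstract:
# (1) an autoencoder that reconstructs a layered image (foreground RGB,
#     background RGB, single-channel mask), (2) a diffusion model trained
#     on the autoencoder's latent codes. Names and sizes are assumptions.
import torch
import torch.nn as nn

class LayeredAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 4):
        super().__init__()
        # 3 (foreground) + 3 (background) + 1 (mask) = 7 input channels
        self.encoder = nn.Sequential(
            nn.Conv2d(7, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_dim, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(64, 7, 4, stride=2, padding=1),
        )

    def forward(self, fg, bg, mask):
        z = self.encoder(torch.cat([fg, bg, mask], dim=1))
        out = self.decoder(z)
        fg_hat, bg_hat = out[:, :3], out[:, 3:6]
        mask_hat = out[:, 6:7].sigmoid()
        # The composed image follows from the layers: mask * fg + (1 - mask) * bg
        composite = mask_hat * fg_hat + (1.0 - mask_hat) * bg_hat
        return z, fg_hat, bg_hat, mask_hat, composite

# A text-conditioned latent diffusion model would then be trained to denoise z,
# and the frozen decoder would turn sampled latents back into layers.
ae = LayeredAutoencoder()
fg, bg = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
mask = torch.rand(1, 1, 64, 64)
z, fg_hat, bg_hat, mask_hat, comp = ae(fg, bg, mask)
print(z.shape, comp.shape)  # torch.Size([1, 4, 16, 16]) torch.Size([1, 3, 64, 64])
```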
Related papers
- LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge [14.481577976493236]
LayeringDiff is a novel pipeline for the synthesis of layered images.
By extracting layers from a composite image, rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training.
For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate foreground and background layers.
arXiv Detail & Related papers (2025-01-02T11:18:25Z)
- LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors [38.47462111828742]
Layered content generation is crucial for creative fields like graphic design, animation, and digital art.
We propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers.
We show significant improvements in visual coherence, image quality, and layer consistency compared to baseline methods.
arXiv Detail & Related papers (2024-12-05T18:59:18Z)
- Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition.
It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects.
Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z)
- LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [70.14953942532621]
LayerDiff is a layer-collaborative diffusion model designed for text-guided, multi-layered, composable image synthesis.
Our model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods.
LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.
arXiv Detail & Related papers (2024-03-18T16:28:28Z)
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained diffusion model.
Experiments demonstrate BrushNet's superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
- MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
arXiv Detail & Related papers (2023-02-10T20:27:02Z)
- Deep Image Compositing [93.75358242750752]
We propose a new method which can automatically generate high-quality image composites without any user input.
Inspired by Laplacian pyramid blending, a dense-connected multi-stream fusion network is proposed to effectively fuse the information from the foreground and background images.
Experiments show that the proposed method can automatically generate high-quality composites and outperforms existing methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-11-04T06:12:24Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders (a minimal sketch of this kind of design follows the list below).
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
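
The GFM summary above mentions a shared encoder feeding two separate decoders. As a rough illustration of that kind of design, and not the paper's actual architecture, here is a minimal shared-encoder, dual-decoder sketch in which a coarse "glance" head and a fine "focus" head are merged into a single alpha matte; all layer sizes and the merge rule are assumptions.

```python
# Hypothetical shared-encoder / dual-decoder sketch: a "glance" decoder for
# coarse semantics and a "focus" decoder for fine detail, merged into an
# alpha matte. Sizes and the merge rule are illustrative assumptions.
import torch
import torch.nn as nn

class SharedEncoderDualDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Glance decoder: coarse 3-way map (foreground / background / transition)
        self.glance = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )
        # Focus decoder: fine alpha values for the uncertain transition region
        self.focus = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, image):
        feat = self.encoder(image)
        seg = self.glance(feat).softmax(dim=1)  # [B, 3, H, W]
        detail = self.focus(feat)               # [B, 1, H, W]
        fg_prob, transition = seg[:, 0:1], seg[:, 2:3]
        # Trust the coarse prediction outside the transition region,
        # and the fine prediction inside it.
        alpha = fg_prob * (1 - transition) + detail * transition
        return alpha

model = SharedEncoderDualDecoder()
alpha = model(torch.rand(1, 3, 64, 64))
print(alpha.shape)  # torch.Size([1, 1, 64, 64])
```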