Text2Layer: Layered Image Generation using Latent Diffusion Model
- URL: http://arxiv.org/abs/2307.09781v1
- Date: Wed, 19 Jul 2023 06:56:07 GMT
- Title: Text2Layer: Layered Image Generation using Latent Diffusion Model
- Authors: Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien
- Abstract summary: We explore layer compositing from a layered image generation perspective, generating background, foreground, layer mask, and the composed image simultaneously.
To achieve layered image generation, we train an autoencoder that reconstructs layered images and train diffusion models on its latent representation.
Experimental results show that the proposed method is able to generate high-quality layered images.
- Score: 12.902259486204898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Layer compositing is one of the most popular image editing workflows among
both amateurs and professionals. Motivated by the success of diffusion models,
we explore layer compositing from a layered image generation perspective.
Instead of generating an image, we propose to generate background, foreground,
layer mask, and the composed image simultaneously. To achieve layered image
generation, we train an autoencoder that is able to reconstruct layered images
and train diffusion models on the latent representation. One benefit of the
proposed problem is to enable better compositing workflows in addition to the
high-quality image output. Another benefit is producing higher-quality layer
masks compared to masks produced by a separate step of image segmentation.
Experimental results show that the proposed method is able to generate
high-quality layered images and initiates a benchmark for future work.
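
The abstract outlines a two-stage design: an autoencoder that reconstructs layered images (foreground, background, layer mask, and the implied composite), and a diffusion model trained on that autoencoder's latent representation. The following is a minimal sketch of that structure under stated assumptions; the module shapes, channel layout, and compositing rule are illustrative guesses, not the paper's actual implementation.

```python
# Hypothetical sketch of the two-stage idea described in the abstract:
# (1) an autoencoder that reconstructs a layered image (foreground RGB,
#     background RGB, single-channel mask), (2) a diffusion model trained
#     on the autoencoder's latent codes. Names and sizes are assumptions.
import torch
import torch.nn as nn

class LayeredAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 4):
        super().__init__()
        # 3 (foreground) + 3 (background) + 1 (mask) = 7 input channels
        self.encoder = nn.Sequential(
            nn.Conv2d(7, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_dim, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(64, 7, 4, stride=2, padding=1),
        )

    def forward(self, fg, bg, mask):
        z = self.encoder(torch.cat([fg, bg, mask], dim=1))
        out = self.decoder(z)
        fg_hat, bg_hat = out[:, :3], out[:, 3:6]
        mask_hat = out[:, 6:7].sigmoid()
        # The composed image follows from the layers: mask * fg + (1 - mask) * bg
        composite = mask_hat * fg_hat + (1.0 - mask_hat) * bg_hat
        return z, fg_hat, bg_hat, mask_hat, composite

# A text-conditioned latent diffusion model would then be trained to denoise z,
# and the frozen decoder would turn sampled latents back into layers.
ae = LayeredAutoencoder()
fg, bg = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
mask = torch.rand(1, 1, 64, 64)
z, fg_hat, bg_hat, mask_hat, comp = ae(fg, bg, mask)
print(z.shape, comp.shape)  # torch.Size([1, 4, 16, 16]) torch.Size([1, 3, 64, 64])
```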
Related papers
- LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge [14.481577976493236]
LayeringDiff is a novel pipeline for the synthesis of layered images.
By extracting layers from a composite image, rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training.
For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate foreground and background layers.
arXiv Detail & Related papers (2025-01-02T11:18:25Z)
- LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors [38.47462111828742]
Layered content generation is crucial for creative fields like graphic design, animation, and digital art.
We propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers.
We show significant improvements in visual coherence, image quality, and layer consistency compared to baseline methods.
arXiv Detail & Related papers (2024-12-05T18:59:18Z)
- Generative Image Layer Decomposition with Visual Effects [49.75021036203426]
LayerDecomp is a generative framework for image layer decomposition.
It produces clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects.
Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks.
arXiv Detail & Related papers (2024-11-26T20:26:49Z)
- LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [70.14953942532621]
LayerDiff is a layer-collaborative diffusion model designed for text-guided, multi-layered, composable image synthesis.
Our model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods.
LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.
arXiv Detail & Related papers (2024-03-18T16:28:28Z)
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained diffusion model.
Experiments demonstrate BrushNet's superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
- MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
arXiv Detail & Related papers (2023-02-10T20:27:02Z)
- Deep Image Compositing [93.75358242750752]
We propose a new method which can automatically generate high-quality image composites without any user input.
Inspired by Laplacian pyramid blending, a dense-connected multi-stream fusion network is proposed to effectively fuse the information from the foreground and background images.
Experiments show that the proposed method can automatically generate high-quality composites and outperforms existing methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-11-04T06:12:24Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders (a minimal sketch of this kind of design follows the list below).
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
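
The GFM summary above mentions a shared encoder feeding two separate decoders. As a rough illustration of that kind of design, and not the paper's actual architecture, here is a minimal shared-encoder, dual-decoder sketch in which a coarse "glance" head and a fine "focus" head are merged into a single alpha matte; all layer sizes and the merge rule are assumptions.

```python
# Hypothetical shared-encoder / dual-decoder sketch: a "glance" decoder for
# coarse semantics and a "focus" decoder for fine detail, merged into an
# alpha matte. Sizes and the merge rule are illustrative assumptions.
import torch
import torch.nn as nn

class SharedEncoderDualDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Glance decoder: coarse 3-way map (foreground / background / transition)
        self.glance = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )
        # Focus decoder: fine alpha values for the uncertain transition region
        self.focus = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, image):
        feat = self.encoder(image)
        seg = self.glance(feat).softmax(dim=1)  # [B, 3, H, W]
        detail = self.focus(feat)               # [B, 1, H, W]
        fg_prob, transition = seg[:, 0:1], seg[:, 2:3]
        # Trust the coarse prediction outside the transition region,
        # and the fine prediction inside it.
        alpha = fg_prob * (1 - transition) + detail * transition
        return alpha

model = SharedEncoderDualDecoder()
alpha = model(torch.rand(1, 3, 64, 64))
print(alpha.shape)  # torch.Size([1, 1, 64, 64])
```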