Text2Layer: Layered Image Generation using Latent Diffusion Model
- URL: http://arxiv.org/abs/2307.09781v1
- Date: Wed, 19 Jul 2023 06:56:07 GMT
- Title: Text2Layer: Layered Image Generation using Latent Diffusion Model
- Authors: Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien
- Abstract summary: Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective.
To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images.
Experimental results show that the proposed method is able to generate high-quality layered images.
- Score: 12.902259486204898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Layer compositing is one of the most popular image editing workflows among
both amateurs and professionals. Motivated by the success of diffusion models,
we explore layer compositing from a layered image generation perspective.
Instead of generating an image, we propose to generate background, foreground,
layer mask, and the composed image simultaneously. To achieve layered image
generation, we train an autoencoder that is able to reconstruct layered images
and train diffusion models on the latent representation. One benefit of the
proposed formulation is that it enables better compositing workflows in addition to
high-quality image output. Another benefit is that it produces higher-quality layer
masks than those obtained from a separate image segmentation step.
Experimental results show that the proposed method generates high-quality layered
images and establishes a benchmark for future work.
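To make the two-stage recipe in the abstract concrete, below is a minimal PyTorch-style sketch: an autoencoder that encodes and reconstructs the foreground, background, and layer mask jointly, and a standard epsilon-prediction diffusion loss applied to its latent. The channel counts, network depths, noise schedule, and the `denoiser`/`text_emb` interface are illustrative assumptions, not the paper's actual configuration.
```python
# Hedged sketch only: layer counts, channel splits, and the noise schedule are
# assumptions for illustration, not the architecture used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayeredAutoencoder(nn.Module):
    """Encodes (foreground, background, mask) jointly and reconstructs all layers."""
    def __init__(self, latent_dim=64):
        super().__init__()
        # 3 (fg) + 3 (bg) + 1 (mask) = 7 channels in and out
        self.encoder = nn.Sequential(
            nn.Conv2d(7, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_dim, 4, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(64, 7, 4, stride=2, padding=1),
        )

    def forward(self, fg, bg, mask):
        z = self.encoder(torch.cat([fg, bg, mask], dim=1))
        out = self.decoder(z)
        fg_hat, bg_hat = out[:, :3], out[:, 3:6]
        mask_hat = torch.sigmoid(out[:, 6:7])
        # The composed image follows from the predicted layers via alpha compositing.
        composite = mask_hat * fg_hat + (1.0 - mask_hat) * bg_hat
        return z, fg_hat, bg_hat, mask_hat, composite

def latent_diffusion_loss(denoiser, z, text_emb, num_steps=1000):
    """Standard epsilon-prediction loss on the layered latent z (denoiser is hypothetical)."""
    t = torch.randint(0, num_steps, (z.shape[0],), device=z.device)
    alpha_bar = torch.cos(0.5 * torch.pi * t.float() / num_steps) ** 2  # toy cosine schedule
    a = alpha_bar.view(-1, 1, 1, 1)
    noise = torch.randn_like(z)
    z_t = a.sqrt() * z + (1.0 - a).sqrt() * noise
    return F.mse_loss(denoiser(z_t, t, text_emb), noise)
```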
Related papers
- Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration [35.3663995646582]
All-in-one image restoration aims to handle multiple degradation types using one model.
This paper proposes Restore Anything with Masks (RAM), a simple pipeline for all-in-one blind image restoration that leverages mask image modeling.
arXiv Detail & Related papers (2024-09-28T16:33:43Z)
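As a rough illustration of the mask image modeling idea the RAM entry above leverages, the sketch below masks random patches and trains a network to reconstruct them; the patch size, masking ratio, and `model` interface are assumptions, not RAM's actual design.
```python
# Illustrative mask-image-modeling step; patch size and mask ratio are assumptions.
import torch
import torch.nn.functional as F

def mim_step(model, images, patch=16, mask_ratio=0.5):
    """Randomly hide square patches and train `model` to reconstruct the full image."""
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=images.device) > mask_ratio).float()
    mask = F.interpolate(keep, size=(h, w), mode='nearest')      # 1 = visible, 0 = masked
    pred = model(images * mask)                                  # reconstruct from masked input
    return F.mse_loss(pred * (1 - mask), images * (1 - mask))    # loss on masked regions only
```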
- LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [70.14953942532621]
A layer-collaborative diffusion model, named LayerDiff, is designed for text-guided, multi-layered, composable image synthesis.
Our model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods.
LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.
arXiv Detail & Related papers (2024-03-18T16:28:28Z)
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
Experiments demonstrate BrushNet's superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
- MaskSketch: Unpaired Structure-guided Masked Image Generation [56.88038469743742]
MaskSketch is an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling.
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image.
Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure.
arXiv Detail & Related papers (2023-02-10T20:27:02Z)
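The MaskSketch entry above hinges on self-attention maps carrying structural information; below is a minimal sketch of a structure distance over such maps. The tensor layout and cosine-based distance are assumptions for illustration, and extracting the maps from a masked generative transformer is model-specific and omitted.
```python
# Hedged sketch: compares self-attention maps from a guiding sketch and a candidate sample.
import torch
import torch.nn.functional as F

def structure_distance(attn_sketch, attn_candidate):
    """attn_*: (layers, heads, tokens, tokens) softmax-normalized self-attention maps."""
    a = attn_sketch.flatten(0, 1)             # (layers*heads, tokens, tokens)
    b = attn_candidate.flatten(0, 1)
    cos = F.cosine_similarity(a, b, dim=-1)   # per-token similarity of attention rows
    return 1.0 - cos.mean()                   # lower = structurally closer to the sketch
```
During sampling, candidates with a lower distance to the guiding sketch would be preferred.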
- V-LinkNet: Learning Contextual Inpainting Across Latent Space of Generative Adversarial Network [7.5089719291325325]
We propose V-LinkNet, a cross-space learning strategy network, to improve learning on contextualised features.
We compare inpainting performance on the same face with different masks and on different faces with the same masks.
Our results surpass the state of the art when evaluated on the CelebA-HQ dataset with the standard protocol.
arXiv Detail & Related papers (2022-01-02T09:14:23Z)
- Structure First Detail Next: Image Inpainting with Pyramid Generator [26.94101909283021]
We propose to build a Pyramid Generator by stacking several sub-generators.
Lower-layer sub-generators focus on restoring image structures while the higher-layer sub-generators emphasize image details.
Our approach has a learning scheme of progressively increasing hole size, which allows it to restore large-hole images.
arXiv Detail & Related papers (2021-06-16T16:00:16Z)
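A toy sketch of the "progressively increasing hole size" scheme the Pyramid Generator entry above mentions: hole size grows with a training-progress value in [0, 1]. The square-hole shape and linear schedule are assumptions for illustration.
```python
# Toy curriculum mask generator; square holes and a linear schedule are assumptions.
import torch

def curriculum_hole_mask(batch, height, width, progress, max_frac=0.6):
    """Random square hole whose side length grows linearly with training progress."""
    side = max(1, int(max_frac * min(height, width) * progress))
    mask = torch.zeros(batch, 1, height, width)
    for i in range(batch):
        top = torch.randint(0, height - side + 1, (1,)).item()
        left = torch.randint(0, width - side + 1, (1,)).item()
        mask[i, :, top:top + side, left:left + side] = 1.0   # 1 marks the hole to inpaint
    return mask
```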
- Deep Image Compositing [93.75358242750752]
We propose a new method which can automatically generate high-quality image composites without any user input.
Inspired by Laplacian pyramid blending, a dense-connected multi-stream fusion network is proposed to effectively fuse the information from the foreground and background images.
Experiments show that the proposed method can automatically generate high-quality composites and outperforms existing methods both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-11-04T06:12:24Z)
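For context on the Laplacian pyramid blending that inspired the fusion network in the Deep Image Compositing entry above, here is a minimal classical implementation; it is the textbook algorithm, not the paper's learned dense-connected multi-stream network.
```python
# Classical Laplacian pyramid blending, shown only as background for the
# "inspired by Laplacian pyramid blending" remark; the paper uses a learned network instead.
import torch
import torch.nn.functional as F

def _down(x):
    return F.avg_pool2d(x, 2)

def _up(x, size):
    return F.interpolate(x, size=size, mode='bilinear', align_corners=False)

def laplacian_blend(fg, bg, mask, levels=4):
    """fg, bg: (B, 3, H, W) images; mask: (B, 1, H, W) in [0, 1], 1 selects the foreground."""
    lap_fg, lap_bg, masks = [], [], []
    f, b, m = fg, bg, mask
    for _ in range(levels):
        f2, b2 = _down(f), _down(b)
        lap_fg.append(f - _up(f2, f.shape[-2:]))   # detail band of the foreground
        lap_bg.append(b - _up(b2, b.shape[-2:]))
        masks.append(m)
        f, b, m = f2, b2, _down(m)
    out = m * f + (1.0 - m) * b                    # blend the coarsest level
    for lf, lb, lm in zip(reversed(lap_fg), reversed(lap_bg), reversed(masks)):
        out = _up(out, lf.shape[-2:]) + lm * lf + (1.0 - lm) * lb
    return out
```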
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
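The GFM entry above describes a shared encoder feeding two separate decoders; a minimal sketch of that pattern follows. The layer sizes and the rule for fusing the two predictions into a final alpha are illustrative assumptions rather than GFM's actual design.
```python
# Minimal shared-encoder / two-decoder sketch; sizes and the fusion rule are assumptions.
import torch
import torch.nn as nn

class GlanceFocusSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.glance_decoder = nn.Conv2d(64, 3, 3, padding=1)   # coarse fg / bg / unknown logits
        self.focus_decoder = nn.Conv2d(64, 1, 3, padding=1)    # fine alpha for the unknown band

    def forward(self, image):
        feat = self.encoder(image)
        trimap = self.glance_decoder(feat).softmax(dim=1)      # channels: fg, bg, unknown
        detail = torch.sigmoid(self.focus_decoder(feat))
        # Trust the glance branch where it is confident, the focus branch where it is not.
        alpha = trimap[:, 0:1] + trimap[:, 2:3] * detail
        return alpha, trimap
```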
- Free-Form Image Inpainting via Contrastive Attention Network [64.05544199212831]
In image inpainting tasks, masks of arbitrary shape can appear anywhere in an image, forming complex patterns, and it is difficult for encoders to learn powerful representations in such situations.
We propose a self-supervised Siamese inference network to improve the robustness and generalization.
arXiv Detail & Related papers (2020-10-29T14:46:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.