LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors
- URL: http://arxiv.org/abs/2412.04460v1
- Date: Thu, 05 Dec 2024 18:59:18 GMT
- Authors: Yusuf Dalva, Yijun Li, Qing Liu, Nanxuan Zhao, Jianming Zhang, Zhe Lin, Pinar Yanardag
- Abstract summary: Layered content generation is crucial for creative fields like graphic design, animation, and digital art.
We propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers.
We show significant improvements in visual coherence, image quality, and layer consistency compared to baseline methods.
- Abstract: Large-scale diffusion models have achieved remarkable success in generating high-quality images from textual descriptions, gaining popularity across various applications. However, the generation of layered content, such as transparent images with foreground and background layers, remains an under-explored area. Layered content generation is crucial for creative workflows in fields like graphic design, animation, and digital art, where layer-based approaches are fundamental for flexible editing and composition. In this paper, we propose a novel image generation pipeline based on Latent Diffusion Models (LDMs) that generates images with two layers: a foreground layer (RGBA) with transparency information and a background layer (RGB). Unlike existing methods that generate these layers sequentially, our approach introduces a harmonized generation mechanism that enables dynamic interactions between the layers for more coherent outputs. We demonstrate the effectiveness of our method through extensive qualitative and quantitative experiments, showing significant improvements in visual coherence, image quality, and layer consistency compared to baseline methods.
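To make the two-layer output format concrete, the sketch below shows standard alpha compositing of an RGBA foreground layer over an RGB background layer, which is how such a layer pair flattens into a single image. This is a minimal illustration of the output format only, not the authors' pipeline; array shapes and value ranges are assumptions.

```python
import numpy as np

def composite_layers(foreground_rgba: np.ndarray, background_rgb: np.ndarray) -> np.ndarray:
    """Alpha-composite an RGBA foreground layer over an RGB background layer.

    foreground_rgba: (H, W, 4) float array in [0, 1], last channel is alpha.
    background_rgb:  (H, W, 3) float array in [0, 1].
    Returns the flattened (H, W, 3) composite: alpha * fg + (1 - alpha) * bg.
    """
    rgb, alpha = foreground_rgba[..., :3], foreground_rgba[..., 3:4]
    return alpha * rgb + (1.0 - alpha) * background_rgb

# Example: a generated layer pair can be previewed as a single flattened image.
fg = np.random.rand(64, 64, 4).astype(np.float32)
bg = np.random.rand(64, 64, 3).astype(np.float32)
composite = composite_layers(fg, bg)
```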
Related papers
- Enhancing Image Generation Fidelity via Progressive Prompts [25.099694657440992]
We propose a coarse-to-fine generation pipeline for regional prompt-following generation.
We find that deeper layers are always responsible for high-level content control, while shallow layers handle low-level content control.
Various prompts are injected into the proposed regional cross-attention control for coarse-to-fine generation.
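As a minimal sketch of the depth-dependent control this summary describes, the snippet below routes a coarse (high-level) prompt embedding to deeper cross-attention layers and a fine (low-level) one to shallow layers. The split point and embedding shapes are illustrative assumptions, not the paper's settings.

```python
import torch

def route_prompts(layer_idx: int, num_layers: int,
                  coarse_emb: torch.Tensor, fine_emb: torch.Tensor) -> torch.Tensor:
    """Choose the prompt embedding for one cross-attention layer.

    Per the summary, deeper layers control high-level content (coarse prompt)
    and shallow layers control low-level detail (fine prompt).
    """
    is_deep = layer_idx >= num_layers // 2  # illustrative split, not the paper's
    return coarse_emb if is_deep else fine_emb

num_layers = 16
coarse_emb = torch.randn(1, 77, 768)  # assumed CLIP-style embedding shape
fine_emb = torch.randn(1, 77, 768)
per_layer = [route_prompts(i, num_layers, coarse_emb, fine_emb)
             for i in range(num_layers)]
```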
arXiv Detail & Related papers (2025-01-13T05:48:32Z)
- LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge [14.481577976493236]
LayeringDiff is a novel pipeline for the synthesis of layered images.
By extracting layers from a composite image, rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training.
For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate foreground and background layers.
arXiv Detail & Related papers (2025-01-02T11:18:25Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy for bolstering image classification performance is to augment the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
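A hedged sketch of this inter-class augmentation loop: each training image is translated toward a different class and relabeled. The `translate` callable is a hypothetical stand-in for the diffusion-based image-to-image step; Diff-Mix's actual fine-tuning and sampling details are not shown.

```python
import random

def diff_mix_augment(dataset, class_names, translate):
    """Enrich a dataset with inter-class translations (Diff-Mix-style sketch).

    dataset: iterable of (image, label_index) pairs.
    translate(image, target_class): hypothetical stand-in for an image-to-image
    diffusion call conditioned on the target class prompt.
    Returns new (image, label_index) pairs mixing source content with target semantics.
    """
    augmented = []
    for image, label in dataset:
        target = random.choice([c for c in class_names if c != class_names[label]])
        augmented.append((translate(image, target), class_names.index(target)))
    return augmented
```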
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [70.14953942532621]
A layer-collaborative diffusion model, named LayerDiff, is designed for text-guided, multi-layered, composable image synthesis.
Our model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods.
LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.
arXiv Detail & Related papers (2024-03-18T16:28:28Z)
- Text2Layer: Layered Image Generation using Latent Diffusion Model [12.902259486204898]
We approach text-to-image synthesis from a layered image generation perspective, producing foreground and background layers rather than a single flat image.
To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images.
Experimental results show that the proposed method is able to generate high-quality layered images.
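A toy PyTorch sketch of this idea: an autoencoder trained to reconstruct a channel-stacked layered image (here an assumed RGBA foreground plus RGB background, 7 channels), so diffusion can later run in its latent space. The layer packing and architecture sizes are assumptions, not Text2Layer's design.

```python
import torch
import torch.nn as nn

class LayeredAutoencoder(nn.Module):
    """Toy autoencoder over channel-stacked layers (RGBA fg + RGB bg = 7 ch).

    Placeholder for the idea in the summary: learn to reconstruct layered
    images so a latent diffusion model can later operate in this latent space.
    """
    def __init__(self, channels: int = 7, latent: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1),
        )

    def forward(self, layers: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(layers))

x = torch.randn(1, 7, 64, 64)    # stacked foreground + background layers
recon = LayeredAutoencoder()(x)  # reconstruction target: x itself
```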
arXiv Detail & Related papers (2023-07-19T06:56:07Z)
- Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models [62.603753097900466]
We present a novel energy-based model (EBM) framework for adaptive context control by modeling the posterior of context vectors.
Specifically, we first formulate EBMs of latent image representations and text embeddings in each cross-attention layer of the denoising autoencoder.
Our latent EBMs further allow zero-shot compositional generation as a linear combination of cross-attention outputs from different contexts.
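The zero-shot compositional claim in the last line can be illustrated directly: compute cross-attention against each context embedding separately and mix the outputs linearly. Plain scaled dot-product attention stands in here for the paper's EBM-weighted formulation, and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_attention(query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
    """Plain scaled dot-product cross-attention (illustrative; no learned projections)."""
    scores = query @ context.transpose(-2, -1) / context.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ context

def compositional_output(query, contexts, weights):
    """Zero-shot composition as a linear combination of per-context attention outputs."""
    return sum(w * cross_attention(query, c) for w, c in zip(weights, contexts))

q = torch.randn(1, 256, 64)                            # latent image tokens (assumed shape)
contexts = [torch.randn(1, 77, 64) for _ in range(2)]  # two prompt embeddings
out = compositional_output(q, contexts, [0.5, 0.5])
```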
arXiv Detail & Related papers (2023-06-16T14:30:41Z)
- Deep Image Compositing [93.75358242750752]
We propose a new method which can automatically generate high-quality image composites without any user input.
Inspired by Laplacian pyramid blending, a densely connected multi-stream fusion network is proposed to effectively fuse the information from the foreground and background images.
Experiments show that the proposed method can automatically generate high-quality composites and outperforms existing methods both qualitatively and quantitatively.
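For reference, classic Laplacian pyramid blending, the technique this summary names as inspiration, can be sketched in a few lines with OpenCV; the paper's learned multi-stream fusion network replaces this hand-crafted procedure.

```python
import cv2
import numpy as np

def laplacian_blend(fg, bg, mask, levels=4):
    """Classic Laplacian pyramid blending (the inspiration cited in the summary).

    fg, bg: float32 images in [0, 1] of equal size; mask: float32 matte in
    [0, 1] with the same shape (stack a single-channel matte to 3 channels).
    Blending each frequency band separately gives seamless composite boundaries.
    Cleanest when H and W are multiples of 2**levels.
    """
    gp_f, gp_b, gp_m = [fg], [bg], [mask]
    for _ in range(levels):                      # build Gaussian pyramids
        gp_f.append(cv2.pyrDown(gp_f[-1]))
        gp_b.append(cv2.pyrDown(gp_b[-1]))
        gp_m.append(cv2.pyrDown(gp_m[-1]))
    blended = gp_m[-1] * gp_f[-1] + (1 - gp_m[-1]) * gp_b[-1]
    for i in range(levels, 0, -1):               # reconstruct, blending band by band
        size = gp_f[i - 1].shape[1::-1]
        lap_f = gp_f[i - 1] - cv2.pyrUp(gp_f[i], dstsize=size)
        lap_b = gp_b[i - 1] - cv2.pyrUp(gp_b[i], dstsize=size)
        band = gp_m[i - 1] * lap_f + (1 - gp_m[i - 1]) * lap_b
        blended = cv2.pyrUp(blended, dstsize=size) + band
    return np.clip(blended, 0, 1)
```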
arXiv Detail & Related papers (2020-11-04T06:12:24Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
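Hypercolumn composition itself is easy to sketch: upsample the chosen layers' feature maps to a common resolution and concatenate along channels. Here the layer selection, which Dynamic Hyperpixel Flow learns, is replaced by a fixed index list, and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def compose_hypercolumn(feature_maps, selected_layers, size=(64, 64)):
    """Build hypercolumn features from a chosen subset of CNN layers.

    feature_maps: list of (B, C_i, H_i, W_i) tensors from a backbone.
    selected_layers: indices of relevant layers (learned in the paper,
    fixed here for illustration).
    """
    resized = [F.interpolate(feature_maps[i], size=size, mode='bilinear',
                             align_corners=False) for i in selected_layers]
    return torch.cat(resized, dim=1)  # per-pixel stacked multi-layer descriptor

feats = [torch.randn(1, c, s, s) for c, s in [(64, 64), (128, 32), (256, 16)]]
hyper = compose_hypercolumn(feats, selected_layers=[0, 2])  # (1, 320, 64, 64)
```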
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- Hierarchical Opacity Propagation for Image Matting [15.265494938937737]
We argue that a structure enabling more direct alpha matte propagation between pixels is needed, and propose Hierarchical Opacity Propagation (HOP) matting.
HOP matting outperforms state-of-the-art matting methods.
arXiv Detail & Related papers (2020-04-07T10:39:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.