GLoD: Composing Global Contexts and Local Details in Image Generation
- URL: http://arxiv.org/abs/2404.15447v1
- Date: Tue, 23 Apr 2024 18:39:57 GMT
- Title: GLoD: Composing Global Contexts and Local Details in Image Generation
- Authors: Moyuru Yamada
- Abstract summary: Global-Local Diffusion (GLoD) is a novel framework which allows simultaneous control over the global contexts and the local details.
It assigns multiple global and local prompts to corresponding layers and composes their noises to guide a denoising process.
Our framework enables complex global-local compositions, conditioning objects in the global prompt with the local prompts while preserving other unspecified identities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have demonstrated their capability to synthesize high-quality and diverse images from textual prompts. However, simultaneous control over both global contexts (e.g., object layouts and interactions) and local details (e.g., colors and emotions) still remains a significant challenge. The models often fail to understand complex descriptions involving multiple objects and reflect specified visual attributes to the wrong targets or ignore them. This paper presents Global-Local Diffusion (GLoD), a novel framework which allows simultaneous control over the global contexts and the local details in text-to-image generation without requiring training or fine-tuning. It assigns multiple global and local prompts to corresponding layers and composes their noises to guide a denoising process using pre-trained diffusion models. Our framework enables complex global-local compositions, conditioning objects in the global prompt with the local prompts while preserving other unspecified identities. Our quantitative and qualitative evaluations demonstrate that GLoD effectively generates complex images that adhere to both user-provided object interactions and object details.
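The abstract describes composing the noise predictions obtained from a global prompt and several local prompts to guide a single denoising process. Below is a minimal sketch of what one such composition step might look like; the call signature unet(x, t, emb), the per-object layer masks, and the classifier-free-guidance weighting are illustrative assumptions, not GLoD's actual interface.

```python
def composed_denoise_step(unet, x_t, t, global_emb, local_embs, local_masks, guidance=7.5):
    """Sketch of one denoising step guided by a global prompt plus local prompts.

    unet        : pretrained diffusion U-Net, assumed callable as unet(x, t, emb) -> noise
    x_t         : current noisy latent, shape (1, C, H, W)
    global_emb  : text embedding of the global (layout/interaction) prompt
    local_embs  : list of text embeddings, one per local (detail) prompt
    local_masks : list of {0,1} masks of shape (1, 1, H, W), one per object layer (hypothetical)
    """
    # Unconditional and global-prompt noise predictions (classifier-free-guidance style).
    eps_uncond = unet(x_t, t, None)
    eps_global = unet(x_t, t, global_emb)

    # Apply the global guidance everywhere as the base prediction.
    eps = eps_uncond + guidance * (eps_global - eps_uncond)

    # Within each object layer, blend in that layer's local-prompt guidance so the
    # specified details are applied only to the intended region.
    for emb, mask in zip(local_embs, local_masks):
        eps_local = unet(x_t, t, emb)
        eps_local_guided = eps_uncond + guidance * (eps_local - eps_uncond)
        eps = mask * eps_local_guided + (1 - mask) * eps

    return eps
```

Masked blending of this kind keeps regions outside any layer under the global prompt's guidance alone, which is consistent with the abstract's claim that unspecified identities are preserved.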
Related papers
- HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts [77.62320553269615]
HiPrompt is a tuning-free solution for higher-resolution image generation.
Hierarchical prompts offer both global and local guidance.
Generated images maintain coherent local and global semantics, structures, and textures with high definition.
arXiv Detail & Related papers (2024-09-04T17:58:08Z) - Zero-shot Text-guided Infinite Image Synthesis with LLM guidance [2.531998650341267]
There is a lack of text-image paired datasets with high-resolution and contextual diversity.
Expanding images based on text requires global coherence and rich local context understanding.
We propose a novel approach utilizing Large Language Models (LLMs) for both global coherence and local context understanding.
arXiv Detail & Related papers (2024-07-17T15:10:01Z) - Bridging Local Details and Global Context in Text-Attributed Graphs [62.522550655068336]
GraphBridge is a framework that bridges local and global perspectives by leveraging contextual textual information.
Our method achieves state-of-the-art performance, while our graph-aware token reduction module significantly enhances efficiency and solves scalability issues.
arXiv Detail & Related papers (2024-06-18T13:35:25Z) - Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis [12.490787443456636]
We present a conditional diffusion model that integrates control over three-dimensional object placement with disentangled representations of global stylistic semantics.
Our integrated framework, Compose and Conquer (CnC), unifies these techniques to localize multiple conditions in a disentangled manner.
arXiv Detail & Related papers (2024-01-17T08:30:47Z) - LLM Blueprint: Enabling Text-to-Image Generation with Complex and
Detailed Prompts [60.54912319612113]
Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts.
We present a novel approach leveraging Large Language Models (LLMs) to extract critical components from text prompts.
Our evaluation on complex prompts featuring multiple objects demonstrates a substantial improvement in recall compared to baseline diffusion models.
arXiv Detail & Related papers (2023-10-16T17:57:37Z) - LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [107.11267074981905]
We propose a semantically controllable layout-AWare diffusion model, termed LAW-Diffusion.
We show that LAW-Diffusion yields state-of-the-art generative performance, especially with coherent object relations.
arXiv Detail & Related papers (2023-08-13T08:06:18Z) - Local and Global GANs with Semantic-Aware Upsampling for Image Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z) - Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation [135.4660201856059]
We consider learning the scene generation in a local context, and design a local class-specific generative network with semantic maps as a guidance.
To learn more discriminative class-specific feature representations for the local generation, a novel classification module is also proposed.
Experiments on two scene image generation tasks show superior generation performance of the proposed model.
arXiv Detail & Related papers (2019-12-27T16:14:53Z)