LSReGen: Large-Scale Regional Generator via Backward Guidance Framework
- URL: http://arxiv.org/abs/2407.15066v1
- Date: Sun, 21 Jul 2024 05:44:46 GMT
- Title: LSReGen: Large-Scale Regional Generator via Backward Guidance Framework
- Authors: Bowen Zhang, Cheng Yang, Xuanhui Liu,
- Abstract summary: Despite recent advances in large text-to-image models, controllable image generation remains a challenge.
Current methods, such as training, forward guidance, and backward guidance, have notable limitations.
We propose a novel controllable generation framework that offers a generalized interpretation of backward guidance.
We introduce LSReGen, a large-scale layout-to-image method designed to generate high-quality, layout-compliant images.
- Score: 12.408195812609042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, advancements in AIGC (Artificial Intelligence Generated Content) technology have significantly enhanced the capabilities of large text-to-image models. Despite these improvements, controllable image generation remains a challenge. Current methods, such as training, forward guidance, and backward guidance, have notable limitations. The first two approaches either demand substantial computational resources or produce subpar results. The third approach depends on phenomena specific to certain model architectures, complicating its application to large-scale image generation. To address these issues, we propose a novel controllable generation framework that offers a generalized interpretation of backward guidance without relying on specific assumptions. Leveraging this framework, we introduce LSReGen, a large-scale layout-to-image method designed to generate high-quality, layout-compliant images. Experimental results show that LSReGen outperforms existing methods in the large-scale layout-to-image task, underscoring the effectiveness of our proposed framework. Our code and models will be open-sourced.
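The core idea of backward guidance, in the abstract's generalized sense, is to steer a latent by descending the gradient of a task loss (here, a layout loss) rather than altering the model itself. Below is a minimal NumPy sketch of that update loop on a toy latent; the layout loss, box parameterization, and step size are illustrative assumptions, not details from the paper.

```python
import numpy as np

def layout_loss(z, box):
    """Toy layout loss: penalize latent mass outside the target box."""
    r0, r1, c0, c1 = box
    mask = np.ones_like(z)
    mask[r0:r1, c0:c1] = 0.0          # zero inside the box, one outside
    return float(np.sum((mask * z) ** 2))

def layout_grad(z, box):
    """Analytic gradient of layout_loss with respect to z."""
    r0, r1, c0, c1 = box
    mask = np.ones_like(z)
    mask[r0:r1, c0:c1] = 0.0
    return 2.0 * mask * z

def backward_guidance(z, box, lr=0.1, steps=50):
    """Steer the latent by gradient descent on the layout loss."""
    for _ in range(steps):
        z = z - lr * layout_grad(z, box)
    return z

rng = np.random.default_rng(0)
z0 = rng.normal(size=(16, 16))
box = (4, 12, 4, 12)
z1 = backward_guidance(z0, box)
```

In a real diffusion pipeline the gradient would be backpropagated through (part of) the denoiser at each sampling step; here the loss is differentiable in closed form, so the descent direction is exact.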
Related papers
- CART: Compositional Auto-Regressive Transformer for Image Generation [2.5563396001349297]
We introduce a novel approach to image generation using Auto-Regressive (AR) modeling.
Our proposed method addresses the limitations of conventional AR modeling by iteratively adding finer details to an image compositionally.
This strategy is shown to be more effective than the conventional next-token prediction and even surpasses the state-of-the-art next-scale prediction approaches.
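The compositional, coarse-to-fine strategy described above can be illustrated with an exact multi-scale residual decomposition: a base level captures coarse structure, and each subsequent level adds the finer detail the previous ones missed. This NumPy sketch is a toy analogy (average-pool/nearest-neighbour pyramids, no learned model), not CART's actual architecture.

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a square image by an integer factor."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return np.kron(img, np.ones((factor, factor)))

def compose(img, factors=(4, 2, 1)):
    """Split an image into coarse-to-fine residual levels, then re-add them."""
    levels, remainder = [], img
    for f in factors:
        coarse = upsample(downsample(remainder, f), f) if f > 1 else remainder
        levels.append(coarse)
        remainder = remainder - coarse
    return levels, sum(levels)

rng = np.random.default_rng(1)
img = rng.normal(size=(8, 8))
levels, recon = compose(img)
```

Because each level stores exactly the residual left by the coarser levels, summing all levels reconstructs the image exactly; a compositional generator predicts these levels one at a time instead of predicting raw tokens.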
arXiv Detail & Related papers (2024-11-15T13:29:44Z) - RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to models that process text input and generate high-fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model used for image generation through the systematic introduction of noise over repeated steps.
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
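The "systematic introduction of noise" in diffusion models has a simple closed form: at step t, the noised sample is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps ~ N(0, I). A minimal NumPy sketch of that forward step (the schedule value is an arbitrary illustrative choice):

```python
import numpy as np

def add_noise(x0, alpha_bar, rng):
    """One forward-diffusion draw: x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
x0 = np.zeros(100_000)                       # a degenerate "image" for the demo
x_t = add_noise(x0, alpha_bar=0.25, rng=rng)  # 25% signal variance retained
```

As alpha_bar decays from 1 to 0 over the schedule, the sample interpolates from clean data to pure Gaussian noise; the generative model is trained to invert these steps.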
arXiv Detail & Related papers (2023-09-02T03:27:20Z) - GLEAN: Generative Latent Bank for Image Super-Resolution and Beyond [99.6233044915999]
We show that pre-trained Generative Adversarial Networks (GANs) such as StyleGAN and BigGAN can be used as a latent bank to improve the performance of image super-resolution.
Our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging rich and diverse priors encapsulated in a pre-trained GAN.
We extend our method to different tasks including image colorization and blind image restoration, and extensive experiments show that our proposed models perform favorably in comparison to existing methods.
arXiv Detail & Related papers (2022-07-29T17:59:01Z) - A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration [72.17890189820665]
Generative adversarial networks (GANs) have drawn enormous attention due to the simple yet effective training mechanism and superior image generation quality.
Recent GAN models have greatly narrowed the gaps between the generated images and the real ones.
Many recent works show emerging interest to take advantage of pre-trained GAN models by exploiting the well-disentangled latent space and the learned GAN priors.
arXiv Detail & Related papers (2022-07-21T05:05:58Z) - PAGER: Progressive Attribute-Guided Extendable Robust Image Generation [38.484332924924914]
This work presents a generative modeling approach based on successive subspace learning (SSL).
Unlike most generative models in the literature, our method does not utilize neural networks to analyze the underlying source distribution and synthesize images.
The resulting method, called the Progressive Attribute-Guided Extendable Robust image generative (PAGER) model, has advantages in mathematical transparency, progressive content generation, lower training time, robust performance with fewer training samples, and extendibility to conditional image generation.
arXiv Detail & Related papers (2022-06-01T00:35:42Z) - InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images into the latent space of a high-quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z) - A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
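Inferring a latent distribution via a Wasserstein gradient flow can be sketched with its standard discretization: overdamped Langevin dynamics on a cloud of latent particles, which flows an initial distribution toward the target exp(-E)/Z. The quadratic energy and step count below are toy assumptions for illustration, not the paper's actual objective.

```python
import numpy as np

def langevin_step(z, grad_e, dt, rng):
    """Discretized Wasserstein gradient flow of KL(q || exp(-E)/Z):
    one overdamped Langevin update applied to every particle."""
    return z - dt * grad_e(z) + np.sqrt(2.0 * dt) * rng.normal(size=z.shape)

mu = 3.0
grad_e = lambda z: z - mu           # E(z) = (z - mu)^2 / 2, target N(mu, 1)
rng = np.random.default_rng(0)
z = rng.normal(size=5000)           # initial latent particles from N(0, 1)
for _ in range(200):
    z = langevin_step(z, grad_e, dt=0.05, rng=rng)
```

After enough steps, the particle cloud's statistics match the target distribution; in the paper's setting the energy would encode the "given requirements" and the particles would be GAN latents.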
arXiv Detail & Related papers (2021-12-07T05:22:50Z) - Improved Image Generation via Sparse Modeling [27.66648389933265]
We show that generators can be viewed as manifestations of the Convolutional Sparse Coding (CSC) and its Multi-Layered version (ML-CSC) synthesis processes.
We leverage this observation by explicitly enforcing a sparsifying regularization on appropriately chosen activation layers in the generator.
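A sparsifying regularization on activations is typically enforced through the proximal operator of the L1 norm, i.e. soft-thresholding, which zeroes small entries and shrinks the rest. The layer sizes and threshold below are illustrative assumptions; this is a generic sketch of the mechanism, not the paper's exact generator.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def soft_threshold(a, lam):
    """Proximal operator of lam * ||a||_1: the sparsifying step applied
    to a chosen activation layer."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 16))       # one toy generator layer
x = rng.normal(size=16)
a = relu(W @ x)                     # dense-ish activations
a_sparse = soft_threshold(a, lam=1.0)
```

Every surviving activation is shrunk by exactly lam, and entries below lam are zeroed, so the layer output becomes sparser, which is the structural prior the CSC view predicts for a well-behaved generator.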
arXiv Detail & Related papers (2021-04-01T13:52:40Z) - GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution [85.53811497840725]
We show that Generative Adversarial Networks (GANs), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR).
Our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging rich and diverse priors encapsulated in a pre-trained GAN.
Images upscaled by GLEAN show clear improvements in terms of fidelity and texture faithfulness in comparison to existing methods.
arXiv Detail & Related papers (2020-12-01T18:56:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.