A Generic Approach for Enhancing GANs by Regularized Latent Optimization
- URL: http://arxiv.org/abs/2112.03502v1
- Date: Tue, 7 Dec 2021 05:22:50 GMT
- Authors: Yufan Zhou, Chunyuan Li, Changyou Chen, Jinhui Xu
- Abstract summary: We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
- Score: 79.00740660219256
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapidly growing model complexity and data volume, training deep
generative models (DGMs) for better performance has become an increasingly
important challenge. Previous research on this problem has mainly focused
on improving DGMs by either introducing new objective functions or designing
more expressive model architectures. However, such approaches often introduce
significantly more computational and/or design overhead. To resolve such
issues, we introduce in this paper a generic framework called
generative-model inference that is capable of enhancing pre-trained GANs
effectively and seamlessly in a variety of application scenarios. Our basic
idea is to efficiently infer the optimal latent distribution for the given
requirements using Wasserstein gradient flow techniques, instead of re-training
or fine-tuning pre-trained model parameters. Extensive experimental results on
applications like image generation, image translation, text-to-image
generation, image inpainting, and text-guided image editing suggest the
effectiveness and superiority of our proposed framework.
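The core idea above, optimizing the latent input of a fixed generator under a task loss plus a regularizer that keeps the latent close to its prior, instead of retraining the generator, can be illustrated with a deliberately tiny sketch. This is not the paper's Wasserstein-gradient-flow method: it does per-sample gradient descent on a regularized latent objective, and a fixed linear map stands in for a pre-trained GAN generator. All names (`generator`, `optimize_latent`, `lam`) are illustrative.

```python
def generator(z):
    # Fixed weights stand in for the frozen, pre-trained generator;
    # its parameters are never updated during latent optimization.
    return [2.0 * z[0] + 0.5 * z[1], -1.0 * z[0] + 1.5 * z[1]]

def loss(z, target, lam):
    # Task term: match a given "requirement" (here, a target output).
    out = generator(z)
    fit = sum((o - t) ** 2 for o, t in zip(out, target))
    # Regularizer: keep z close to the N(0, I) prior the GAN was trained with,
    # so the optimized latent stays on the generator's learned manifold.
    reg = lam * sum(v * v for v in z)
    return fit + reg

def grad(z, target, lam, eps=1e-5):
    # Central finite differences; a real implementation would backpropagate
    # through the generator instead.
    g = []
    for i in range(len(z)):
        zp = list(z); zp[i] += eps
        zm = list(z); zm[i] -= eps
        g.append((loss(zp, target, lam) - loss(zm, target, lam)) / (2 * eps))
    return g

def optimize_latent(target, lam=0.1, lr=0.05, steps=200):
    # Gradient descent in latent space; only z changes, never the generator.
    z = [0.0, 0.0]
    for _ in range(steps):
        g = grad(z, target, lam)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z
```

Because of the regularizer, the optimized latent trades off fidelity to the target against staying near the prior; the paper's framework generalizes this single-point view to inferring a whole latent distribution via Wasserstein gradient flow.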
Related papers
- Visual Autoregressive Modeling for Image Super-Resolution [14.935662351654601]
We propose a novel visual autoregressive modeling framework for image super-resolution (ISR) based on next-scale prediction.
We collect large-scale data and design a training process to obtain robust generative priors.
arXiv Detail & Related papers (2025-01-31T09:53:47Z)
- Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [77.86514804787622]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks.
We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation.
We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z)
- CART: Compositional Auto-Regressive Transformer for Image Generation [2.5563396001349297]
We introduce a novel approach to image generation using Auto-Regressive (AR) modeling.
Our proposed method iteratively adds finer details to an image compositionally.
This strategy is shown to be more effective than the conventional next-token prediction and even surpasses the state-of-the-art next-scale prediction approaches.
arXiv Detail & Related papers (2024-11-15T13:29:44Z)
- DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance [11.44012694656102]
Large-scale generative models, such as text-to-image diffusion models, have garnered widespread attention across diverse domains.
Existing large-scale diffusion models are confined to generating images of up to 1K resolution.
We propose a novel progressive approach that fully utilizes generated low-resolution images to guide the generation of higher-resolution images.
arXiv Detail & Related papers (2024-06-26T16:10:31Z)
- YaART: Yet Another ART Rendering Technology [119.09155882164573]
This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences.
We analyze how these choices affect both the efficiency of the training process and the quality of the generated images.
We demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets.
arXiv Detail & Related papers (2024-04-08T16:51:19Z)
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model used for image generation through the systematic introduction of noise over repeated steps.
In the era of large models, scaling up model size and integrating with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
- IRGen: Generative Modeling for Image Retrieval [82.62022344988993]
In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling.
We develop our model, dubbed IRGen, to address the technical challenge of converting an image into a concise sequence of semantic units.
Our model achieves state-of-the-art performance on three widely-used image retrieval benchmarks and two million-scale datasets.
arXiv Detail & Related papers (2023-03-17T17:07:36Z)
- InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images into the latent space of a high-quality generative model.
This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.