Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient
Vision Transformers
- URL: http://arxiv.org/abs/2310.05400v1
- Date: Mon, 9 Oct 2023 04:38:52 GMT
- Title: Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient
Vision Transformers
- Authors: Shiyue Cao, Yueqin Yin, Lianghua Huang, Yu Liu, Xin Zhao, Deli Zhao,
Kaiqi Huang
- Abstract summary: We propose a more efficient two-stage framework for high-resolution image generation.
We employ a local attention-based quantization model instead of the global attention mechanism used in previous methods.
This approach results in faster generation speed, higher generation fidelity, and improved resolution.
- Score: 41.78970081787674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vector-quantized image modeling has shown great potential in synthesizing
high-quality images. However, generating high-resolution images remains a
challenging task due to the quadratic computational overhead of the
self-attention process. In this study, we seek to explore a more efficient
two-stage framework for high-resolution image generation with improvements in
the following three aspects. (1) Based on the observation that the first
quantization stage exhibits strong locality, we employ a local attention-based
quantization model instead of the global attention mechanism used in previous
methods, leading to better efficiency and reconstruction quality. (2) We
emphasize the importance of multi-grained feature interaction during image
generation and introduce an efficient attention mechanism that combines global
attention (long-range semantic consistency within the whole image) and local
attention (fine-grained details). This approach results in faster generation
speed, higher generation fidelity, and improved resolution. (3) We propose a
new generation pipeline incorporating autoencoding training and an autoregressive
generation strategy, demonstrating a better paradigm for image synthesis.
Extensive experiments demonstrate the superiority of our approach in
high-quality and high-resolution image reconstruction and generation.
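The combined global-local attention in point (2) of the abstract can be illustrated with a short, self-contained sketch. This is not the authors' implementation: the module names (GlobalLocalAttention, window_partition), the window size, the pooling factor, and the choice to sum the two branches are all assumptions made for illustration; the paper's actual design may differ.

```python
# Illustrative sketch only (assumed names and hyperparameters, not the paper's
# released code): local attention inside fixed windows for fine detail, plus
# global attention against an average-pooled grid for long-range consistency.
import torch
import torch.nn as nn
import torch.nn.functional as F


def window_partition(x, ws):
    # (B, H, W, C) -> (B * num_windows, ws * ws, C), non-overlapping windows
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)


def window_merge(win, ws, H, W):
    # Inverse of window_partition
    B = win.shape[0] // ((H // ws) * (W // ws))
    x = win.view(B, H // ws, W // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)


class GlobalLocalAttention(nn.Module):
    def __init__(self, dim, heads=4, window=8, pool=4):
        super().__init__()
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.window, self.pool = window, pool

    def forward(self, x):  # x: (B, H, W, C) grid of tokens
        B, H, W, C = x.shape
        # Local branch: cost O(HW * window^2) instead of O((HW)^2).
        win = window_partition(x, self.window)
        local, _ = self.local_attn(win, win, win)
        local = window_merge(local, self.window, H, W)
        # Global branch: every token attends to a pooled coarse grid.
        coarse = F.avg_pool2d(x.permute(0, 3, 1, 2), self.pool)
        coarse = coarse.flatten(2).transpose(1, 2)  # (B, HW / pool^2, C)
        q = x.reshape(B, H * W, C)
        g, _ = self.global_attn(q, coarse, coarse)
        return local + g.reshape(B, H, W, C)  # sum of the two branches


# Usage: a 32x32 grid of 64-d tokens, e.g. quantized codes for one image.
tokens = torch.randn(2, 32, 32, 64)
print(GlobalLocalAttention(64)(tokens).shape)  # torch.Size([2, 32, 32, 64])
```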
Related papers
- HiTSR: A Hierarchical Transformer for Reference-based Super-Resolution [6.546896650921257]
We propose HiTSR, a hierarchical transformer model for reference-based image super-resolution.
We streamline the architecture and training pipeline by incorporating the double attention block from GAN literature.
Our model demonstrates superior performance across three datasets: SUN80, Urban100, and Manga109.
arXiv Detail & Related papers (2024-08-30T01:16:29Z)
- GECO: Generative Image-to-3D within a SECOnd [51.20830808525894]
We introduce GECO, a novel method for high-quality 3D generative modeling that operates within a second.
GECO achieves high-quality image-to-3D mesh generation with an unprecedented level of efficiency.
arXiv Detail & Related papers (2024-05-30T17:58:00Z)
- TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models [3.167554518801207]
Diffusion models have emerged as effective tools for generating diverse and high-quality content.
However, they face challenges in panoramic image generation, such as visible seams and incoherent transitions.
We propose TwinDiffusion, an optimized framework designed to address these challenges.
arXiv Detail & Related papers (2024-04-30T11:43:37Z)
- IRGen: Generative Modeling for Image Retrieval [82.62022344988993]
In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling.
We develop our model, dubbed IRGen, to address the technical challenge of converting an image into a concise sequence of semantic units.
Our model achieves state-of-the-art performance on three widely-used image retrieval benchmarks and two million-scale datasets.
arXiv Detail & Related papers (2023-03-17T17:07:36Z)
- ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions [28.956280590967808]
Our architecture is based on a transformer with a novel attention mechanism.
Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions (see the sketch after this entry).
We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method.
arXiv Detail & Related papers (2022-05-24T17:39:53Z)
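As a rough, hypothetical illustration of ASSET's idea (not the authors' code), the toy 1-D sketch below scores all pairs at low resolution, then lets each full-resolution query attend only to the keys inside its top-k low-resolution blocks. The pooling scheme, the block-to-token mapping, and the top-k rule are all assumptions.

```python
# Toy sketch of low-res-guided sparse attention (assumptions: 1-D token order,
# each low-res token is the mean of r consecutive high-res tokens, top-k blocks).
import torch


def guided_sparse_attention(q, k, v, r=4, topk=2):
    N, C = q.shape                                 # high-res tokens, N % r == 0
    q_lr = q.view(N // r, r, C).mean(1)            # low-res proxies by pooling
    k_lr = k.view(N // r, r, C).mean(1)
    scores_lr = q_lr @ k_lr.t() / C ** 0.5         # dense attention at low res
    blocks = scores_lr.topk(topk, dim=-1).indices  # (N/r, topk) key blocks kept
    # Expand each kept block b to its high-res key indices b*r .. b*r + r - 1.
    key_idx = (blocks.unsqueeze(-1) * r + torch.arange(r)).flatten(1)
    key_idx = key_idx.repeat_interleave(r, dim=0)  # one row per high-res query
    k_sel, v_sel = k[key_idx], v[key_idx]          # (N, topk * r, C)
    attn = torch.softmax(
        (q.unsqueeze(1) @ k_sel.transpose(1, 2)).squeeze(1) / C ** 0.5, dim=-1)
    return (attn.unsqueeze(1) @ v_sel).squeeze(1)  # (N, C), sparse in the keys


x = torch.randn(64, 32)
print(guided_sparse_attention(x, x, x).shape)      # torch.Size([64, 32])
```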
- Robust Single Image Dehazing Based on Consistent and Contrast-Assisted Reconstruction [95.5735805072852]
We propose a novel density-variational learning framework to improve the robustness of the image dehazing model.
Specifically, the dehazing network is optimized under the consistency-regularized framework.
Our method significantly surpasses the state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T08:11:04Z)
- A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
- Improved Transformer for High-Resolution GANs [69.42469272015481]
We introduce two key ingredients to the Transformer to address the challenge of high-resolution synthesis.
We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 31.87 and 2.95 on unconditional ImageNet $128 \times 128$ and FFHQ $256 \times 256$, respectively.
arXiv Detail & Related papers (2021-06-14T17:39:49Z)
- DTGAN: Dual Attention Generative Adversarial Networks for Text-to-Image Generation [8.26410341981427]
The Dual Attention Generative Adversarial Network (DTGAN) can synthesize high-quality and semantically consistent images.
The proposed model introduces channel-aware and pixel-aware attention modules that can guide the generator to focus on text-relevant channels and pixels (see the sketch below).
A new type of visual loss is utilized to enhance the image resolution by ensuring vivid shapes and perceptually uniform color distributions in generated images.
arXiv Detail & Related papers (2020-11-05T08:57:15Z)
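A minimal, hypothetical reading of DTGAN's two attention modules (not the authors' code) is sketched below: a text-conditioned sigmoid gate over channels followed by a text-conditioned sigmoid gate over pixels. The layer shapes and the gating form are assumptions for illustration only.

```python
# Hypothetical sketch of text-conditioned channel and pixel gating (assumed
# shapes and gating form; not the DTGAN paper's actual modules).
import torch
import torch.nn as nn


class TextGatedAttention(nn.Module):
    def __init__(self, channels, text_dim):
        super().__init__()
        self.channel_fc = nn.Linear(text_dim, channels)
        self.pixel_conv = nn.Conv2d(channels + text_dim, 1, kernel_size=1)

    def forward(self, feat, text):  # feat: (B, C, H, W), text: (B, T)
        B, C, H, W = feat.shape
        # Channel-aware gate: one weight per channel, derived from the text.
        ch = torch.sigmoid(self.channel_fc(text)).view(B, C, 1, 1)
        feat = feat * ch
        # Pixel-aware gate: broadcast the text everywhere, score each location.
        txt = text.view(B, -1, 1, 1).expand(B, text.shape[1], H, W)
        px = torch.sigmoid(self.pixel_conv(torch.cat([feat, txt], dim=1)))
        return feat * px


feats = torch.randn(2, 64, 16, 16)   # generator feature map
emb = torch.randn(2, 128)            # sentence embedding
print(TextGatedAttention(64, 128)(feats, emb).shape)  # torch.Size([2, 64, 16, 16])
```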
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.