Unleashing Transformers: Parallel Token Prediction with Discrete
Absorbing Diffusion for Fast High-Resolution Image Generation from
Vector-Quantized Codes
- URL: http://arxiv.org/abs/2111.12701v1
- Date: Wed, 24 Nov 2021 18:55:14 GMT
- Title: Unleashing Transformers: Parallel Token Prediction with Discrete
Absorbing Diffusion for Fast High-Resolution Image Generation from
Vector-Quantized Codes
- Authors: Sam Bond-Taylor, Peter Hessey, Hiroshi Sasaki, Toby P. Breckon, Chris
G. Willcocks
- Abstract summary: Recent Vector-Quantized image models have overcome the limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sampling from the prior.
We propose a novel discrete diffusion probabilistic model prior which enables parallel prediction of Vector-Quantized tokens by using an unconstrained Transformer architecture as the backbone.
- Score: 15.881911863960774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Whilst diffusion probabilistic models can generate high quality image
content, key limitations remain in terms of both generating high-resolution
imagery and their associated high computational requirements. Recent
Vector-Quantized image models have overcome this limitation of image resolution
but are prohibitively slow and unidirectional as they generate tokens via
element-wise autoregressive sampling from the prior. By contrast, in this paper
we propose a novel discrete diffusion probabilistic model prior which enables
parallel prediction of Vector-Quantized tokens by using an unconstrained
Transformer architecture as the backbone. During training, tokens are randomly
masked in an order-agnostic manner and the Transformer learns to predict the
original tokens. This parallelism of Vector-Quantized token prediction in turn
facilitates unconditional generation of globally consistent high-resolution and
diverse imagery at a fraction of the computational expense. In this manner, we
can generate image resolutions exceeding that of the original training set
samples whilst additionally provisioning per-image likelihood estimates (in a
departure from generative adversarial approaches). Our approach achieves
state-of-the-art results in terms of Density (LSUN Bedroom: 1.51; LSUN
Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches:
0.73; FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN
Churches: 4.07; FFHQ: 6.11) whilst offering advantages in terms of both
computation and reduced training set requirements.
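The training and sampling procedure described in the abstract lends itself to a short sketch: mask a random fraction of the VQ tokens, train the Transformer to predict the originals in parallel, then reverse the process at sampling time by progressively unmasking. The sketch below is a minimal toy version, not the authors' released implementation; the model interface, codebook size, grid size, loss weighting, and unmasking schedule are all assumptions.

```python
# Toy sketch of a discrete absorbing diffusion prior over VQ tokens.
# `model` is any bidirectional Transformer mapping (B, N) token ids to
# (B, N, V) logits; MASK is an extra "absorbed" id (assumptions).
import torch
import torch.nn.functional as F

V, MASK = 1024, 1024            # codebook size; mask id appended after it
N = 16 * 16                     # flattened latent grid size (assumed)

def absorbing_diffusion_loss(model, x0):
    """x0: (B, N) ground-truth VQ token ids."""
    B = x0.shape[0]
    t = torch.randint(1, N + 1, (B, 1))          # random timestep per image
    mask = torch.rand(B, N) < t.float() / N      # absorb ~t/N of the tokens
    xt = x0.masked_fill(mask, MASK)
    logits = model(xt)                           # parallel predictions
    # Cross-entropy on masked positions only, reweighted by 1/t
    # (a simplified ELBO-style weighting; an assumption here).
    ce = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")
    return (ce * mask).sum(1).div(t.squeeze(1).float()).mean()

@torch.no_grad()
def sample(model, B, steps=N):
    x = torch.full((B, N), MASK)                 # start fully absorbed
    for t in reversed(range(1, steps + 1)):
        masked = x == MASK
        probs = F.softmax(model(x), dim=-1)
        draws = torch.multinomial(probs.view(-1, V), 1).view(B, N)
        # Reveal each still-masked token with probability 1/t, so the
        # grid is fully unmasked by t = 1 (order-agnostic schedule).
        reveal = masked & (torch.rand(B, N) < 1.0 / t)
        x = torch.where(reveal, draws, x)
    return x                                     # VQ ids for the decoder
```

Because every masked position is predicted in one forward pass, each sampling step updates many tokens at once, which is where the speed-up over element-wise autoregressive sampling comes from.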
Related papers
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large generative image models.
We propose an algorithm for fast constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z) - Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding [60.188309982690335]
We propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation.
By introducing a probabilistic convergence criterion, our SJD accelerates the inference of auto-regressive text-to-image generation while maintaining the randomness in sampling-based token decoding.
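As a rough illustration of the Jacobi-style idea, the sketch below drafts all remaining tokens at once, accepts each draft token with probability equal to its likelihood under the model, and resamples from the first rejection onward. The interface, names, and acceptance rule are illustrative assumptions, not the paper's exact algorithm.

```python
# Toy sketch of probabilistic Jacobi-style parallel decoding.
# The model interface and the acceptance rule are assumptions.
import torch

@torch.no_grad()
def jacobi_decode(model, prompt, n_new, iters=32):
    """model(ids) -> (1, L, V) next-token logits; ids are (1, L) longs."""
    V = model(prompt).shape[-1]
    ids = torch.cat([prompt, torch.randint(0, V, (1, n_new))], dim=1)
    done = prompt.shape[1]                        # verified prefix length
    for _ in range(iters):
        if done == ids.shape[1]:
            break
        left = ids.shape[1] - done
        # One Jacobi iteration: score every unverified slot in parallel.
        probs = torch.softmax(model(ids)[0, done - 1 : -1], dim=-1)
        draft = ids[0, done:]                     # current parallel guesses
        # Probabilistic criterion: keep a guess with probability equal to
        # its model likelihood, preserving sampling randomness.
        keep = torch.rand(left) < probs[torch.arange(left), draft]
        n_ok = int(keep.long().cumprod(0).sum())  # longest accepted run
        # Resample the first rejected slot and refresh all later slots.
        fresh = torch.multinomial(probs, 1).squeeze(-1)
        ids[0, done + n_ok :] = fresh[n_ok:]
        done += n_ok + (1 if n_ok < left else 0)
    return ids
```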
arXiv Detail & Related papers (2024-10-02T16:05:27Z) - Traditional Classification Neural Networks are Good Generators: They are
Competitive with DDPMs and GANs [104.72108627191041]
We show that conventional neural network classifiers can generate high-quality images comparable to state-of-the-art generative models.
We propose a mask-based reconstruction module that makes gradients semantics-aware, enabling the synthesis of plausible images.
We show that our method also extends to text-to-image generation when built on image-text foundation models.
arXiv Detail & Related papers (2022-11-27T11:25:35Z) - Improved Masked Image Generation with Token-Critic [16.749458173904934]
We introduce Token-Critic, an auxiliary model to guide the sampling of a non-autoregressive generative transformer.
Guided by Token-Critic, a state-of-the-art generative transformer significantly improves its performance, outperforming recent diffusion models and GANs in the trade-off between generated image quality and diversity.
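The following is a minimal sketch of the kind of critic-guided loop this describes: the generator fills all masked slots in parallel, the critic scores each token, and the least plausible tokens are re-masked for the next round. The interfaces, names, and re-masking schedule are assumptions.

```python
# Toy sketch of critic-guided iterative decoding in the spirit of
# Token-Critic. Generator and critic interfaces are assumptions.
import torch

@torch.no_grad()
def critic_guided_sample(generator, critic, n_tokens, mask_id, steps=8):
    """generator(ids) -> (1, N, V) logits; critic(ids) -> (1, N) scores
    in [0, 1] estimating how 'real' each token looks."""
    ids = torch.full((1, n_tokens), mask_id)
    for step in range(steps):
        # Fill every masked slot in parallel with a sampled token.
        probs = torch.softmax(generator(ids), dim=-1)
        draws = torch.multinomial(probs[0], 1).squeeze(-1)
        ids = torch.where(ids == mask_id, draws.unsqueeze(0), ids)
        if step == steps - 1:
            break
        # Ask the critic which tokens look implausible and re-mask the
        # lowest-scoring fraction for another round of refinement.
        scores = critic(ids)[0]
        n_remask = int(n_tokens * (1 - (step + 1) / steps))
        worst = torch.topk(scores, n_remask, largest=False).indices
        ids[0, worst] = mask_id
    return ids
```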
arXiv Detail & Related papers (2022-09-09T17:57:21Z) - Megapixel Image Generation with Step-Unrolled Denoising Autoencoders [5.145313322824774]
We propose a combination of techniques to push sample resolutions higher and reduce computational requirements for training and sampling.
These include vector-quantized GAN (VQ-GAN), a vector-quantization (VQ) model capable of high levels of lossy, but perceptually insignificant, compression; hourglass transformers, a highly scalable self-attention model; and step-unrolled denoising autoencoders (SUNDAE), a non-autoregressive (NAR) text generative model.
Our proposed framework scales to high resolutions ($1024 \times 1024$) and trains quickly.
arXiv Detail & Related papers (2022-06-24T15:47:42Z) - Vector Quantized Diffusion Model for Text-to-Image Synthesis [47.09451151258849]
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results.
arXiv Detail & Related papers (2021-11-29T18:59:46Z) - High-Resolution Complex Scene Synthesis with Transformers [6.445605125467574]
Coarse-grained synthesis of complex scene images via deep generative models has recently gained popularity.
We present an approach to this task, where the generative model is based on pure likelihood training without additional objectives.
We show that the resulting system is able to synthesize high-quality images consistent with the given layouts.
arXiv Detail & Related papers (2021-05-13T17:56:07Z) - InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images.
We show how it trains and infers patch-by-patch seamlessly with low computational resources.
arXiv Detail & Related papers (2021-04-08T17:59:30Z) - Spatially-Adaptive Pixelwise Networks for Fast Image Translation [57.359250882770525]
We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation.
We use pixel-wise networks; that is, each pixel is processed independently of others.
Our model is up to 18x faster than state-of-the-art baselines.
arXiv Detail & Related papers (2020-12-05T10:02:03Z) - RAIN: A Simple Approach for Robust and Accurate Image Classification
Networks [156.09526491791772]
It has been shown that the majority of existing adversarial defense methods achieve robustness at the cost of sacrificing prediction accuracy.
This paper proposes a novel preprocessing framework, which we term Robust and Accurate Image classificatioN (RAIN).
RAIN applies randomization over inputs to break the ties between the model forward prediction path and the backward gradient path, thus improving the model robustness.
We conduct extensive experiments on the STL10 and ImageNet datasets to verify the effectiveness of RAIN against various types of adversarial attacks.
arXiv Detail & Related papers (2020-04-24T02:03:56Z)