Improved Masked Image Generation with Token-Critic
- URL: http://arxiv.org/abs/2209.04439v1
- Date: Fri, 9 Sep 2022 17:57:21 GMT
- Title: Improved Masked Image Generation with Token-Critic
- Authors: José Lezama, Huiwen Chang, Lu Jiang, Irfan Essa
- Abstract summary: We introduce Token-Critic, an auxiliary model to guide the sampling of a non-autoregressive generative transformer.
Coupled with Token-Critic, a state-of-the-art generative transformer significantly improves its performance and outperforms recent diffusion models and GANs in the trade-off between generated image quality and diversity.
- Score: 16.749458173904934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive generative transformers recently demonstrated impressive
image generation performance, and orders of magnitude faster sampling than
their autoregressive counterparts. However, optimal parallel sampling from the
true joint distribution of visual tokens remains an open challenge. In this
paper we introduce Token-Critic, an auxiliary model to guide the sampling of a
non-autoregressive generative transformer. Given a masked-and-reconstructed
real image, the Token-Critic model is trained to distinguish which visual
tokens belong to the original image and which were sampled by the generative
transformer. During non-autoregressive iterative sampling, Token-Critic is used
to select which tokens to accept and which to reject and resample. Coupled with
Token-Critic, a state-of-the-art generative transformer significantly improves
its performance, and outperforms recent diffusion models and GANs in terms of
the trade-off between generated image quality and diversity, in the challenging
class-conditional ImageNet generation.
Related papers
- ImageFolder: Autoregressive Image Generation with Folded Tokens [51.815319504939396]
Increasing token length is a common approach to improving image reconstruction quality.
However, there is a trade-off between reconstruction and generation quality with respect to token length.
We propose ImageFolder, a semantic tokenizer that provides spatially aligned image tokens that can be folded during autoregressive modeling.
arXiv Detail & Related papers (2024-10-02T17:06:39Z)
- Generator Born from Classifier [66.56001246096002]
We aim to reconstruct an image generator without relying on any data samples.
We propose a novel learning paradigm, in which the generator is trained to ensure that the convergence conditions of the network parameters are satisfied.
arXiv Detail & Related papers (2023-12-05T03:41:17Z)
- Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation [51.37092275604371]
Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation.
Recent studies suggest utilizing powerful text-to-image generation models to provide image inputs.
However, synthetic images generated by these models often follow different distributions compared to authentic images.
arXiv Detail & Related papers (2023-10-20T09:06:30Z)
- StraIT: Non-autoregressive Generation with Stratified Image Transformer [63.158996766036736]
Stratified Image Transformer (StraIT) is a pure non-autoregressive (NAR) generative model.
Our experiments demonstrate that StraIT significantly improves NAR generation and outperforms existing diffusion models and autoregressive methods.
arXiv Detail & Related papers (2023-03-01T18:59:33Z)
- Style-Guided Inference of Transformer for High-resolution Image Synthesis [4.974890682815778]
The Transformer is eminently suitable for auto-regressive image synthesis.
In this article, we propose to take a desired output, a style image, as an additional condition without re-training the transformer.
arXiv Detail & Related papers (2022-10-11T15:21:20Z)
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs with the powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance compared with the state of the art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
- MaskGIT: Masked Generative Image Transformer [49.074967597485475]
MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions (a minimal sketch of this objective appears after this list).
Experiments demonstrate that MaskGIT significantly outperforms the state-of-the-art transformer model on the ImageNet dataset.
arXiv Detail & Related papers (2022-02-08T23:54:06Z)
- High-Resolution Complex Scene Synthesis with Transformers [6.445605125467574]
Coarse-grained synthesis of complex scene images via deep generative models has recently gained popularity.
We present an approach to this task, where the generative model is based on pure likelihood training without additional objectives.
We show that the resulting system is able to synthesize high-quality images consistent with the given layouts.
arXiv Detail & Related papers (2021-05-13T17:56:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.