Improved Masked Image Generation with Token-Critic
- URL: http://arxiv.org/abs/2209.04439v1
- Date: Fri, 9 Sep 2022 17:57:21 GMT
- Title: Improved Masked Image Generation with Token-Critic
- Authors: José Lezama, Huiwen Chang, Lu Jiang, Irfan Essa
- Abstract summary: We introduce Token-Critic, an auxiliary model to guide the sampling of a non-autoregressive generative transformer.
Coupled with Token-Critic, a state-of-the-art generative transformer significantly improves its performance and outperforms recent diffusion models and GANs in the trade-off between generated image quality and diversity.
- Score: 16.749458173904934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive generative transformers recently demonstrated impressive
image generation performance, and orders of magnitude faster sampling than
their autoregressive counterparts. However, optimal parallel sampling from the
true joint distribution of visual tokens remains an open challenge. In this
paper we introduce Token-Critic, an auxiliary model to guide the sampling of a
non-autoregressive generative transformer. Given a masked-and-reconstructed
real image, the Token-Critic model is trained to distinguish which visual
tokens belong to the original image and which were sampled by the generative
transformer. During non-autoregressive iterative sampling, Token-Critic is used
to select which tokens to accept and which to reject and resample. Coupled with
Token-Critic, a state-of-the-art generative transformer significantly improves
its performance, and outperforms recent diffusion models and GANs in terms of
the trade-off between generated image quality and diversity, in the challenging
class-conditional ImageNet generation.
Related papers
- ImageFolder: Autoregressive Image Generation with Folded Tokens [51.815319504939396]
Increasing token length is a common approach to improving image reconstruction quality.
However, there is a trade-off between reconstruction and generation quality with respect to token length.
We propose ImageFolder, a semantic tokenizer that provides spatially aligned image tokens that can be folded during autoregressive modeling.
arXiv Detail & Related papers (2024-10-02T17:06:39Z)
- Generator Born from Classifier [66.56001246096002]
We aim to reconstruct an image generator without relying on any data samples.
We propose a novel learning paradigm, in which the generator is trained to ensure that the convergence conditions of the network parameters are satisfied.
arXiv Detail & Related papers (2023-12-05T03:41:17Z)
- Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation [51.37092275604371]
Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation.
Recent studies suggest utilizing powerful text-to-image generation models to provide image inputs.
However, synthetic images generated by these models often follow different distributions compared to authentic images.
arXiv Detail & Related papers (2023-10-20T09:06:30Z)
- StraIT: Non-autoregressive Generation with Stratified Image Transformer [63.158996766036736]
Stratified Image Transformer (StraIT) is a pure non-autoregressive (NAR) generative model.
Our experiments demonstrate that StraIT significantly improves NAR generation and outperforms existing diffusion models and autoregressive methods.
arXiv Detail & Related papers (2023-03-01T18:59:33Z)
- Style-Guided Inference of Transformer for High-resolution Image Synthesis [4.974890682815778]
The Transformer is eminently suitable for auto-regressive image synthesis.
In this article, we propose to take a desired output, a style image, as an additional condition without re-training the transformer.
arXiv Detail & Related papers (2022-10-11T15:21:20Z)
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs with the powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance compared with the state of the art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
- MaskGIT: Masked Generative Image Transformer [49.074967597485475]
MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions (a minimal sketch of this objective appears after this list).
Experiments demonstrate that MaskGIT significantly outperforms the state-of-the-art transformer model on the ImageNet dataset.
arXiv Detail & Related papers (2022-02-08T23:54:06Z)
- High-Resolution Complex Scene Synthesis with Transformers [6.445605125467574]
Coarse-grained synthesis of complex scene images via deep generative models has recently gained popularity.
We present an approach to this task, where the generative model is based on pure likelihood training without additional objectives.
We show that the resulting system is able to synthesize high-quality images consistent with the given layouts.
arXiv Detail & Related papers (2021-05-13T17:56:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.