Related papers: Autoregressive Image Generation without Vector Quantization

URL: http://arxiv.org/abs/2406.11838v3
Date: Fri, 01 Nov 2024 14:45:36 GMT
Title: Autoregressive Image Generation without Vector Quantization
Authors: Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, Kaiming He
Abstract summary: Conventional wisdom holds that autoregressive models for image generation are typically accompanied by vector-quantized tokens. We propose to model the per-token probability distribution using a diffusion procedure, which allows us to apply autoregressive models in a continuous-valued space.
Score: 31.798754606008067
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conventional wisdom holds that autoregressive models for image generation are typically accompanied by vector-quantized tokens. We observe that while a discrete-valued space can facilitate representing a categorical distribution, it is not a necessity for autoregressive modeling. In this work, we propose to model the per-token probability distribution using a diffusion procedure, which allows us to apply autoregressive models in a continuous-valued space. Rather than using categorical cross-entropy loss, we define a Diffusion Loss function to model the per-token probability. This approach eliminates the need for discrete-valued tokenizers. We evaluate its effectiveness across a wide range of cases, including standard autoregressive models and generalized masked autoregressive (MAR) variants. By removing vector quantization, our image generator achieves strong results while enjoying the speed advantage of sequence modeling. We hope this work will motivate the use of autoregressive generation in other continuous-valued domains and applications. Code is available at: https://github.com/LTH14/mar.
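
The abstract describes replacing categorical cross-entropy with a Diffusion Loss that models each token's distribution in continuous space. A minimal sketch of that idea, assuming a standard noise-prediction (denoising) objective conditioned on a vector z produced by the autoregressive backbone, might look as follows. This is not the authors' implementation (their code is at the URL above): the cosine-style schedule, the function names, and the toy linear noise predictor are all illustrative assumptions.

```python
import numpy as np

def cosine_alpha_bar(num_steps):
    """Cumulative signal level alpha_bar_t for a cosine-style schedule (assumed choice)."""
    t = np.arange(1, num_steps + 1) / num_steps
    return np.cos(0.5 * np.pi * t) ** 2

def diffusion_loss(x, z, eps_model, alpha_bar, rng):
    """One-sample estimate of a noise-prediction loss for a single continuous token.

    x:         ground-truth continuous token, shape (d,)
    z:         conditioning vector from the autoregressive model, shape (k,)
    eps_model: callable (x_t, t, z) -> predicted noise, shape (d,)
    """
    t = rng.integers(0, len(alpha_bar))        # random diffusion timestep
    eps = rng.standard_normal(x.shape)         # Gaussian noise to be predicted
    x_t = np.sqrt(alpha_bar[t]) * x + np.sqrt(1.0 - alpha_bar[t]) * eps
    pred = eps_model(x_t, t, z)                # noise estimate given z and t
    return float(np.mean((eps - pred) ** 2))   # squared error on the noise

# Toy usage: a random linear map stands in for a learned denoising network.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)                    # hypothetical 16-dim token
z = rng.standard_normal(32)                    # hypothetical 32-dim condition
W = rng.standard_normal((16, 16 + 32 + 1)) * 0.01

def toy_eps_model(x_t, t, z):
    # Hypothetical predictor: linear in [x_t, z, normalized timestep].
    return W @ np.concatenate([x_t, z, [t / 1000.0]])

alpha_bar = cosine_alpha_bar(1000)
loss = diffusion_loss(x, z, toy_eps_model, alpha_bar, rng)
print(loss)
```

In a real model the loss would be averaged over tokens and timesteps and backpropagated into both the noise predictor and the network producing z; the sketch only shows the per-token computation.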

Related papers

Bayesian generative models can flag performance loss, bias, and out-of-distribution image content [15.835055687646507]
Generative models are popular for medical imaging tasks such as anomaly detection, feature extraction, data visualization, or image generation. Since they are parameterized by deep learning models, they are often sensitive to distribution shifts and unreliable when applied to out-of-distribution data. We show how pixel-wise uncertainty can detect out-of-distribution image content such as ink, rulers, and patches.
arXiv Detail & Related papers (2025-03-21T18:45:28Z)
Learning-Order Autoregressive Models with Application to Molecular Graph Generation [52.44913282062524]
We introduce a variant of ARM that generates high-dimensional data using a probabilistic ordering that is sequentially inferred from data. We demonstrate experimentally that our method can learn meaningful autoregressive orderings in image and graph generation.
arXiv Detail & Related papers (2025-03-07T23:24:24Z)
Frequency Autoregressive Image Generation with Continuous Tokens [31.833852108014312]
We introduce the frequency progressive autoregressive (FAR) paradigm and instantiate FAR with the continuous tokenizer. We demonstrate the efficacy of FAR through comprehensive experiments on the ImageNet dataset.
arXiv Detail & Related papers (2025-03-07T10:34:04Z)
One-for-More: Continual Diffusion Model for Anomaly Detection [61.12622458367425]
Anomaly detection methods utilize diffusion models to generate or reconstruct normal samples when given arbitrary anomaly images. Our study found that the diffusion model suffers from severe "faithfulness hallucination" and "catastrophic forgetting". We propose a continual diffusion model that uses gradient projection to achieve stable continual learning.
arXiv Detail & Related papers (2025-02-27T07:47:27Z)
Continuous Speculative Decoding for Autoregressive Image Generation [33.05392461723613]
Continuous-valued Autoregressive (AR) image generation models have demonstrated notable superiority over their discrete-token counterparts. Speculative decoding has proven effective in accelerating Large Language Models (LLMs). This work generalizes the speculative decoding algorithm from discrete tokens to continuous space.
arXiv Detail & Related papers (2024-11-18T09:19:15Z)
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. We aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z)
Glauber Generative Model: Discrete Diffusion Models via Binary Classification [21.816933208895843]
We introduce the Glauber Generative Model (GGM), a new class of discrete diffusion models. GGM deploys a Markov chain to denoise a sequence of noisy tokens to a sample from a joint distribution of discrete tokens. We show that it outperforms existing discrete diffusion models in language generation and image generation.
arXiv Detail & Related papers (2024-05-27T10:42:13Z)
A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning. We show how to maximize the likelihood of a symbolic constraint w.r.t the neural network's output distribution. We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z)
Meaning Representations from Trajectories in Autoregressive Models [106.63181745054571]
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text. This strategy is prompt-free, does not require fine-tuning, and is applicable to any pre-trained autoregressive model. We empirically show that the representations obtained from large models align well with human annotations, outperform other zero-shot and prompt-free methods on semantic similarity tasks, and can be used to solve more complex entailment and containment tasks that standard embeddings cannot handle.
arXiv Detail & Related papers (2023-10-23T04:35:58Z)
ChiroDiff: Modelling chirographic data with Diffusion Models [132.5223191478268]
We introduce a powerful model class, namely "Denoising Diffusion Probabilistic Models" (DDPMs), for chirographic data. Our model, named "ChiroDiff", is non-autoregressive; it learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rates.
arXiv Detail & Related papers (2023-04-07T15:17:48Z)
Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise [52.59444045853966]
We show that an entire family of generative models can be constructed by varying the choice of image degradation. The success of fully deterministic models calls into question the community's understanding of diffusion models.
arXiv Detail & Related papers (2022-08-19T15:18:39Z)
Modelling nonlinear dependencies in the latent space of inverse scattering [1.5990720051907859]
In inverse scattering proposed by Angles and Mallat, a deep neural network is trained to invert the scattering transform applied to an image. After such a network is trained, it can be used as a generative model given that we can sample from the distribution of principal components of scattering coefficients. Within this paper, two such models are explored, namely a Variational AutoEncoder and a Generative Adversarial Network.
arXiv Detail & Related papers (2022-03-19T12:07:43Z)
Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation [19.156223720614186]
The integration of Vector Quantised Variational AutoEncoder with autoregressive models as generation part has yielded high-quality results on image generation. We show that with the help of a content-rich discrete visual codebook from VQ-VAE, the discrete diffusion model can also generate high fidelity images with global context.
arXiv Detail & Related papers (2021-12-03T09:09:34Z)
Argmax Flows and Multinomial Diffusion: Towards Non-Autoregressive Language Models [76.22217735434661]
This paper introduces two new classes of generative models for categorical data: Argmax Flows and Multinomial Diffusion. We demonstrate that our models perform competitively on language modelling and modelling of image segmentation maps.
arXiv Detail & Related papers (2021-02-10T11:04:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.