Argmax Flows and Multinomial Diffusion: Towards Non-Autoregressive
Language Models
- URL: http://arxiv.org/abs/2102.05379v1
- Date: Wed, 10 Feb 2021 11:04:17 GMT
- Title: Argmax Flows and Multinomial Diffusion: Towards Non-Autoregressive
Language Models
- Authors: Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, Max
Welling
- Abstract summary: This paper introduces two new classes of generative models for categorical data: Argmax Flows and Multinomial Diffusion.
We demonstrate that our models perform competitively on language modelling and modelling of image segmentation maps.
- Score: 76.22217735434661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of language modelling has been largely dominated by autoregressive
models, for which sampling is inherently difficult to parallelize. This paper
introduces two new classes of generative models for categorical data such as
language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax
Flows are defined by a composition of a continuous distribution (such as a
normalizing flow), and an argmax function. To optimize this model, we learn a
probabilistic inverse for the argmax that lifts the categorical data to a
continuous space. Multinomial Diffusion gradually adds categorical noise in a
diffusion process, for which the generative denoising process is learned. We
demonstrate that our models perform competitively on language modelling and
modelling of image segmentation maps.
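To make the two constructions concrete, below is a minimal NumPy sketch (not the authors' implementation): Argmax Flow sampling maps a continuous sample to a category with an argmax, and a single forward step of Multinomial Diffusion draws x_t from Cat(x_t; (1 - beta_t) * x_{t-1} + beta_t / K). The vocabulary size, the stand-in Gaussian sampler, and all function names are illustrative assumptions.

```python
import numpy as np

K = 27  # illustrative vocabulary size (e.g. character-level text)

def argmax_flow_sample(sample_continuous, n, k=K):
    """Argmax Flow sampling: draw z from a continuous model (in the paper,
    a normalizing flow; here any sampler), then take an argmax per position."""
    z = sample_continuous(n, k)      # (n, k) continuous scores
    return z.argmax(axis=-1)         # (n,) categorical samples

def multinomial_diffusion_forward(x_onehot, beta_t, rng):
    """One forward noising step of Multinomial Diffusion:
    q(x_t | x_{t-1}) = Cat(x_t; (1 - beta_t) * x_{t-1} + beta_t / K)."""
    k = x_onehot.shape[-1]
    probs = (1.0 - beta_t) * x_onehot + beta_t / k
    return np.stack([rng.multinomial(1, p) for p in probs])

# Illustrative usage with a standard-normal stand-in for a trained flow.
rng = np.random.default_rng(0)
tokens = argmax_flow_sample(lambda n, k: rng.normal(size=(n, k)), n=5)
x0 = np.eye(K)[tokens]                                   # one-hot data
x1 = multinomial_diffusion_forward(x0, beta_t=0.1, rng=rng)
```

Training the Argmax Flow additionally requires the probabilistic inverse mentioned in the abstract (e.g. a thresholding or Gumbel-based lifting of categories back to the continuous space), which this sketch omits; the learned denoising network of Multinomial Diffusion is likewise not shown.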
Related papers
- Autoregressive Image Generation without Vector Quantization [31.798754606008067]
Conventional wisdom holds that autoregressive models for image generation are typically accompanied by vector-quantized tokens.
We propose to model the per-token probability distribution using a diffusion procedure, which allows us to apply autoregressive models in a continuous-valued space.
arXiv Detail & Related papers (2024-06-17T17:59:58Z)
- Simple and Effective Masked Diffusion Language Models [48.68198363304619]
We show that simple masked discrete diffusion is more performant than previously thought.
We apply an effective training recipe that improves the performance of masked diffusion models.
Our objective has a simple form -- it is a mixture of classical masked language modeling losses.
arXiv Detail & Related papers (2024-06-11T17:51:40Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- Likelihood-Based Diffusion Language Models [13.916640262862215]
We take the first steps towards closing the likelihood gap between autoregressive and diffusion-based language models.
We pursue this goal through algorithmic improvements, scaling laws, and increased compute.
We release Plaid 1B, a large diffusion language model which outperforms GPT-2 124M in likelihood on benchmark datasets.
arXiv Detail & Related papers (2023-05-30T16:43:31Z)
- Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
- OCD: Learning to Overfit with Conditional Diffusion Models [95.1828574518325]
We present a dynamic model in which the weights are conditioned on an input sample x.
We learn to match those weights that would be obtained by finetuning a base model on x and its label y.
arXiv Detail & Related papers (2022-10-02T09:42:47Z)
- Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise [52.59444045853966]
We show that an entire family of generative models can be constructed by varying the choice of image degradation.
The success of fully deterministic models calls into question the community's understanding of diffusion models.
arXiv Detail & Related papers (2022-08-19T15:18:39Z)
- Structured Denoising Diffusion Models in Discrete State-Spaces [15.488176444698404]
We introduce Discrete Denoising Diffusion Probabilistic Models (D3PMs) for discrete data.
The choice of transition matrix is an important design decision that leads to improved results in image and text domains (two common choices are sketched after this list).
For text, this model class achieves strong results on character-level text generation while scaling to large vocabularies on LM1B.
arXiv Detail & Related papers (2021-07-07T04:11:00Z)
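As a companion to the D3PM entry above, here is a hedged NumPy sketch of the two most commonly cited transition-matrix choices: a uniform matrix (which recovers Multinomial Diffusion as a special case) and an absorbing/masking matrix. The mask_id argument and the function names are illustrative, not taken from the paper's code.

```python
import numpy as np

def uniform_transition(k, beta_t):
    """Uniform D3PM transition: with probability beta_t, resample the token
    uniformly over all k classes (the Multinomial Diffusion special case)."""
    return (1.0 - beta_t) * np.eye(k) + (beta_t / k) * np.ones((k, k))

def absorbing_transition(k, beta_t, mask_id):
    """Absorbing D3PM transition: with probability beta_t, the token jumps
    to a dedicated [MASK] class and then never leaves it."""
    q = (1.0 - beta_t) * np.eye(k)
    q[:, mask_id] += beta_t
    q[mask_id, :] = 0.0
    q[mask_id, mask_id] = 1.0
    return q

# One forward step multiplies a one-hot row vector by Q_t:
# q(x_t | x_{t-1}) = Cat(x_t; x_{t-1} @ uniform_transition(k, beta_t)).
```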