Autoregressive Diffusion Models
- URL: http://arxiv.org/abs/2110.02037v1
- Date: Tue, 5 Oct 2021 13:36:55 GMT
- Title: Autoregressive Diffusion Models
- Authors: Emiel Hoogeboom and Alexey A. Gritsenko and Jasmijn Bastings and Ben
Poole and Rianne van den Berg and Tim Salimans
- Abstract summary: We introduce Autoregressive Diffusion Models (ARDMs), a model class encompassing and generalizing order-agnostic autoregressive models.
ARDMs are simple to implement and easy to train, and can be trained using an efficient objective similar to modern probabilistic diffusion models.
We show that ARDMs obtain compelling results not only on complete datasets, but also on compressing single data points.
- Score: 34.125045462636386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Autoregressive Diffusion Models (ARDMs), a model class
encompassing and generalizing order-agnostic autoregressive models (Uria et
al., 2014) and absorbing discrete diffusion (Austin et al., 2021), which we
show are special cases of ARDMs under mild assumptions. ARDMs are simple to
implement and easy to train. Unlike standard ARMs, they do not require causal
masking of model representations, and can be trained using an efficient
objective similar to modern probabilistic diffusion models that scales
favourably to highly-dimensional data. At test time, ARDMs support parallel
generation which can be adapted to fit any given generation budget. We find
that ARDMs require significantly fewer steps than discrete diffusion models to
attain the same performance. Finally, we apply ARDMs to lossless compression,
and show that they are uniquely suited to this task. Contrary to existing
approaches based on bits-back coding, ARDMs obtain compelling results not only
on complete datasets, but also on compressing single data points. Moreover,
this can be done using a modest number of network calls for (de)compression due
to the model's adaptable parallel generation.
Related papers
- Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces.
We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z) - Mirror Diffusion Models for Constrained and Watermarked Generation [41.27274841596343]
Mirror Diffusion Models (MDM) is a new class of diffusion models that generate data on convex constrained sets without losing tractability.
For safety and privacy purposes, we also explore constrained sets as a new mechanism to embed invisible but quantitative information in generated data.
Our work brings new algorithmic opportunities for learning tractable diffusion on complex domains.
arXiv Detail & Related papers (2023-10-02T14:26:31Z) - Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging [24.64264715041198]
We introduce Sparse Model Soups (SMS), a novel method for merging sparse models by initiating each prune-retrain cycle with the averaged model from the previous phase.
SMS preserves sparsity, exploits sparse network benefits, is modular and fully parallelizable, and substantially improves IMP's performance.
arXiv Detail & Related papers (2023-06-29T08:49:41Z) - Deep Generative Modeling on Limited Data with Regularization by
Nontransferable Pre-trained Models [32.52492468276371]
We propose regularized deep generative model (Reg-DGM) to reduce the variance of generative modeling with limited data.
Reg-DGM uses a pre-trained model to optimize a weighted sum of a certain divergence and the expectation of an energy function.
Empirically, with various pre-trained feature extractors and a data-dependent energy function, Reg-DGM consistently improves the generation performance of strong DGMs with limited data.
arXiv Detail & Related papers (2022-08-30T10:28:50Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores)
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.