Training and Inference on Any-Order Autoregressive Models the Right Way
- URL: http://arxiv.org/abs/2205.13554v1
- Date: Thu, 26 May 2022 18:00:02 GMT
- Title: Training and Inference on Any-Order Autoregressive Models the Right Way
- Authors: Andy Shih, Dorsa Sadigh, Stefano Ermon
- Abstract summary: A family of Any-Order Autoregressive Models (AO-ARMs) has shown breakthrough performance in arbitrary conditional tasks.
We identify significant improvements to be made to previous formulations of AO-ARMs.
Our method leads to improved performance with no compromises on tractability.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional inference on arbitrary subsets of variables is a core problem in
probabilistic inference with important applications such as masked language
modeling and image inpainting. In recent years, the family of Any-Order
Autoregressive Models (AO-ARMs) -- which includes popular models such as XLNet
-- has shown breakthrough performance in arbitrary conditional tasks across a
sweeping range of domains. But, in spite of their success, in this paper we
identify significant improvements to be made to previous formulations of
AO-ARMs. First, we show that AO-ARMs suffer from redundancy in their
probabilistic model, i.e., they define the same distribution in multiple
different ways. We alleviate this redundancy by training on a smaller set of
univariate conditionals that still maintains support for efficient arbitrary
conditional inference. Second, we upweight the training loss for univariate
conditionals that are evaluated more frequently during inference. Our method
leads to improved performance with no compromises on tractability, giving
state-of-the-art likelihoods in arbitrary conditional modeling on text (Text8),
image (CIFAR10, ImageNet32), and continuous tabular data domains.
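To make the redundancy concrete, here is a minimal Python sketch. It is not the authors' code. It assumes a tiny, exactly known joint over three binary variables, and the helper names `make_joint`, `marginal`, `conditional`, and `log_lik_under_order` are hypothetical. Each univariate conditional p(x_i | x_S) is computed by marginalization, and every one of the n! variable orderings is shown to multiply out to the same log p(x).

```python
import itertools
import math
import random


def make_joint(n, seed=0):
    """Hypothetical toy model: a random, strictly positive joint over n binary variables."""
    rng = random.Random(seed)
    states = list(itertools.product([0, 1], repeat=n))
    weights = [rng.random() + 0.1 for _ in states]
    total = sum(weights)
    return {s: w / total for s, w in zip(states, weights)}


def marginal(joint, assignment):
    """Probability that the variables in `assignment` (index -> value) take those values."""
    return sum(p for s, p in joint.items()
               if all(s[i] == v for i, v in assignment.items()))


def conditional(joint, i, x, S):
    """Exact univariate conditional p(x_i | x_S), obtained by marginalizing the joint."""
    ctx = {j: x[j] for j in S}
    return marginal(joint, {**ctx, i: x[i]}) / marginal(joint, ctx)


def log_lik_under_order(joint, x, order):
    """Chain rule along `order`: log p(x) = sum_t log p(x_{o_t} | x_{o_<t})."""
    return sum(math.log(conditional(joint, i, x, order[:t]))
               for t, i in enumerate(order))


n = 3
joint = make_joint(n)
x = (1, 0, 1)
# One log-likelihood per ordering; with exact conditionals they all coincide.
values = [log_lik_under_order(joint, x, order)
          for order in itertools.permutations(range(n))]
```

With exact conditionals all 3! = 6 entries of `values` equal log p(x). A trained AO-ARM approximates these conditionals with one network, so its n! chain-rule products need not agree exactly, and capacity spent modeling every one of them is the kind of redundancy the abstract describes.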
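The second idea, upweighting the conditionals that inference evaluates most often, can be illustrated with another hedged sketch. It assumes a hypothetical workload of queries p(x_U | x_O) in which O and U partition three variables, and a hypothetical policy that decodes U in a fixed left-to-right order. The weighting rule used here (normalized evaluation counts) and the names `conditionals_used`, `frequency_weights`, and `weighted_loss` are illustrative choices, not the paper's exact objective.

```python
import itertools
from collections import Counter


def conditionals_used(observed, unobserved):
    """Univariate conditionals (i, frozenset(context)) evaluated to answer
    p(x_unobserved | x_observed). Decoding left to right is an assumption."""
    context = set(observed)
    used = []
    for i in sorted(unobserved):
        used.append((i, frozenset(context)))
        context.add(i)  # x_i becomes context for later variables
    return used


def frequency_weights(queries):
    """Count how often each conditional is evaluated over the workload and
    normalize the counts into loss weights that sum to 1."""
    counts = Counter()
    for observed, unobserved in queries:
        counts.update(conditionals_used(observed, unobserved))
    total = sum(counts.values())
    return {cond: c / total for cond, c in counts.items()}


def weighted_loss(per_conditional_nll, weights):
    """Weighted sum of per-conditional negative log-likelihoods; conditionals
    the workload never evaluates get weight 0."""
    return sum(weights.get(cond, 0.0) * nll
               for cond, nll in per_conditional_nll.items())


# Hypothetical workload: every split of {0, 1, 2} into observed/unobserved
# with at least one unobserved variable.
n = 3
queries = []
for r in range(n):
    for obs in itertools.combinations(range(n), r):
        unobs = tuple(j for j in range(n) if j not in obs)
        queries.append((obs, unobs))

weights = frequency_weights(queries)
```

For this workload the fixed decoding order touches only 7 of the 3·2² = 12 univariate conditionals an any-order model would train on, and p(x_2 | x_0, x_1) is evaluated four times as often as p(x_0), so it receives four times the weight.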