Training and Inference on Any-Order Autoregressive Models the Right Way
- URL: http://arxiv.org/abs/2205.13554v1
- Date: Thu, 26 May 2022 18:00:02 GMT
- Title: Training and Inference on Any-Order Autoregressive Models the Right Way
- Authors: Andy Shih, Dorsa Sadigh, Stefano Ermon
- Abstract summary: A family of Any-Order Autoregressive Models (AO-ARMs) has shown breakthrough performance in arbitrary conditional tasks.
We identify significant improvements to be made to previous formulations of AO-ARMs.
Our method leads to improved performance with no compromises on tractability.
- Score: 97.39464776373902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional inference on arbitrary subsets of variables is a core problem in
probabilistic inference with important applications such as masked language
modeling and image inpainting. In recent years, the family of Any-Order
Autoregressive Models (AO-ARMs) -- which includes popular models such as XLNet
-- has shown breakthrough performance in arbitrary conditional tasks across a
sweeping range of domains. But, in spite of their success, in this paper we
identify significant improvements to be made to previous formulations of
AO-ARMs. First, we show that AO-ARMs suffer from redundancy in their
probabilistic model, i.e., they define the same distribution in multiple
different ways. We alleviate this redundancy by training on a smaller set of
univariate conditionals that still maintains support for efficient arbitrary
conditional inference. Second, we upweight the training loss for univariate
conditionals that are evaluated more frequently during inference. Our method
leads to improved performance with no compromises on tractability, giving
state-of-the-art likelihoods in arbitrary conditional modeling on text (Text8),
image (CIFAR10, ImageNet32), and continuous tabular data domains.
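To make the abstract's two proposed changes concrete, below is a minimal PyTorch-style sketch of a generic any-order (masked) training objective with a hook for the inference-frequency reweighting described above. This is an illustration under stated assumptions, not the authors' implementation: the `model(x, mask)` interface and `weight_fn` are hypothetical, and the default `L / num_masked` weight is the standard uniform AO-ARM estimator rather than the paper's proposed scheme.

```python
import torch
import torch.nn.functional as F

def ao_arm_loss(model, x, weight_fn=None):
    """One stochastic estimate of a generic any-order autoregressive NLL bound.

    x: LongTensor (batch, seq_len) of discrete tokens.
    model(x, mask) -> per-position logits (batch, seq_len, vocab); positions
        where mask is True are treated as unobserved.  (Hypothetical API.)
    weight_fn: optional hook standing in for the paper's upweighting of
        conditionals that are evaluated more often at inference time.
    """
    B, L = x.shape
    # Sample how many variables to mask per example (uniform over 1..L),
    # then mask a uniformly random subset of that size.
    num_masked = torch.randint(1, L + 1, (B,))
    ranks = torch.rand(B, L).argsort(dim=1).argsort(dim=1)
    mask = ranks < num_masked.unsqueeze(1)          # True = unobserved

    logits = model(x, mask)
    nll = F.cross_entropy(logits.transpose(1, 2), x, reduction="none")
    nll = (nll * mask).sum(dim=1)                   # sum over masked positions

    # Uniform estimator: weight each sample by L / num_masked.  The paper
    # proposes replacing this with weights favoring the univariate
    # conditionals used most often at inference time.
    w = L / num_masked.float() if weight_fn is None else weight_fn(num_masked, L)
    return (w * nll).mean()
```

The paper's first contribution (removing redundant univariate conditionals) would further restrict which masks are sampled; that restriction is not shown in this sketch.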
Related papers
- On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z)
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
- Maximum Likelihood on the Joint (Data, Condition) Distribution for Solving Ill-Posed Problems with Conditional Flow Models [0.0]
I describe a trick for training flow models using a prescribed rule as a surrogate for maximum likelihood.
I demonstrate the approach on easily visualized toy problems, then use the method to successfully generate class-conditional images.
arXiv Detail & Related papers (2022-08-24T21:50:25Z)
- Are conditional GANs explicitly conditional? [0.0]
This paper proposes two contributions for conditional Generative Adversarial Networks (cGANs).
The first main contribution is an analysis of cGANs to show that they are not explicitly conditional.
The second contribution is a new method, called acontrario, that explicitly models conditionality for both parts of the adversarial architecture.
arXiv Detail & Related papers (2021-06-28T22:49:27Z)
- Uncertainty-aware Generalized Adaptive CycleGAN [44.34422859532988]
Unpaired image-to-image translation refers to learning inter-image-domain mapping in an unsupervised manner.
Existing methods often learn deterministic mappings without explicitly modelling the robustness to outliers or predictive uncertainty.
We propose a novel probabilistic method called Uncertainty-aware Generalized Adaptive Cycle Consistency (UGAC).
arXiv Detail & Related papers (2021-02-23T15:22:35Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that the method can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
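For reference, the parameterization described in this entry can be written compactly as below; the notation (in particular the symbol $s_i$ for the conditional score) is assumed for illustration and not taken from the paper.

```latex
% Autoregressive factorization of the joint, with per-dimension conditional
% scores defined as derivatives of univariate log-conditionals.
\log p(x) \;=\; \sum_{i=1}^{D} \log p(x_i \mid x_{<i}), \qquad
s_i(x_{\le i}) \;=\; \frac{\partial}{\partial x_i}\, \log p(x_i \mid x_{<i}).
```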
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
- Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies [83.05591911173332]
The junction tree algorithm is the most general solution for exact MAP inference with run-time guarantees.
We propose a new graph transformation technique, based on node cloning, that ensures the run-time for solving our target problem is independent of the form of the corresponding clique tree.
arXiv Detail & Related papers (2019-12-27T13:30:29Z)