Related papers: Training and Inference on Any-Order Autoregressive Models the Right Way

Training and Inference on Any-Order Autoregressive Models the Right Way

URL: http://arxiv.org/abs/2205.13554v1
Date: Thu, 26 May 2022 18:00:02 GMT
Title: Training and Inference on Any-Order Autoregressive Models the Right Way
Authors: Andy Shih, Dorsa Sadigh, Stefano Ermon
Abstract summary: A family of Any-Order Autoregressive Models (AO-ARMs) has shown breakthrough performance in arbitrary conditional tasks. We identify significant improvements to be made to previous formulations of AO-ARMs. Our method leads to improved performance with no compromises on tractability.
Score: 97.39464776373902
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conditional inference on arbitrary subsets of variables is a core problem in probabilistic inference with important applications such as masked language modeling and image inpainting. In recent years, the family of Any-Order Autoregressive Models (AO-ARMs) -- which includes popular models such as XLNet -- has shown breakthrough performance in arbitrary conditional tasks across a sweeping range of domains. But, in spite of their success, in this paper we identify significant improvements to be made to previous formulations of AO-ARMs. First, we show that AO-ARMs suffer from redundancy in their probabilistic model, i.e., they define the same distribution in multiple different ways. We alleviate this redundancy by training on a smaller set of univariate conditionals that still maintains support for efficient arbitrary conditional inference. Second, we upweight the training loss for univariate conditionals that are evaluated more frequently during inference. Our method leads to improved performance with no compromises on tractability, giving state-of-the-art likelihoods in arbitrary conditional modeling on text (Text8), image (CIFAR10, ImageNet32), and continuous tabular data domains.

Related papers

[MASK] is All You Need [28.90875822599164]
We propose using discrete-state models to connect Masked Generative and Non-autoregressive Diffusion models. By leveraging [MASK] in discrete-state models, we can bridge Masked Generative and Non-autoregressive Diffusion models.
arXiv Detail & Related papers (2024-12-09T18:59:56Z)
UTSD: Unified Time Series Diffusion Model [13.555837288440946]
A Unified Time Series Diffusion model is established for the first time to model the multi-domain probability distribution. We conduct extensive experiments on mainstream benchmarks, and the pre-trained UTSD outperforms existing foundation models on all data domains.
arXiv Detail & Related papers (2024-12-04T06:42:55Z)
On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations. We propose an autoregressive sampling approach that significantly improves performance in forecasting. We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z)
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference. Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable. We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of exchanges. We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
Maximum Likelihood on the Joint (Data, Condition) Distribution for Solving Ill-Posed Problems with Conditional Flow Models [0.0]
I describe a trick for training flow models using a prescribed rule as a surrogate for maximum likelihood. I demonstrate these properties on easily visualized toy problems, then use the method to successfully generate class-conditional images.
arXiv Detail & Related papers (2022-08-24T21:50:25Z)
Are conditional GANs explicitly conditional? [0.0]
This paper proposes two contributions for conditional Generative Adversarial Networks (cGANs) The first main contribution is an analysis of cGANs to show that they are not explicitly conditional. The second contribution is a new method, called acontrario, that explicitly models conditionality for both parts of the adversarial architecture.
arXiv Detail & Related papers (2021-06-28T22:49:27Z)
Uncertainty-aware Generalized Adaptive CycleGAN [44.34422859532988]
Unpaired image-to-image translation refers to learning inter-image-domain mapping in an unsupervised manner. Existing methods often learn deterministic mappings without explicitly modelling the robustness to outliers or predictive uncertainty. We propose a novel probabilistic method called Uncertainty-aware Generalized Adaptive Cycle Consistency (UGAC)
arXiv Detail & Related papers (2021-02-23T15:22:35Z)
Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores) For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training. We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies [83.05591911173332]
junction tree algorithm is the most general solution for exact MAP inference with run-time guarantees. We propose a new graph transformation technique via node cloning which ensures a run-time for solving our target problem independently of the form of a corresponding clique tree.
arXiv Detail & Related papers (2019-12-27T13:30:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.