Joint Stochastic Approximation and Its Application to Learning Discrete
Latent Variable Models
- URL: http://arxiv.org/abs/2005.14001v1
- Date: Thu, 28 May 2020 13:50:08 GMT
- Title: Joint Stochastic Approximation and Its Application to Learning Discrete
Latent Variable Models
- Authors: Zhijian Ou, Yunfu Song
- Abstract summary: We show that the difficulty of obtaining reliable gradients for the inference model and the drawback of indirectly optimizing the target log-likelihood can be gracefully addressed.
We propose to directly maximize the target log-likelihood and simultaneously minimize the inclusive divergence between the posterior and the inference model.
The resulting learning algorithm is called joint SA (JSA).
- Score: 19.07718284287928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite progress in introducing auxiliary amortized inference models,
learning discrete latent variable models remains challenging. In this paper,
we show that the annoying difficulty of obtaining reliable stochastic gradients
for the inference model and the drawback of indirectly optimizing the target
log-likelihood can be gracefully addressed in a new method based on stochastic
approximation (SA) theory of the Robbins-Monro type. Specifically, we propose
to directly maximize the target log-likelihood and simultaneously minimize the
inclusive divergence between the posterior and the inference model. The
resulting learning algorithm is called joint SA (JSA). To the best of our
knowledge, JSA represents the first method that couples an SA version of the EM
(expectation-maximization) algorithm (SAEM) with an adaptive MCMC procedure.
Experiments on several benchmark generative modeling and structured prediction
tasks show that JSA consistently outperforms recent competitive algorithms,
with faster convergence, better final likelihoods, and lower variance of
gradient estimates.
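As a reading aid, the two coupled objectives described in the abstract can be written out for a latent-variable model p_theta(x, h) with discrete latent h and an amortized inference model q_phi(h | x). The sketch below assumes this standard setup and only restates the stated objectives; it is not the paper's exact formulation.

```latex
% Minimal sketch of the two objectives stated in the abstract, assuming a
% generative model p_theta(x, h) with discrete latent h and an amortized
% inference model q_phi(h | x).
\begin{align}
  \max_{\theta}\;& \log p_{\theta}(x) \;=\; \log \sum_{h} p_{\theta}(x, h), \\
  \min_{\phi}\;& \mathrm{KL}\!\left( p_{\theta}(h \mid x) \,\middle\|\, q_{\phi}(h \mid x) \right)
  \;=\; \mathbb{E}_{p_{\theta}(h \mid x)}\!\left[ \log \frac{p_{\theta}(h \mid x)}{q_{\phi}(h \mid x)} \right].
\end{align}
```

Both quantities are expectations under the intractable posterior p_theta(h | x); since that posterior does not depend on phi, the phi-gradient of the inclusive KL reduces to -E_{p_theta(h|x)}[grad_phi log q_phi(h | x)], so given (approximate) posterior samples both parameter updates look like ordinary gradient steps, which is consistent with the abstract's claim of lower-variance gradient estimates.

A toy, hypothetical sketch of what one such stochastic-approximation iteration could look like is given below. It assumes a single binary latent variable, numerical gradients, and a Metropolis independence sampler that uses the inference model as proposal for the adaptive MCMC step; all names and interfaces are illustrative and do not reproduce the paper's implementation.

```python
import numpy as np

# Toy, illustrative sketch of one joint-SA-style iteration (not the paper's code).
# Setup assumed here: a single binary latent h, a Bernoulli observation x,
# a generative model p_theta(x, h) and an inference model q_phi(h | x) with the
# parameterizations below; gradients are taken numerically for brevity.

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def log_p_joint(theta, x, h):
    # p_theta(h) = Bernoulli(sigmoid(theta[0])), p_theta(x|h) = Bernoulli(sigmoid(theta[1] + theta[2]*h))
    ph = sigmoid(theta[0])
    px = sigmoid(theta[1] + theta[2] * h)
    return (h * np.log(ph) + (1 - h) * np.log(1 - ph)
            + x * np.log(px) + (1 - x) * np.log(1 - px))

def log_q(phi, x, h):
    # q_phi(h | x) = Bernoulli(sigmoid(phi[0] + phi[1] * x))
    qh = sigmoid(phi[0] + phi[1] * x)
    return h * np.log(qh) + (1 - h) * np.log(1 - qh)

def sample_q(phi, x, rng):
    return int(rng.random() < sigmoid(phi[0] + phi[1] * x))

def numerical_grad(f, params, eps=1e-5):
    g = np.zeros_like(params)
    for i in range(len(params)):
        d = np.zeros_like(params)
        d[i] = eps
        g[i] = (f(params + d) - f(params - d)) / (2.0 * eps)
    return g

def jsa_style_step(theta, phi, x, h_cache, lr, rng):
    # (1) MCMC step targeting p_theta(h | x): Metropolis independence sampler
    #     with q_phi as proposal; acceptance ratio uses p(x, h) / q(h | x).
    h_prop = sample_q(phi, x, rng)
    log_ratio = (log_p_joint(theta, x, h_prop) - log_q(phi, x, h_prop)
                 - log_p_joint(theta, x, h_cache) + log_q(phi, x, h_cache))
    h_new = h_prop if np.log(rng.random()) < log_ratio else h_cache
    # (2) Robbins-Monro updates at the sampled h: ascend log p_theta(x, h)
    #     (maximum likelihood) and ascend log q_phi(h | x) (inclusive-KL descent).
    theta = theta + lr * numerical_grad(lambda t: log_p_joint(t, x, h_new), theta)
    phi = phi + lr * numerical_grad(lambda p: log_q(p, x, h_new), phi)
    return theta, phi, h_new

# Usage sketch: one pass over a tiny dataset of binary observations.
rng = np.random.default_rng(0)
theta, phi = np.zeros(3), np.zeros(2)
data = [0, 1, 1, 0, 1]
cache = [sample_q(phi, x, rng) for x in data]   # one persistent chain state per datum
for i, x in enumerate(data):
    theta, phi, cache[i] = jsa_style_step(theta, phi, x, cache[i], lr=0.05, rng=rng)
```

In this sketch the cached latent state per data point plays the role of a Markov-chain state that persists across iterations, which is what makes the procedure a stochastic approximation driven by MCMC samples rather than by exact posterior samples.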
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
- Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences [6.067007470552307]
We propose a methodology for finding sequences of machine learning models that are stable across retraining iterations.
We develop a mixed-integer optimization formulation that is guaranteed to recover optimal models.
Our method shows stronger stability than greedily trained models with a small, controllable sacrifice in predictive power.
arXiv Detail & Related papers (2024-03-28T22:45:38Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Temporal-Structure-Assisted Gradient Aggregation for Over-the-Air Federated Edge Learning [24.248673415586413]
We introduce a Markovian probability model to characterize the intrinsic temporal structure of the model aggregation series.
We develop a message passing algorithm, termed temporal-structure-assisted gradient aggregation (TSA-GA), to fulfil this estimation task.
We show that the proposed TSA-GA algorithm significantly outperforms the state of the art and is able to achieve comparable learning performance.
arXiv Detail & Related papers (2021-03-03T09:13:27Z)
- Community Detection in the Stochastic Block Model by Mixed Integer Programming [3.8073142980733]
The Degree-Corrected Stochastic Block Model (DCSBM) is a popular model for generating random graphs with community structure given an expected degree sequence.
The standard approach to community detection based on the DCSBM is to search for the model parameters that are most likely to have produced the observed network data via maximum likelihood estimation (MLE).
We present mathematical programming formulations and exact solution methods that can provably find the model parameters and community assignments of maximum likelihood given an observed graph.
arXiv Detail & Related papers (2021-01-26T22:04:40Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.