A unified view of likelihood ratio and reparameterization gradients
- URL: http://arxiv.org/abs/2105.14900v1
- Date: Mon, 31 May 2021 11:53:08 GMT
- Title: A unified view of likelihood ratio and reparameterization gradients
- Authors: Paavo Parmas and Masashi Sugiyama
- Abstract summary: We use a first principles approach to explain that LR and RP are alternative methods of keeping track of the movement of probability mass.
We show that the space of all possible estimators combining LR and RP can be completely parameterized by a flow field.
We prove that there cannot exist a single-sample estimator of this type outside our space, thus, clarifying where we should be searching for better Monte Carlo gradient estimators.
- Score: 91.4645013545015
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reparameterization (RP) and likelihood ratio (LR) gradient estimators are
used to estimate gradients of expectations throughout machine learning and
reinforcement learning; however, they are usually explained as simple
mathematical tricks, with no insight into their nature. We use a first
principles approach to explain that LR and RP are alternative methods of
keeping track of the movement of probability mass, and the two are connected
via the divergence theorem. Moreover, we show that the space of all possible
estimators combining LR and RP can be completely parameterized by a flow field
$u(x)$ and an importance sampling distribution $q(x)$. We prove that there
cannot exist a single-sample estimator of this type outside our characterized
space, thus, clarifying where we should be searching for better Monte Carlo
gradient estimators.
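To make the two estimators concrete, here is a minimal sketch (not from the paper) of both for the Gaussian case $\nabla_\theta \mathbb{E}_{x\sim\mathcal{N}(\theta,\sigma^2)}[f(x)]$: LR weights $f(x)$ by the score $\nabla_\theta \log p(x;\theta)$, while RP differentiates through the sampling path $x=\theta+\sigma\epsilon$. The test function and constants are arbitrary choices for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 1.5, 0.7          # mean parameter and fixed std of p(x; theta) = N(theta, sigma^2)
f  = lambda x: np.sin(x) + x**2  # integrand; any differentiable f works
df = lambda x: np.cos(x) + 2*x   # its derivative, needed for the RP estimator

N = 100_000
eps = rng.standard_normal(N)
x = theta + sigma * eps          # reparameterized samples x ~ N(theta, sigma^2)

# Likelihood ratio (score function) estimator:
#   grad = E[ f(x) * d/dtheta log N(x; theta, sigma^2) ] = E[ f(x) * (x - theta) / sigma^2 ]
lr_grad = np.mean(f(x) * (x - theta) / sigma**2)

# Reparameterization (pathwise) estimator:
#   grad = E[ f'(theta + sigma * eps) * d/dtheta (theta + sigma * eps) ] = E[ f'(x) ]
rp_grad = np.mean(df(x))

print(f"LR estimate: {lr_grad:.4f}")
print(f"RP estimate: {rp_grad:.4f}")  # both converge to d/dtheta E[f(x)]; RP has lower variance here
```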
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
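A minimal sketch of weight spectral normalization via power iteration, as a generic illustration of the technique named in the entry above; it is not the paper's exact method, and the layer shape and iteration count are assumptions.
```python
import numpy as np

def spectral_normalize(W, n_power_iter=5, eps=1e-12):
    """Scale matrix W so its largest singular value is ~1, via power iteration."""
    u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(n_power_iter):
        v = W.T @ u
        v /= (np.linalg.norm(v) + eps)
        u = W @ v
        u /= (np.linalg.norm(u) + eps)
    sigma = u @ W @ v          # estimated top singular value
    return W / (sigma + eps)

W = np.random.default_rng(1).standard_normal((64, 32))
W_sn = spectral_normalize(W)
# Largest singular value is ~1, so gradients through repeated applications of W_sn
# cannot blow up exponentially with the unroll length.
print(np.linalg.svd(W_sn, compute_uv=False)[0])
```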
- SIMPLE: A Gradient Estimator for $k$-Subset Sampling [42.38652558807518]
In this work, we fall back to discrete $k$-subset sampling on the forward pass.
We show that our gradient estimator, SIMPLE, exhibits lower bias and variance compared to state-of-the-art estimators.
Empirical results show improved performance on learning to explain and sparse linear regression.
arXiv Detail & Related papers (2022-10-04T22:33:16Z)
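SIMPLE's actual construction is not reproduced here; as a hedged illustration of the "discrete $k$-subset sampling on the forward pass" pattern in the entry above, below is a generic Gumbel-top-$k$ sampler with a relaxed surrogate of the kind a straight-through scheme would differentiate. It is a sketch of the setting, not the SIMPLE estimator.
```python
import numpy as np

def k_subset_forward(logits, k, rng):
    """Forward pass: a hard k-hot sample via Gumbel-top-k (sampling without replacement).
    Backward pass (in an autodiff framework): the hard sample is non-differentiable, so a
    straight-through scheme would return soft + stop_gradient(hard - soft), i.e. use the
    relaxed `soft` vector for gradients while keeping `hard` on the forward pass."""
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    hard = np.zeros_like(logits)
    hard[np.argsort(logits + gumbel)[-k:]] = 1.0   # discrete k-subset on the forward pass
    soft = np.exp(logits - logits.max())
    soft = k * soft / soft.sum()                   # relaxed k-hot surrogate (sums to k)
    return hard, soft

rng = np.random.default_rng(0)
hard, soft = k_subset_forward(np.array([0.1, 2.0, -1.0, 0.5]), k=2, rng=rng)
print(hard, soft)
```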
- Neural Contextual Bandits via Reward-Biased Maximum Likelihood Estimation [9.69596041242667]
Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs.
This paper studies the contextual bandit problem with general bounded reward functions and proposes NeuralRBMLE, which adapts the RBMLE principle by adding a bias term to the log-likelihood to enforce exploration.
We show that the resulting algorithms achieve comparable or better empirical regret than state-of-the-art methods on real-world datasets with non-linear reward functions.
arXiv Detail & Related papers (2022-03-08T16:33:36Z)
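As a hedged illustration of the bias-term idea described in the entry above (not the NeuralRBMLE algorithm itself): fit parameters by maximizing the data log-likelihood plus a reward bias that favors parameters predicting a high best-action reward, which induces exploration. The linear-Gaussian toy model and the bias weight are assumptions made only for this sketch.
```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, dim = 3, 4
theta_true = rng.standard_normal((n_actions, dim))

# Toy history of (context, action, reward) from a linear-Gaussian contextual bandit.
contexts = rng.standard_normal((50, dim))
actions = rng.integers(0, n_actions, size=50)
rewards = np.einsum("td,td->t", theta_true[actions], contexts) + 0.1 * rng.standard_normal(50)

def log_likelihood(theta, X, a, r, noise=0.1):
    resid = r - np.einsum("td,td->t", theta[a], X)
    return -0.5 * np.sum(resid**2) / noise**2

def rbmle_objective(theta, X, a, r, x_now, bias_weight):
    """Log-likelihood of the data plus a reward bias: parameters under which the best
    action at the current context x_now looks good receive a bonus, which is the
    exploration mechanism of the RBMLE principle (illustrative form only)."""
    optimistic_reward = np.max(theta @ x_now)   # best predicted reward over actions
    return log_likelihood(theta, X, a, r) + bias_weight * optimistic_reward

x_now = rng.standard_normal(dim)
print(rbmle_objective(theta_true, contexts, actions, rewards, x_now, bias_weight=2.0))
```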
- Distribution Regression with Sliced Wasserstein Kernels [45.916342378789174]
We propose the first OT-based estimator for distribution regression.
We study the theoretical properties of a kernel ridge regression estimator based on such a representation.
arXiv Detail & Related papers (2022-02-08T15:21:56Z)
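A minimal sketch of how a sliced-Wasserstein kernel between two empirical distributions could be computed, assuming a Gaussian-type kernel on the squared sliced-Wasserstein distance and Monte Carlo projections; the paper's exact construction and theory are richer.
```python
import numpy as np

def sliced_wasserstein_sq(X, Y, n_proj=100, rng=None):
    """Average squared 1D Wasserstein-2 distance of X, Y projected on random directions.
    Assumes X and Y have the same number of samples, so the 1D distance is the mean
    squared difference of sorted projections."""
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)
        x_proj, y_proj = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((x_proj - y_proj) ** 2)
    return total / n_proj

def sw_kernel(X, Y, gamma=1.0):
    """Gaussian-type kernel on the squared sliced-Wasserstein distance."""
    return np.exp(-gamma * sliced_wasserstein_sq(X, Y))

rng = np.random.default_rng(1)
P = rng.standard_normal((200, 3))
Q = rng.standard_normal((200, 3)) + 0.5
print(sw_kernel(P, Q))  # such a kernel can then be plugged into kernel ridge regression on distributions
```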
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
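A hedged sketch of the bias-constraint idea from the entry above, written as a penalized loss: alongside the usual MSE, penalize the empirical mean error over repeated measurements of the same parameter so the learned estimator is pushed toward unbiasedness. The penalty weight and toy data are assumptions; the paper defines its bias-constrained estimator more carefully.
```python
import numpy as np

def bias_constrained_loss(estimates, true_params, lam=1.0):
    """MSE plus a squared-empirical-bias penalty.

    estimates:   (n_repeats, n_params) estimator outputs on noisy measurements that were
                 all generated from the same underlying true_params
    true_params: (n_params,) the parameter value used to generate those measurements
    The second term penalizes the mean error across the repeats, i.e. the empirical bias
    at this parameter value (illustrative form of a bias constraint)."""
    err = estimates - true_params
    mse = np.mean(np.sum(err**2, axis=1))
    bias_sq = np.sum(np.mean(err, axis=0) ** 2)
    return mse + lam * bias_sq

rng = np.random.default_rng(0)
true_params = np.array([1.0, -2.0])
estimates = true_params + 0.3 + 0.1 * rng.standard_normal((64, 2))  # a deliberately biased estimator
print(bias_constrained_loss(estimates, true_params, lam=5.0))
```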
- Distributionally Robust Parametric Maximum Likelihood Estimation [13.09499764232737]
We propose a distributionally robust maximum likelihood estimator that minimizes the worst-case expected log-loss uniformly over a Kullback-Leibler ball around a parametric nominal distribution.
Our novel robust estimator also enjoys statistical consistency and delivers promising empirical results in both regression and classification tasks.
arXiv Detail & Related papers (2020-10-11T19:05:49Z)
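The paper's ambiguity set and estimator are its own; as a hedged illustration of a "worst-case expected loss over a KL ball" objective, here is the standard dual evaluation $\sup\{\mathbb{E}_Q[\ell] : \mathrm{KL}(Q\|\hat{P}) \le \rho\} = \inf_{\lambda>0}\{\lambda\rho + \lambda\log\mathbb{E}_{\hat{P}}[e^{\ell/\lambda}]\}$ over the empirical distribution of per-sample losses.
```python
import numpy as np

def worst_case_expected_loss(losses, rho, lam_grid=None):
    """Dual evaluation of sup { E_Q[loss] : KL(Q || P_hat) <= rho }, with P_hat the
    empirical distribution of `losses`:
        inf_{lam > 0}  lam * rho + lam * log( mean( exp(loss / lam) ) )."""
    lam_grid = np.logspace(-3, 3, 400) if lam_grid is None else lam_grid
    best = np.inf
    for lam in lam_grid:
        z = losses / lam
        lme = np.log(np.mean(np.exp(z - z.max()))) + z.max()   # stable log-mean-exp
        best = min(best, lam * rho + lam * lme)
    return best

rng = np.random.default_rng(0)
losses = -np.log(np.clip(rng.uniform(size=200), 1e-3, 1.0))    # toy per-sample log-losses
# The worst-case value is always >= the plain average; a DRO-MLE would minimize it over model parameters.
print(np.mean(losses), worst_case_expected_loss(losses, rho=0.1))
```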
- Variational Representations and Neural Network Estimation of Rényi Divergences [4.2896536463351]
We derive a new variational formula for the Rényi family of divergences, $R_\alpha(Q\|P)$, between probability measures $Q$ and $P$.
By applying this theory to neural-network estimators, we show that if a neural network family satisfies one of several strengthened versions of the universal approximation property then the corresponding Rényi divergence estimator is consistent.
arXiv Detail & Related papers (2020-07-07T22:34:30Z)
- Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z)
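A minimal sketch of the online-learning route to a mixed-strategy Nash equilibrium in a finite zero-sum game: one player runs multiplicative weights, the other best-responds, and the time-averaged strategies approximate the equilibrium. The payoff matrix is a toy stand-in, not the estimation game considered in the paper.
```python
import numpy as np

def mixed_nash_mw(A, iters=2000, eta=0.1):
    """Approximate mixed equilibrium of the zero-sum game min_x max_y x^T A y.
    The x-player runs multiplicative weights on its loss vector A @ y_t; the y-player
    best-responds; the time-averaged strategies converge to an equilibrium."""
    n, m = A.shape
    log_w = np.zeros(n)
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x = np.exp(log_w - log_w.max())
        x /= x.sum()
        y = np.zeros(m)
        y[np.argmax(x @ A)] = 1.0       # best response of the maximizing player
        x_avg += x
        y_avg += y
        log_w -= eta * (A @ y)          # MW update: down-weight actions with high loss
    return x_avg / iters, y_avg / iters

A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])        # rock-paper-scissors payoff (to the column player)
x_star, y_star = mixed_nash_mw(A)
print(x_star, y_star)                   # both approach the uniform equilibrium (1/3, 1/3, 1/3)
```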