Variance Reduction for Score Functions Using Optimal Baselines
- URL: http://arxiv.org/abs/2212.13587v1
- Date: Tue, 27 Dec 2022 19:17:28 GMT
- Title: Variance Reduction for Score Functions Using Optimal Baselines
- Authors: Ronan Keane and H. Oliver Gao
- Abstract summary: This paper studies baselines, a variance reduction technique for score functions.
Motivated primarily by reinforcement learning, we derive for the first time an expression for the optimal state-dependent baseline.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many problems involve the use of models which learn probability distributions
or incorporate randomness in some way. In such problems, because computing the
true expected gradient may be intractable, a gradient estimator is used to
update the model parameters. When the model parameters directly affect a
probability distribution, the gradient estimator will involve score function
terms. This paper studies baselines, a variance reduction technique for score
functions. Motivated primarily by reinforcement learning, we derive for the
first time an expression for the optimal state-dependent baseline, the baseline
which results in a gradient estimator with minimum variance. Although we show
that there exist examples where the optimal baseline may be arbitrarily better
than a value function baseline, we find that the value function baseline
usually performs similarly to an optimal baseline in terms of variance
reduction. Moreover, the value function can also be used for bootstrapping
estimators of the return, leading to additional variance reduction. Our results
give new insight and justification for why value function baselines and the
generalized advantage estimator (GAE) work well in practice.
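To make the setting concrete, the following is a minimal sketch (not the paper's implementation) of score-function gradient estimation with a baseline, using a toy softmax-parameterized categorical distribution. The constant-baseline formula in the comments is the classical variance-minimizing result; the paper derives its state-dependent analogue for reinforcement learning.

```python
# Minimal sketch (not the paper's implementation): a score-function (REINFORCE)
# gradient estimator for J(theta) = E_{x ~ p_theta}[f(x)]. Subtracting a
# baseline b from f(x) keeps the estimator unbiased, since
# E[b * grad log p_theta(x)] = 0, but changes its variance. The classical
# variance-minimizing *constant* baseline is
#   b* = E[f(x) ||grad log p(x)||^2] / E[||grad log p(x)||^2];
# the paper derives the state-dependent analogue.
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([0.5, -0.3, 0.1])   # logits of p_theta
f = np.array([1.0, 4.0, 2.0])        # f(x) for each of the three outcomes
probs = np.exp(theta - theta.max())
probs /= probs.sum()

def grad_log_p(x):
    # d/dtheta log p_theta(x) for a softmax categorical: one_hot(x) - probs
    return np.eye(len(probs))[x] - probs

def estimator_samples(n, baseline):
    xs = rng.choice(len(probs), size=n, p=probs)
    return np.stack([(f[x] - baseline) * grad_log_p(x) for x in xs])

sq_norms = np.array([grad_log_p(x) @ grad_log_p(x) for x in range(len(probs))])
b_opt = (probs * f * sq_norms).sum() / (probs * sq_norms).sum()
b_mean = (probs * f).sum()           # "value-function-like" baseline E[f]

for name, b in [("no baseline", 0.0), ("mean of f", b_mean), ("optimal", b_opt)]:
    g = estimator_samples(20_000, b)
    print(f"{name:12s} total variance = {g.var(axis=0).sum():.3f}")
```

On this toy problem the mean-of-f baseline typically lands close to the optimal one in variance, loosely mirroring the abstract's observation that value function baselines usually perform similarly to the optimal baseline.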
Related papers
- Accelerating Policy Gradient by Estimating Value Function from Prior
Computation in Deep Reinforcement Learning [16.999444076456268]
We investigate the use of prior computation to estimate the value function to improve sample efficiency in on-policy policy gradient methods.
In particular, we learn a new value function for the target task while combining it with a value estimate from the prior.
The resulting value function is used as a baseline in the policy gradient method.
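As a rough illustration of the idea above (not the paper's exact combination rule; `alpha` is a hypothetical mixing weight), a prior-task value estimate can be mixed with a newly learned value function to form the baseline:

```python
# Hedged sketch (not the paper's exact rule): a convex combination of a
# prior-task value estimate and a newly learned value function, used as the
# policy-gradient baseline. `alpha` is a hypothetical mixing weight.
def combined_baseline(state, v_prior, v_new, alpha=0.5):
    """Baseline b(s) = alpha * V_prior(s) + (1 - alpha) * V_new(s)."""
    return alpha * v_prior(state) + (1.0 - alpha) * v_new(state)

# Usage in a policy-gradient update:
#   advantage_t = return_t - combined_baseline(s_t, v_prior, v_new)
```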
arXiv Detail & Related papers (2023-02-02T20:23:22Z)
- Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Statistical Optimality of Divide and Conquer Kernel-based Functional
Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
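A minimal sketch of the divide-and-conquer idea in the simpler kernel ridge regression setting (illustrative only; the paper analyzes functional linear regression, a more general problem):

```python
# Hedged sketch: divide-and-conquer kernel ridge regression. Split the data
# into m parts, fit a local kernel ridge estimator on each part, and average
# the m local predictions. Not the paper's estimator, just the mechanism.
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, gamma=1.0):
    # Gaussian (RBF) kernel matrix between rows of a and rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def dc_kernel_ridge(X, y, X_test, m=4, lam=1e-2):
    preds = []
    for idx in np.array_split(rng.permutation(len(X)), m):
        K = rbf(X[idx], X[idx])
        alpha = np.linalg.solve(K + lam * len(idx) * np.eye(len(idx)), y[idx])
        preds.append(rbf(X_test, X[idx]) @ alpha)
    return np.mean(preds, axis=0)   # average the m local estimators

X = rng.uniform(-1, 1, size=(400, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(400)
X_test = np.linspace(-1, 1, 5)[:, None]
print(dc_kernel_ridge(X, y, X_test))
```

Each local solve costs only O((n/m)^3) time and O((n/m)^2) memory, which is the source of the complexity reduction the summary refers to.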
arXiv Detail & Related papers (2022-11-20T12:29:06Z)
- Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
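A toy illustration of finite-differencing a Gateaux derivative (the paper's constructive algorithm is more general; this uses the mean functional, whose influence function x - E[X] is known in closed form, as a check):

```python
# Hedged sketch: approximate the Gateaux derivative of a functional T at a
# distribution P, in the direction of a point mass at x, by finite differencing:
#   (T((1 - eps) * P + eps * delta_x) - T(P)) / eps
# Toy functional only, not the paper's algorithm.
import numpy as np

def T(weights, values):
    # Example functional: the weighted mean E_P[X].
    return np.sum(weights * values)

def gateaux_fd(values, x_index, eps=1e-4):
    n = len(values)
    p = np.full(n, 1.0 / n)        # empirical distribution
    p_tilt = (1 - eps) * p
    p_tilt[x_index] += eps         # mix in a point mass at values[x_index]
    return (T(p_tilt, values) - T(p, values)) / eps

vals = np.array([1.0, 2.0, 5.0])
# For T = mean, the influence function is x - E[X]; check at x = 5.0:
print(gateaux_fd(vals, 2), vals[2] - vals.mean())
```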
arXiv Detail & Related papers (2022-08-29T16:16:22Z)
- Gradient Estimation with Discrete Stein Operators [44.64146470394269]
We introduce a variance reduction technique based on Stein operators for discrete distributions.
Our technique achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
arXiv Detail & Related papers (2022-02-19T02:22:23Z)
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient
for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient
Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
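For reference, a hedged sketch of the vanilla straight-through Gumbel-Softmax estimator that this paper Rao-Blackwellizes (PyTorch; this is the baseline estimator, not the paper's reduced-variance version):

```python
# Hedged sketch: the vanilla straight-through Gumbel-Softmax estimator.
# Requires PyTorch. The paper's contribution is to Rao-Blackwellize this,
# not the code shown here.
import torch
import torch.nn.functional as F

logits = torch.tensor([0.5, -0.3, 0.1], requires_grad=True)
costs = torch.tensor([1.0, 4.0, 2.0])

# hard=True returns a one-hot sample in the forward pass while routing
# gradients through the softmax relaxation ("straight-through").
sample = F.gumbel_softmax(logits, tau=1.0, hard=True)
loss = (sample * costs).sum()
loss.backward()
print(sample, logits.grad)

# Rao-Blackwellization (the paper's idea) replaces this single-draw gradient
# with its conditional expectation given the sampled category, which can
# never increase, and typically reduces, the estimator's variance.
```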
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
- SUMO: Unbiased Estimation of Log Marginal Probability for Latent
Variable Models [80.22609163316459]
We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series.
We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost.
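A toy sketch of the randomized-truncation ("Russian roulette") mechanism behind this estimator (a simple geometric series, not the paper's log-marginal-likelihood construction):

```python
# Hedged sketch: unbiased estimation of an infinite series via randomized
# truncation ("Russian roulette"). Sample a truncation level K and reweight
# term k by 1 / P(K >= k), so the estimator stays unbiased in expectation.
# Toy series only; the paper applies this idea to log marginal likelihoods.
import numpy as np

rng = np.random.default_rng(0)

def russian_roulette(delta, q=0.4, max_terms=60):
    """Unbiasedly estimate sum_{k>=0} delta(k).

    K ~ Geometric(q) - 1 has P(K >= k) = (1 - q)^k; max_terms is only a
    numerical-safety cap (its truncation bias is negligible here).
    """
    K = min(rng.geometric(q) - 1, max_terms)
    return sum(delta(k) / (1 - q) ** k for k in range(K + 1))

# Toy series: sum_k 0.5^k = 2.
est = np.mean([russian_roulette(lambda k: 0.5 ** k) for _ in range(100_000)])
print(est)  # close to 2
```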
arXiv Detail & Related papers (2020-04-01T11:49:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.