Entropy Regularization for Population Estimation
- URL: http://arxiv.org/abs/2208.11747v1
- Date: Wed, 24 Aug 2022 19:17:39 GMT
- Title: Entropy Regularization for Population Estimation
- Authors: Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho
- Abstract summary: Mean reward estimation tasks have been shown to be essential for public policy settings.
We show that leveraging entropy and KL divergence can yield a better trade-off between reward and estimator variance than existing baselines.
- Score: 3.0175479520609887
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Entropy regularization is known to improve exploration in sequential
decision-making problems. We show that this same mechanism can also lead to
nearly unbiased and lower-variance estimates of the mean reward in the
optimize-and-estimate structured bandit setting. Mean reward estimation (i.e.,
population estimation) tasks have recently been shown to be essential for
public policy settings where legal constraints often require precise estimates
of population metrics. We show that leveraging entropy and KL divergence can
yield a better trade-off between reward and estimator variance than existing
baselines, all while remaining nearly unbiased. These properties of entropy
regularization illustrate an exciting potential for bridging the optimal
exploration and estimation literatures.
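As a rough illustration of the trade-off the abstract describes, the sketch below selects units with entropy-regularized (softmax) probabilities and estimates the population mean by inverse-probability weighting. It is not the authors' algorithm: the softmax form, the with-replacement Hansen-Hurwitz estimator, and every name and parameter (`entropy_regularized_probs`, `hansen_hurwitz_mean`, `temperature`, the synthetic gamma rewards) are assumptions made only for this example.

```python
import numpy as np


def entropy_regularized_probs(predicted_reward, temperature):
    """Maximizer of sum_i p_i * r_i + temperature * H(p) over the simplex (a softmax)."""
    z = np.asarray(predicted_reward, dtype=float) / temperature
    z = z - z.max()  # shift for numerical stability; does not change the result
    w = np.exp(z)
    return w / w.sum()


def hansen_hurwitz_mean(y_sampled, p_sampled, population_size):
    """Unbiased population-mean estimate from draws made with replacement."""
    y = np.asarray(y_sampled, dtype=float)
    p = np.asarray(p_sampled, dtype=float)
    return float(np.mean(y / p) / population_size)


def demo(seed=0, n=1000, k=100, temperature=0.5):
    """Hypothetical synthetic population; returns (estimate, true mean, mean sampled reward)."""
    rng = np.random.default_rng(seed)
    true_reward = rng.gamma(2.0, 1.0, size=n)
    predicted = true_reward + rng.normal(0.0, 1.0, size=n)  # noisy model scores
    p = entropy_regularized_probs(predicted, temperature)
    idx = rng.choice(n, size=k, replace=True, p=p)  # k draws with replacement
    estimate = hansen_hurwitz_mean(true_reward[idx], p[idx], n)
    return estimate, float(true_reward.mean()), float(true_reward[idx].mean())
```

In this sketch a larger `temperature` pushes the selection probabilities toward uniform, which tends to lower the variance of the weighted estimate at the cost of collected reward, while a smaller `temperature` concentrates sampling on high predicted rewards.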
Related papers
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Statistical Barriers to Affine-equivariant Estimation [10.077727846124633]
We investigate the quantitative performance of affine-equivariant estimators for robust mean estimation.
We find that classical estimators are either quantitatively sub-optimal or lack any quantitative guarantees.
We construct a new affine-equivariant estimator which nearly matches our lower bound.
arXiv Detail & Related papers (2023-10-16T18:42:00Z)
- Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts [73.33395097728128]
We provide the first systematic finite-sample study of proper scoring rules for time-series forecasting evaluation.
We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions.
arXiv Detail & Related papers (2023-04-19T17:38:42Z)
- SOPE: Spectrum of Off-Policy Estimators [40.15700429288981]
We show the existence of a spectrum of estimators whose endpoints are SIS and IS.
We provide empirical evidence that estimators in this spectrum can be used to trade off the bias and variance of IS and SIS.
arXiv Detail & Related papers (2021-11-06T18:29:21Z)
- Universal Off-Policy Evaluation [64.02853483874334]
We take the first steps towards a universal off-policy estimator (UnO).
We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns.
arXiv Detail & Related papers (2021-04-26T18:54:31Z)
- Off-Policy Evaluation via the Regularized Lagrangian [110.28927184857478]
The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data.
In this paper, we unify these estimators as regularized Lagrangians of the same linear program.
We find that dual solutions offer greater flexibility in navigating the tradeoff between stability and estimation bias, and generally provide superior estimates in practice.
arXiv Detail & Related papers (2020-07-07T13:45:56Z)
- Batch Stationary Distribution Estimation [98.18201132095066]
We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.
We propose a consistent estimator that is based on recovering a correction ratio function over the given data.
arXiv Detail & Related papers (2020-03-02T09:10:01Z)
- GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.