Debiasing In-Sample Policy Performance for Small-Data, Large-Scale
Optimization
- URL: http://arxiv.org/abs/2107.12438v2
- Date: Wed, 28 Jul 2021 15:39:31 GMT
- Title: Debiasing In-Sample Policy Performance for Small-Data, Large-Scale
Optimization
- Authors: Vishal Gupta, Michael Huang, Paat Rusmevichientong
- Abstract summary: We propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization.
Unlike cross-validation, our approach avoids sacrificing data for a test set.
We prove our estimator performs well in the small-data, large-scale regime.
- Score: 4.554894288663752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the poor performance of cross-validation in settings where data
are scarce, we propose a novel estimator of the out-of-sample performance of a
policy in data-driven optimization. Our approach exploits the optimization
problem's sensitivity analysis to estimate the gradient of the optimal
objective value with respect to the amount of noise in the data and uses the
estimated gradient to debias the policy's in-sample performance. Unlike
cross-validation techniques, our approach avoids sacrificing data for a test
set, utilizes all data when training and, hence, is well-suited to settings
where data are scarce. We prove bounds on the bias and variance of our
estimator for optimization problems with uncertain linear objectives but known,
potentially non-convex, feasible regions. For more specialized optimization
problems where the feasible region is "weakly-coupled" in a certain sense, we
prove stronger results. Specifically, we provide explicit high-probability
bounds on the error of our estimator that hold uniformly over a policy class
and depend on the problem's dimension and the policy class's complexity. Our
bounds show that under mild conditions, the error of our estimator vanishes as
the dimension of the optimization problem grows, even if the amount of
available data remains small and constant. Said differently, we prove our
estimator performs well in the small-data, large-scale regime. Finally, we
numerically compare our proposed method to state-of-the-art approaches through
a case-study on dispatching emergency medical response services using real
data. Our method provides more accurate estimates of out-of-sample performance
and learns better-performing policies.
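To make the mechanism described in the abstract concrete, here is a minimal sketch on a toy problem: re-solve the problem after injecting a small amount of extra noise into the data, use the change in the optimal in-sample value as a finite-difference estimate of the gradient of the optimal value with respect to the noise level, and subtract the implied optimism. The box-constrained linear objective, the Gaussian noise model with known scale, the factor-of-two correction, and all names below are illustrative assumptions, not the authors' exact estimator.

```python
# Illustrative sketch only -- NOT the paper's exact estimator.  It mimics the
# high-level idea from the abstract: estimate how the optimal in-sample value
# grows as extra noise is injected into the data, and use that sensitivity to
# debias the plug-in policy's in-sample performance.  Toy problem: maximize a
# noisy linear objective over a box, with Gaussian noise of known scale sigma.
import numpy as np

rng = np.random.default_rng(0)

def solve_policy(c_hat, ub):
    """Plug-in policy for max c_hat @ x subject to 0 <= x <= ub.
    The box feasible region has a closed-form maximizer and stands in
    for a generic solver."""
    return np.where(c_hat > 0.0, ub, 0.0)

def in_sample_value(c_hat, ub):
    return float(c_hat @ solve_policy(c_hat, ub))

def debiased_value(c_hat, sigma, ub, n_probe=500, eps=0.1):
    """Debias the in-sample value with a finite-difference estimate of the
    derivative of the optimal value with respect to the noise level."""
    v_in = in_sample_value(c_hat, ub)
    probes = [
        in_sample_value(
            c_hat + np.sqrt(eps) * sigma * rng.standard_normal(c_hat.shape), ub
        )
        for _ in range(n_probe)
    ]
    sensitivity = (np.mean(probes) - v_in) / eps
    # Under this toy Gaussian model (noise variance scaling linearly with the
    # noise level), the in-sample optimism is roughly twice this derivative.
    return v_in - 2.0 * sensitivity

# Toy experiment: compare the naive in-sample value, the debiased estimate,
# and the true out-of-sample value of the learned policy.
d, sigma = 200, 1.0
c_true = rng.standard_normal(d)
c_hat = c_true + sigma * rng.standard_normal(d)
ub = np.ones(d)
x_hat = solve_policy(c_hat, ub)

print("in-sample value    :", round(in_sample_value(c_hat, ub), 2))
print("debiased estimate  :", round(debiased_value(c_hat, sigma, ub), 2))
print("true out-of-sample :", round(float(c_true @ x_hat), 2))
```

On this toy instance the debiased estimate typically lands much closer to the policy's true out-of-sample value than the raw in-sample value does; the paper replaces the crude finite-difference probe with a proper sensitivity analysis and proves bias and variance guarantees for it.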
Related papers
- Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [56.24431208419858]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset.
We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset.
arXiv Detail & Related papers (2024-10-10T16:01:51Z)
- Data-Driven Estimation of Conditional Expectations, Application to Optimal Stopping and Reinforcement Learning [2.1756081703276]
We propose simple and purely data-driven means for estimating directly the desired conditional expectation.
Because conditional expectations appear in the description of a number of optimization problems with the corresponding optimal solution, we extend our data-driven method to cover such cases as well.
We test our methodology by applying it to Optimal Stopping and Optimal Action Policy in Reinforcement Learning.
arXiv Detail & Related papers (2024-07-18T05:57:30Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- A Finite-Horizon Approach to Active Level Set Estimation [0.7366405857677227]
We consider the problem of active learning in the context of spatial sampling for level set estimation (LSE).
We present a finite-horizon search procedure to perform LSE in one dimension while optimally balancing both the final estimation error and the distance traveled for a fixed number of samples.
We show that the resulting optimization problem can be solved in closed form and that the resulting policy generalizes existing approaches to this problem.
arXiv Detail & Related papers (2023-10-18T14:11:41Z)
- Optimize-via-Predict: Realizing out-of-sample optimality in data-driven optimization [0.0]
We examine a formulation for data-driven optimization wherein the decision-maker is not privy to the true distribution.
We define a prescriptive solution as a decision rule mapping such a data set to decisions.
We present an optimization problem that solves for such an out-of-sample optimal solution, and does so efficiently by combining sampling and bisection search algorithms.
arXiv Detail & Related papers (2023-09-20T08:48:50Z)
- Optimizer's Information Criterion: Dissecting and Correcting Bias in Data-Driven Optimization [16.57676001669012]
In data-driven optimization, the sample performance of the obtained decision typically incurs an optimistic bias against the true performance.
Common techniques to correct this bias, such as cross-validation, require repeatedly solving additional optimization problems and are therefore expensive.
We develop a general bias correction approach that directly approximates the first-order bias and does not require solving any additional optimization problems.
arXiv Detail & Related papers (2023-06-16T07:07:58Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score (UIPS) estimator for improved off-policy learning.
Experiment results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- Control Variates for Slate Off-Policy Evaluation [112.35528337130118]
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions.
We obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators.
arXiv Detail & Related papers (2021-06-15T06:59:53Z)
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result sheds light on the fact that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z)
- Evaluating representations by the complexity of learning low-loss predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z)