Distributional Shift-Aware Off-Policy Interval Estimation: A Unified
Error Quantification Framework
- URL: http://arxiv.org/abs/2309.13278v2
- Date: Mon, 2 Oct 2023 00:41:01 GMT
- Title: Distributional Shift-Aware Off-Policy Interval Estimation: A Unified
Error Quantification Framework
- Authors: Wenzhuo Zhou, Yuhan Li, Ruoqing Zhu, Annie Qu
- Abstract summary: We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes.
The objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies.
We show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings.
- Score: 8.572441599469597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study high-confidence off-policy evaluation in the context of
infinite-horizon Markov decision processes, where the objective is to establish
a confidence interval (CI) for the target policy value using only offline data
pre-collected from unknown behavior policies. This task faces two primary
challenges: providing a comprehensive and rigorous error quantification in CI
estimation, and addressing the distributional shift that results from
discrepancies between the distribution induced by the target policy and the
offline data-generating process. Motivated by an innovative unified error
analysis, we jointly quantify the two sources of estimation errors: the
misspecification error on modeling marginalized importance weights and the
statistical uncertainty due to sampling, within a single interval. This unified
framework reveals a previously hidden tradeoff between the errors, which
undermines the tightness of the CI. Relying on a carefully designed
discriminator function, the proposed estimator achieves a dual purpose:
breaking the curse of the tradeoff to attain the tightest possible CI, and
adapting the CI to ensure robustness against distributional shifts. Our method
is applicable to time-dependent data without assuming any weak dependence
conditions, by leveraging a local supermartingale/martingale structure.
Theoretically, we show that our algorithm is sample-efficient, error-robust,
and provably convergent even in non-linear function approximation settings. The
numerical performance of the proposed method is examined in synthetic datasets
and an OhioT1DM mobile health study.
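To make the two error sources in the abstract concrete, here is a minimal, illustrative Python sketch of an interval built from marginalized importance weights, whose half-width adds a Hoeffding-style sampling term to a misspecification term measured by a small linear discriminator class. The weight values, feature maps, discriminator class, and width formula below are simplifying assumptions for illustration, not the paper's actual estimator.

```python
# Illustrative sketch only (not the authors' method): off-policy estimate from
# marginalized importance weights omega(s, a), plus a CI whose half-width combines a
# sampling term with a discriminator-measured misspecification term.
import numpy as np

rng = np.random.default_rng(0)

def weighted_value_estimate(omega, rewards):
    """Weighted average reward under the target policy's visitation distribution
    (the 1 / (1 - gamma) normalization of the discounted value is omitted here)."""
    return np.mean(omega * rewards)

def discriminator_bias_term(omega, phi_sa, phi_next, gamma=0.99):
    """Misspecification proxy: largest averaged residual
    |E_n[omega(s,a) * (f(s,a) - gamma * f(s'))]| over discriminators f taken to be the
    coordinates of a feature map (the initial-state term of the full Bellman-flow
    equation is dropped to keep the sketch short)."""
    resid = np.mean(omega[:, None] * (phi_sa - gamma * phi_next), axis=0)
    return np.max(np.abs(resid))

def sampling_term(omega, n, delta=0.05, r_max=1.0):
    """Hoeffding-style statistical width at confidence level 1 - delta (illustrative)."""
    return np.max(np.abs(omega)) * r_max * np.sqrt(np.log(2.0 / delta) / (2.0 * n))

# toy offline data standing in for (s, a, r, s') tuples and learned weights
n, d = 500, 8
omega = np.abs(rng.normal(1.0, 0.3, size=n))    # stand-in for estimated importance weights
rewards = rng.uniform(0.0, 1.0, size=n)
phi_sa = rng.normal(size=(n, d))                # features of (s, a)
phi_next = rng.normal(size=(n, d))              # features of (s', pi(s'))

v_hat = weighted_value_estimate(omega, rewards)
half_width = sampling_term(omega, n) + discriminator_bias_term(omega, phi_sa, phi_next)
print(f"illustrative CI: [{v_hat - half_width:.3f}, {v_hat + half_width:.3f}]")
```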
Related papers
- Distributionally robust risk evaluation with an isotonic constraint [20.74502777102024]
Distributionally robust learning aims to control the worst-case statistical performance within an uncertainty set of candidate distributions.
We propose a shape-constrained approach to DRL, which incorporates prior information about the way in which the unknown target distribution differs from its estimate.
Empirical studies on both synthetic and real data examples demonstrate the improved accuracy of the proposed shape-constrained approach.
arXiv Detail & Related papers (2024-07-09T13:56:34Z)
- Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift [9.387706860375461]
A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance.
Prediction intervals serve as a crucial tool for characterizing the uncertainty induced by the underlying data-generating distribution.
We propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain.
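A toy sketch of the selection flavor of this idea, under the simplifying assumption that density-ratio weights w(x) = p_target(x)/p_source(x) are available (or estimated separately) so that target-domain coverage can be approximated from labeled source data; the paper's actual aggregation procedure is more involved, and all names below are placeholders.

```python
# Toy sketch (not the paper's procedure): among candidate prediction intervals, keep the
# narrowest one whose importance-weighted coverage estimate clears the nominal level.
import numpy as np

def weighted_coverage(lower, upper, y, w):
    """Importance-weighted estimate of target-domain coverage from source labels."""
    covered = ((y >= lower) & (y <= upper)).astype(float)
    return float(np.sum(w * covered) / np.sum(w))

def pick_interval(candidates, y, w, level=0.9):
    """candidates: list of (lower, upper) arrays; returns the index of the narrowest
    candidate with estimated target coverage >= level, or None if none qualifies."""
    feasible = []
    for i, (lo, hi) in enumerate(candidates):
        if weighted_coverage(lo, hi, y, w) >= level:
            feasible.append((np.mean(hi - lo), i))
    return min(feasible)[1] if feasible else None

# toy data: two constant-width candidate intervals and stand-in density-ratio weights
rng = np.random.default_rng(0)
y = rng.normal(size=500)
w = np.exp(0.3 * y)                               # stand-in ratios: target favors larger y
narrow = (np.full(500, -1.2), np.full(500, 1.2))
wide = (np.full(500, -2.0), np.full(500, 2.0))
print("chosen candidate:", pick_interval([narrow, wide], y, w, level=0.9))
```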
arXiv Detail & Related papers (2024-05-16T17:55:42Z)
- Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
The proposed robust-error definition, SCORE, facilitates the reconciliation between robustness and accuracy while still handling worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence from labeled source data and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
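A small sketch of that recipe (data and names are illustrative): pick a confidence threshold on labeled source data so that the fraction of source examples above it matches the source accuracy, then report the fraction of unlabeled target examples above the same threshold.

```python
# Illustrative ATC-style sketch with synthetic confidences and correctness flags.
import numpy as np

def atc_threshold(source_conf, source_correct):
    """Choose t so that mean(conf >= t) on source data matches source accuracy."""
    acc = source_correct.mean()
    candidates = np.sort(source_conf)
    frac_above = 1.0 - np.arange(len(candidates)) / len(candidates)
    return candidates[np.argmin(np.abs(frac_above - acc))]

def atc_predict_accuracy(target_conf, threshold):
    """Estimated target accuracy = fraction of unlabeled target points above threshold."""
    return float(np.mean(target_conf >= threshold))

rng = np.random.default_rng(0)
source_conf = rng.beta(5, 2, size=2000)                            # source confidences
source_correct = (rng.random(2000) < source_conf).astype(float)    # well-calibrated toy model
target_conf = rng.beta(4, 3, size=1000)                            # shifted, less confident

t = atc_threshold(source_conf, source_correct)
print("threshold:", round(t, 3),
      "predicted target accuracy:", round(atc_predict_accuracy(target_conf, t), 3))
```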
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
- Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach [150.8920602230832]
We propose a framework for learning calibrated uncertainties under domain shifts.
In particular, the density ratio estimation reflects the closeness of a target (test) sample to the source (training) distribution.
We show that our proposed method generates calibrated uncertainties that benefit downstream tasks.
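The density-ratio ingredient mentioned above can be illustrated with a standard classifier-based estimator: a probabilistic classifier trained to distinguish source from target features yields r(x) = p_target(x)/p_source(x) up to the class-prior ratio. This is a generic sketch of that well-known device, not the paper's specific calibration procedure; the data and feature dimensions are placeholders.

```python
# Classifier-based density-ratio sketch: r(x) ≈ (n_s / n_t) * p(target|x) / p(source|x).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_source = rng.normal(0.0, 1.0, size=(1000, 5))   # training-distribution features
x_target = rng.normal(0.5, 1.2, size=(400, 5))    # shifted test-distribution features

X = np.vstack([x_source, x_target])
d = np.concatenate([np.zeros(len(x_source)), np.ones(len(x_target))])   # 1 = target

clf = LogisticRegression(max_iter=1000).fit(X, d)
p_t = clf.predict_proba(x_source)[:, 1]           # P(domain = target | x) on source points
ratio = (len(x_source) / len(x_target)) * p_t / (1.0 - p_t)

# Source samples with large `ratio` look target-like; upweighting them in a
# recalibration step is one way to adapt uncertainty estimates to the shifted domain.
print(ratio[:5])
```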
arXiv Detail & Related papers (2020-10-08T02:10:54Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation of stationary values can still be achieved in important applications, even when only a fixed offline dataset is available.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
- Estimating Uncertainty Intervals from Collaborating Networks [15.467208581231848]
We propose a novel method to capture predictive distributions in regression by defining two neural networks with two distinct loss functions.
Specifically, one network approximates the cumulative distribution function, and the second network approximates its inverse.
We benchmark collaborating networks (CN) against several common approaches on two synthetic and six real-world datasets, including forecasting A1c values in diabetic patients from electronic health records.
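A compact PyTorch sketch of the two-network idea described above, one network for the inverse conditional CDF and one for the CDF itself. The losses used here (a pinball loss for the inverse-CDF network and a squared inversion penalty for the CDF network) are illustrative stand-ins rather than the paper's exact objectives, and all architecture and hyperparameter choices are placeholders.

```python
# Two collaborating networks: f(x, q) ≈ F^{-1}(q | x) and g(x, y) ≈ F(y | x).
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, x, z):
        return self.body(torch.cat([x, z], dim=-1))

f, g = Net(), Net()
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

# toy heteroscedastic data: y = x + noise whose scale grows with |x|
x = torch.randn(512, 1)
y = x + (0.5 + x.abs()) * torch.randn(512, 1)

for step in range(200):
    q = torch.rand(512, 1)                         # random quantile levels in (0, 1)
    y_q = f(x, q)                                  # candidate conditional quantiles
    # pinball (quantile) loss drives f toward the inverse CDF
    loss_f = torch.mean(torch.maximum(q * (y - y_q), (q - 1) * (y - y_q)))
    # g is trained to invert f: g(x, f(x, q)) should return q
    loss_g = torch.mean((g(x, y_q.detach()) - q) ** 2)
    opt.zero_grad()
    (loss_f + loss_g).backward()
    opt.step()

# read a 90% predictive interval for a new input off the learned inverse CDF
x_new = torch.tensor([[0.5]])
lower = f(x_new, torch.tensor([[0.05]]))
upper = f(x_new, torch.tensor([[0.95]]))
print(lower.item(), upper.item())
```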
arXiv Detail & Related papers (2020-02-12T20:10:27Z)