Evaluating the Robustness of Off-Policy Evaluation
- URL: http://arxiv.org/abs/2108.13703v1
- Date: Tue, 31 Aug 2021 09:33:13 GMT
- Title: Evaluating the Robustness of Off-Policy Evaluation
- Authors: Yuta Saito, Takuma Udagawa, Haruka Kiyohara, Kazuki Mogi, Yusuke
Narita, and Kei Tateno
- Abstract summary: Off-policy Evaluation (OPE) evaluates the performance of hypothetical policies leveraging only offline log data.
It is particularly useful in applications where online interaction is high-stakes and expensive.
We develop Interpretable Evaluation for Offline Evaluation (IEOE), an experimental procedure to evaluate OPE estimators' robustness.
- Score: 10.760026478889664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Off-policy Evaluation (OPE), or offline evaluation in general, evaluates the
performance of hypothetical policies leveraging only offline log data. It is
particularly useful in applications where the online interaction involves high
stakes and expensive settings, such as precision medicine and recommender
systems. Since many OPE estimators have been proposed and some of them have
hyperparameters to be tuned, there is an emerging challenge for practitioners
to select and tune OPE estimators for their specific application.
Unfortunately, identifying a reliable estimator from results reported in
research papers is often difficult because the current experimental procedure
evaluates and compares the estimators' performance on a narrow set of
hyperparameters and evaluation policies. Therefore, it is difficult to know
which estimator is safe and reliable to use. In this work, we develop
Interpretable Evaluation for Offline Evaluation (IEOE), an experimental
procedure to evaluate OPE estimators' robustness to changes in hyperparameters
and/or evaluation policies in an interpretable manner. Then, using the IEOE
procedure, we perform an extensive evaluation of a wide variety of existing
estimators on Open Bandit Dataset, a large-scale public real-world dataset for
OPE. We demonstrate that our procedure can evaluate the estimators' robustness
to the hyperparameter choice, helping us avoid using unsafe estimators. Finally,
we apply IEOE to real-world e-commerce platform data and demonstrate how to use
our protocol in practice.
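The estimators whose robustness IEOE stress-tests are importance-weighting methods. As a minimal illustration (not the paper's own code; function names are ours), the basic Inverse Propensity Score (IPS) estimator and its self-normalized variant can be sketched as:

```python
import numpy as np

def ips_estimate(rewards, logging_probs, eval_probs):
    """Inverse Propensity Score (IPS) estimate of an evaluation policy's value.

    rewards:       observed rewards r_i from the logged data
    logging_probs: pi_0(a_i | x_i), the logging policy's action probabilities
    eval_probs:    pi_e(a_i | x_i), the evaluation policy's action probabilities
    """
    weights = np.asarray(eval_probs) / np.asarray(logging_probs)
    return float(np.mean(weights * np.asarray(rewards)))

def snips_estimate(rewards, logging_probs, eval_probs):
    """Self-normalized IPS: divides by the sum of importance weights
    instead of the sample size, trading a small bias for lower variance."""
    weights = np.asarray(eval_probs) / np.asarray(logging_probs)
    return float(np.sum(weights * np.asarray(rewards)) / np.sum(weights))
```

When the evaluation policy matches the logging policy, both estimators reduce to the empirical mean reward; the two diverge as the importance weights grow, which is exactly the regime where robustness evaluation matters.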
Related papers
- Automated Off-Policy Estimator Selection via Supervised Learning [7.476028372444458]
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of counterfactual policies with data collected by another policy.
To solve the OPE problem, we resort to estimators, which aim to estimate as accurately as possible the performance that the counterfactual policies would have achieved had they been deployed in place of the logging policy.
We propose an automated data-driven OPE estimator selection method based on supervised learning.
arXiv Detail & Related papers (2024-06-26T02:34:48Z) - OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators [13.408838970377035]
Offline policy evaluation (OPE) allows us to evaluate and estimate a new sequential decision-making policy's performance.
We propose a new algorithm that adaptively blends a set of OPE estimators given a dataset without relying on an explicit selection using a statistical procedure.
Our work contributes to improving ease of use for a general-purpose, estimator-agnostic, off-policy evaluation framework for offline RL.
arXiv Detail & Related papers (2024-05-27T23:51:20Z) - Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It [20.312864152544954]
We show that naively applying an unbiased estimator of the generalization performance as a surrogate objective in HPO can cause an unexpected failure.
We propose simple and computationally efficient corrections to the typical HPO procedure to deal with the aforementioned issues simultaneously.
arXiv Detail & Related papers (2024-04-23T14:34:16Z) - When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective [64.73162159837956]
Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging.
We propose DataCOPE, a data-centric framework for evaluating a target policy given a dataset.
Our empirical analysis of DataCOPE in the logged contextual bandit settings using healthcare datasets confirms its ability to evaluate both machine-learning and human expert policies.
arXiv Detail & Related papers (2023-11-23T17:13:37Z) - From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experiment results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z) - Policy-Adaptive Estimator Selection for Off-Policy Evaluation [12.1655494876088]
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data.
This paper studies this challenging problem of estimator selection for OPE for the first time.
In particular, we enable an estimator selection that is adaptive to a given OPE task, by appropriately subsampling available logged data and constructing pseudo policies.
arXiv Detail & Related papers (2022-11-25T05:31:42Z) - Off-policy evaluation for learning-to-rank via interpolating the
item-position model and the position-based model [83.83064559894989]
A critical need for industrial recommender systems is the ability to evaluate recommendation policies offline, before deploying them to production.
We develop a new estimator that mitigates the problems of the two most popular off-policy estimators for rankings.
In particular, the new estimator, called INTERPOL, addresses the bias of a potentially misspecified position-based model.
arXiv Detail & Related papers (2022-10-15T17:22:30Z) - Data-Driven Off-Policy Estimator Selection: An Application in User
Marketing on An Online Content Delivery Service [11.986224119327387]
Off-policy evaluation is essential in domains such as healthcare, marketing or recommender systems.
Many OPE methods with theoretical backgrounds have been proposed.
It is often unknown for practitioners which estimator to use for their specific applications and purposes.
arXiv Detail & Related papers (2021-09-17T15:53:53Z) - Control Variates for Slate Off-Policy Evaluation [112.35528337130118]
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions.
We obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators.
arXiv Detail & Related papers (2021-06-15T06:59:53Z) - Interpretable Off-Policy Evaluation in Reinforcement Learning by
Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
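Several of the papers above (UIPS, the hyperparameter-optimization study, the estimator-selection works) concern how sensitive importance-weighted estimators are to hyperparameter choices. A minimal sketch of the simplest such hyperparameter, a clipping threshold on the importance weights (our illustration, not any specific paper's method), shows the trade-off:

```python
import numpy as np

def clipped_ips_estimate(rewards, logging_probs, eval_probs, max_weight=10.0):
    """IPS with importance weights clipped at max_weight.

    Clipping bounds the variance contributed by rare logging-policy actions,
    but introduces bias. max_weight is exactly the kind of hyperparameter
    whose choice robustness-evaluation procedures like IEOE stress-test.
    """
    weights = np.asarray(eval_probs) / np.asarray(logging_probs)
    clipped = np.minimum(weights, max_weight)  # cap extreme weights
    return float(np.mean(clipped * np.asarray(rewards)))
```

With a very large `max_weight` this reduces to plain IPS; with a small one it approaches the (biased) mean logged reward. An estimate that swings widely as `max_weight` varies is the signature of an unsafe estimator in the sense discussed above.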
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.