Safe Exploration for Efficient Policy Evaluation and Comparison
- URL: http://arxiv.org/abs/2202.13234v1
- Date: Sat, 26 Feb 2022 21:41:44 GMT
- Title: Safe Exploration for Efficient Policy Evaluation and Comparison
- Authors: Runzhe Wan, Branislav Kveton, Rui Song
- Abstract summary: We study efficient and safe data collection for bandit policy evaluation.
For each variant, we analyze its statistical properties, derive the corresponding exploration policy, and design an efficient algorithm for computing it.
- Score: 20.97686379166058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-quality data plays a central role in ensuring the accuracy of policy evaluation. This paper initiates the study of efficient and safe data collection for bandit policy evaluation. We formulate the problem and investigate several of its representative variants. For each variant, we analyze its statistical properties, derive the corresponding exploration policy, and design an efficient algorithm for computing it. Both theoretical analysis and experiments support the usefulness of the proposed methods.
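The abstract states the problem but does not spell out the estimator, the exploration policy, or the safety constraint. As a point of reference only, the snippet below is a minimal sketch of the generic setting the paper builds on: a behavior (logging) policy collects bandit data, the target policy's value is estimated by importance weighting, and a mixture rate epsilon stands in for a safety-style restriction on how far exploration may deviate from the target policy. The policy pi_e, the mixture rate, and the arm means are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions, n_rounds = 5, 10_000
true_means = rng.uniform(0.2, 0.8, size=n_actions)        # unknown mean rewards of the arms

# Target policy pi_e whose value we want to estimate (a fixed distribution over arms).
pi_e = np.array([0.05, 0.05, 0.10, 0.20, 0.60])

# Safety-style exploration: mix the target policy with uniform exploration.
# epsilon bounds how far the logging policy deviates from pi_e; it is a stand-in
# for the paper's safety constraint, which the abstract does not specify.
epsilon = 0.2
pi_b = (1 - epsilon) * pi_e + epsilon / n_actions

# Collect data with the behavior (exploration) policy pi_b.
actions = rng.choice(n_actions, size=n_rounds, p=pi_b)
rewards = rng.binomial(1, true_means[actions])

# Importance-weighted (IPS) estimate of the target policy's value.
weights = pi_e[actions] / pi_b[actions]
v_hat = np.mean(weights * rewards)

print(f"IPS estimate: {v_hat:.4f}   true value: {pi_e @ true_means:.4f}")
```

Choosing pi_b to minimize the variance of such an estimate, subject to a safety constraint of this kind, is the type of exploration-policy design problem the paper studies for each of its variants.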
Related papers
- A Review of Global Sensitivity Analysis Methods and a comparative case study on Digit Classification [5.458813674116228]
Global sensitivity analysis (GSA) aims to detect influential input factors that lead a model to arrive at a certain decision.
We provide a comprehensive review and comparison of global sensitivity analysis methods.
arXiv Detail & Related papers (2024-06-23T00:38:19Z)
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), which enables the effective reuse of previously collected samples.
However, IS is typically employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples in order to reduce the policy gradient variance; a minimal importance-weighting sketch appears after this list.
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
- From Variability to Stability: Advancing RecSys Benchmarking Practices [3.3331198926331784]
This paper introduces a novel benchmarking methodology to facilitate a fair and robust comparison of RecSys algorithms.
By utilizing a diverse set of 30 open datasets, including two introduced in this work, we critically examine the influence of dataset characteristics on algorithm performance.
arXiv Detail & Related papers (2024-02-15T07:35:52Z)
- Positivity-free Policy Learning with Observational Data [8.293758599118618]
This study introduces a novel positivity-free (stochastic) policy learning framework.
We propose incremental propensity score policies to adjust propensity score values instead of assigning fixed values to treatments.
This paper provides a thorough exploration of the theoretical guarantees associated with policy learning and validates the proposed framework's finite-sample performance.
arXiv Detail & Related papers (2023-10-10T19:47:27Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experimental results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition for the effectiveness of the framework through a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
- Adaptive Estimator Selection for Off-Policy Evaluation [48.66170976187225]
We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings.
We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant factor.
arXiv Detail & Related papers (2020-02-18T16:57:42Z)
- Efficient Policy Learning from Surrogate-Loss Classification Reductions [65.91730154730905]
We consider the estimation problem given by a weighted surrogate-loss classification reduction of policy learning.
We show that, under a correct specification assumption, the weighted classification formulation need not be efficient for policy parameters.
We propose an estimation approach based on generalized method of moments, which is efficient for the policy parameters.
arXiv Detail & Related papers (2020-02-12T18:54:41Z)
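Two of the entries above, "Policy Gradient with Active Importance Sampling" and "Uncertainty-Aware Instance Reweighting for Off-Policy Learning", rest on the same basic mechanism: samples logged under one (behavior) policy are re-weighted by the likelihood ratio of the target policy. Neither abstract gives its estimator in detail, so the sketch below shows only a generic importance-weighted, REINFORCE-style gradient for a multi-armed bandit; the uniform behavior policy, learning rate, and arm means are illustrative assumptions rather than details from either paper.

```python
import numpy as np

rng = np.random.default_rng(1)

n_actions, n_samples = 4, 5_000
true_means = np.array([0.1, 0.4, 0.6, 0.9])                # unknown mean rewards

# Behavior policy that logged the data (here simply uniform exploration).
pi_b = np.full(n_actions, 1.0 / n_actions)
actions = rng.choice(n_actions, size=n_samples, p=pi_b)
rewards = rng.binomial(1, true_means[actions]).astype(float)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Softmax target policy with logits theta, improved by re-using the logged data:
#   grad J(theta) ~= mean_i  w_i * r_i * d/dtheta log pi_theta(a_i),
# with importance weights w_i = pi_theta(a_i) / pi_b(a_i).
theta = np.zeros(n_actions)
for _ in range(200):
    pi_theta = softmax(theta)
    w = pi_theta[actions] / pi_b[actions]                  # importance weights
    score = np.eye(n_actions)[actions] - pi_theta          # d log pi_theta(a) / d theta
    grad = np.mean((w * rewards)[:, None] * score, axis=0)
    theta += 0.5 * grad                                    # gradient ascent step

print("learned policy:", np.round(softmax(theta), 3))      # mass shifts toward high-reward arms
```

The variance of such an estimate depends on the behavior policy and on the importance weights, which is the lever those two entries target in their respective settings.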
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.