Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation
- URL: http://arxiv.org/abs/2302.02570v1
- Date: Mon, 6 Feb 2023 05:17:22 GMT
- Title: Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation
- Authors: Aditya Mate, Bryan Wilder, Aparna Taneja, Milind Tambe
- Abstract summary: We present a new estimator leveraging our proposed novel concept, which involves retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
- Score: 54.72195809248172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the task of evaluating policies of algorithmic resource
allocation through randomized controlled trials (RCTs). Such policies are
tasked with optimizing the utilization of limited intervention resources, with
the goal of maximizing the benefits derived. Evaluation of such allocation
policies through RCTs proves difficult, notwithstanding the scale of the trial,
because the individuals' outcomes are inextricably interlinked through resource
constraints controlling the policy decisions. Our key contribution is to
present a new estimator leveraging our proposed novel concept, which involves
retrospective reshuffling of participants across experimental arms at the end
of an RCT. We identify conditions under which such reassignments are
permissible and can be leveraged to construct counterfactual trials, whose
outcomes can be accurately ascertained, for free. We prove theoretically that
such an estimator is more accurate than common estimators based on sample means
-- we show that it returns an unbiased estimate and simultaneously reduces
variance. We demonstrate the value of our approach through empirical
experiments on synthetic, semi-synthetic as well as real case study data and
show improved estimation accuracy across the board.
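The following is a minimal, hypothetical sketch of the reshuffling idea, not the paper's actual estimator or its permissibility conditions. It assumes a toy setting (my assumption, for illustration only) in which the treatment arm allocates a scarce resource to its top-k participants by a scalar risk score, and it treats a swap between an untreated treatment-arm participant and a below-cutoff control-arm participant as permissible because such a swap cannot change who receives the resource; each counterfactual trial's outcomes are therefore already observed, and the difference-in-means is simply averaged over reshuffled trials.

```python
# Illustrative sketch only; not the estimator or the permissibility
# conditions from the paper. Toy assumptions: the treatment arm gives a
# scarce resource to its top-k participants by a scalar risk score, and an
# (untreated treatment-arm, below-cutoff control-arm) pair may be swapped
# because the swap cannot change who receives the resource, so the
# counterfactual trial's outcomes are exactly the ones already observed.
import numpy as np

rng = np.random.default_rng(0)


def difference_in_means(y_treat, y_control):
    """Standard sample-mean estimate of the between-arm difference."""
    return y_treat.mean() - y_control.mean()


def reshuffled_estimate(scores_t, y_t, scores_c, y_c, k, n_shuffles=500):
    """Average the difference-in-means over many permissible reshufflings."""
    order = np.argsort(-scores_t)
    untreated_t = order[k:]                          # never got the resource
    cutoff = scores_t[order[k - 1]]                  # k-th highest score
    swappable_c = np.flatnonzero(scores_c < cutoff)  # would also stay untreated

    estimates = []
    for _ in range(n_shuffles):
        m = min(len(untreated_t), len(swappable_c))
        n_swap = int(rng.integers(0, m + 1)) if m > 0 else 0
        t_idx = rng.choice(untreated_t, size=n_swap, replace=False)
        c_idx = rng.choice(swappable_c, size=n_swap, replace=False)

        # Counterfactual arms: swapped participants keep their observed
        # (untreated) outcomes, so no new data collection is needed.
        y_t_cf, y_c_cf = y_t.copy(), y_c.copy()
        y_t_cf[t_idx], y_c_cf[c_idx] = y_c[c_idx], y_t[t_idx]
        estimates.append(difference_in_means(y_t_cf, y_c_cf))
    return float(np.mean(estimates))


# Toy data: 100 participants per arm, resource budget k = 20, with a small
# benefit for the treated (top-20) individuals in the treatment arm.
scores_t, scores_c = rng.uniform(size=100), rng.uniform(size=100)
treated = scores_t >= np.sort(scores_t)[-20]
y_t = rng.normal(loc=1.0 + 0.5 * treated)
y_c = rng.normal(loc=1.0, size=100)

print("naive difference-in-means:", difference_in_means(y_t, y_c))
print("reshuffled estimate:      ",
      reshuffled_estimate(scores_t, y_t, scores_c, y_c, k=20))
```

In this toy setup, averaging over the counterfactual trials plays the variance-reduction role described in the abstract; the paper's actual permissibility conditions and estimator are more general.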
Related papers
- Evaluating the Effectiveness of Index-Based Treatment Allocation [42.040099398176665]
When resources are scarce, an allocation policy is needed to decide who receives a resource.
This paper introduces methods to evaluate index-based allocation policies using data from a randomized control trial.
arXiv Detail & Related papers (2024-02-19T01:55:55Z)
- Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits [41.91108406329159]
Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new policies using existing data without costly experimentation.
We introduce a new OPE estimator for contextual bandits, the Marginal Ratio (MR) estimator, which focuses on the shift in the marginal distribution of outcomes $Y$ instead of the policies themselves (an illustrative sketch of this idea appears after the related-papers list).
arXiv Detail & Related papers (2023-12-03T17:04:57Z)
- RCT Rejection Sampling for Causal Estimation Evaluation [25.845034753006367]
Confounding is a significant obstacle to unbiased estimation of causal effects from observational data.
We build on a promising empirical evaluation strategy that simplifies evaluation design and uses real data.
We show our algorithm indeed results in low bias when oracle estimators are evaluated on confounded samples.
arXiv Detail & Related papers (2023-07-27T20:11:07Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experiment results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- Optimal Treatment Regimes for Proximal Causal Learning [7.672587258250301]
We propose a novel optimal individualized treatment regime based on outcome and treatment confounding bridges.
We show that the value function of this new optimal treatment regime is superior to that of existing ones in the literature.
arXiv Detail & Related papers (2022-12-19T14:29:25Z)
- A Risk-Sensitive Approach to Policy Optimization [21.684251937825234]
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy.
We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized.
We demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies.
arXiv Detail & Related papers (2022-08-19T00:55:05Z)
- Assessment of Treatment Effect Estimators for Heavy-Tailed Data [70.72363097550483]
A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance.
We provide a novel cross-validation-like methodology to address this challenge.
We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain.
arXiv Detail & Related papers (2021-12-14T17:53:01Z)
- Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or the value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We identify the OPE estimator for multiple loggers that has minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
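As noted in the Marginal Ratio (MR) entry above, here is a brief illustrative sketch of the marginal-ratio idea. It reflects a simplified reading, not the MR paper's implementation: because the conditional expectation of the ordinary importance weight given the outcome equals the ratio of marginal outcome densities p_pi(y)/p_mu(y), one can smooth the per-sample weights as a function of the outcome (here a crude quantile-binned regression, assuming continuous outcomes) and reweight the outcomes with the smoothed values.

```python
# Illustrative sketch of a marginal-ratio style OPE estimate; a simplified
# reading of the idea, not the MR paper's implementation. It relies on the
# identity E_mu[ pi(a|x)/mu(a|x) | Y = y ] = p_pi(y) / p_mu(y): smoothing
# the per-sample importance weights as a function of the outcome estimates
# the marginal density ratio, which then reweights the outcomes directly.
import numpy as np


def ips_value(y, pi_probs, mu_probs):
    """Plain inverse-propensity-score baseline for comparison."""
    return float(np.mean(pi_probs / mu_probs * y))


def mr_style_value(y, pi_probs, mu_probs, n_bins=20):
    """Estimate E_pi[Y] from logged bandit data collected under mu.

    pi_probs and mu_probs hold pi(a_i | x_i) and mu(a_i | x_i) for the
    logged actions; the ratio p_pi(y) / p_mu(y) is estimated by a crude
    quantile-binned regression of the IPS weights on the (continuous) outcome.
    """
    ips = pi_probs / mu_probs
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.digitize(y, edges[1:-1]), 0, n_bins - 1)
    bin_means = np.array([ips[bins == b].mean() for b in range(n_bins)])
    return float(np.mean(bin_means[bins] * y))


# Tiny synthetic check: two actions, uniform logging policy, and a target
# policy that prefers the higher-outcome action (true value is 0.8).
rng = np.random.default_rng(1)
n = 5000
a = rng.integers(0, 2, size=n)
y = rng.normal(loc=a.astype(float))
mu_probs = np.full(n, 0.5)
pi_probs = np.where(a == 1, 0.8, 0.2)
print("IPS:     ", ips_value(y, pi_probs, mu_probs))
print("MR-style:", mr_style_value(y, pi_probs, mu_probs))
```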