Markovian Interference in Experiments
- URL: http://arxiv.org/abs/2206.02371v2
- Date: Thu, 9 Jun 2022 14:13:38 GMT
- Title: Markovian Interference in Experiments
- Authors: Vivek F. Farias, Andrew A. Li, Tianyi Peng, Andrew Zheng
- Abstract summary: We consider experiments in dynamical systems where interventions on some experimental units impact other units through a limiting constraint.
Despite outsize practical importance, the best estimators for this problem are largely heuristic in nature, and their bias is not well understood.
Off-policy estimators, while unbiased, apparently incur a large penalty in variance relative to state-of-the-art alternatives.
We introduce an on-policy estimator: the Differences-In-Q's (DQ) estimator.
- Score: 7.426870925611945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider experiments in dynamical systems where interventions on some
experimental units impact other units through a limiting constraint (such as a
limited inventory). Despite outsize practical importance, the best estimators
for this `Markovian' interference problem are largely heuristic in nature, and
their bias is not well understood. We formalize the problem of inference in
such experiments as one of policy evaluation. Off-policy estimators, while
unbiased, apparently incur a large penalty in variance relative to
state-of-the-art heuristics. We introduce an on-policy estimator: the
Differences-In-Q's (DQ) estimator. We show that the DQ estimator can in general
have exponentially smaller variance than off-policy evaluation. At the same
time, its bias is second order in the impact of the intervention. This yields a
striking bias-variance tradeoff so that the DQ estimator effectively dominates
state-of-the-art alternatives. From a theoretical perspective, we introduce
three separate novel techniques that are of independent interest in the theory
of Reinforcement Learning (RL). Our empirical evaluation includes a set of
experiments on a city-scale ride-hailing simulator.
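To make the DQ construction concrete, here is a minimal, hypothetical sketch. It assumes a tabular state space, a discounted-reward formulation, Bernoulli treatment randomization, and a SARSA-style TD(0) fit of the experimentation policy's Q-function; the paper itself works in an average-reward setting, and the function name `dq_estimate` and its arguments are illustrative rather than the authors' implementation.

```python
import numpy as np

def dq_estimate(states, actions, rewards, n_states, gamma=0.99):
    """Illustrative Differences-in-Q's (DQ) style estimate from one logged
    trajectory under Bernoulli A/B randomization (1 = treatment, 0 = control).
    Simplified sketch only, not the paper's exact construction."""
    states = np.asarray(states, dtype=int)
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)

    # Tabular SARSA-style TD(0) estimate of the Q-function of the
    # experimentation (A/B) policy itself, fit on-policy.
    Q = np.zeros((n_states, 2))
    counts = np.zeros((n_states, 2))
    for t in range(len(states) - 1):
        s, a, r = states[t], actions[t], rewards[t]
        s_next, a_next = states[t + 1], actions[t + 1]
        counts[s, a] += 1
        alpha = 1.0 / counts[s, a]                  # decaying step size
        td_target = r + gamma * Q[s_next, a_next]
        Q[s, a] += alpha * (td_target - Q[s, a])

    # DQ idea: difference in average estimated Q-values between treated and
    # control observations, instead of the naive difference in raw rewards.
    q_logged = Q[states, actions]
    treated = actions == 1
    return q_logged[treated].mean() - q_logged[~treated].mean()
```

Replacing raw rewards with Q-values is what credits each unit with the downstream, interference-mediated consequences of its treatment, and is the source of the second-order-bias property described above.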
Related papers
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure based on testing a hypothesis about the value of the conditional variance at a given point.
Unlike existing methods, the proposed procedure accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces [0.951828574518325]
We study Off-Policy Evaluation in contextual bandit settings with large action spaces.
Benchmark estimators suffer from severe bias and variance tradeoffs in this regime.
We propose a Marginalized Doubly Robust (MDR) estimator to overcome these limitations.
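For context, the sketch below is the standard (non-marginalized) doubly robust estimator for contextual-bandit OPE, combining a reward-model term with an importance-weighted correction; the MDR estimator of the paper further marginalizes the weights over an action embedding to control variance when the action space is large. Array names and shapes are assumptions for illustration only.

```python
import numpy as np

def dr_policy_value(actions, rewards, behavior_prob, target_policy, q_hat):
    """Vanilla doubly robust off-policy value estimate for a contextual bandit.

    actions:        (n,) int array of logged actions
    rewards:        (n,) float array of observed rewards
    behavior_prob:  (n,) logged propensities mu(a_i | x_i)
    target_policy:  (n, n_actions) target-policy probabilities pi(. | x_i)
    q_hat:          (n, n_actions) estimated mean rewards q_hat(x_i, .)
    """
    n = len(actions)
    idx = np.arange(n)
    direct = (target_policy * q_hat).sum(axis=1)        # model-based term
    iw = target_policy[idx, actions] / behavior_prob    # importance weights
    correction = iw * (rewards - q_hat[idx, actions])   # IPS-style correction
    return float((direct + correction).mean())
```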
arXiv Detail & Related papers (2023-08-07T10:00:07Z) - Leveraging Factored Action Spaces for Off-Policy Evaluation [0.0]
Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions.
Existing OPE estimators often exhibit high bias and high variance in problems involving large, decomposed action spaces.
We propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces.
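As a rough illustration of the decomposed idea, the sketch below assumes the action factors into d sub-actions, both the behavior and target policies factorize across sub-actions, and the reward decomposes additively across them; each reward component is then reweighted only by its own sub-action's importance weight instead of the high-variance product of all d weights. Names and shapes are illustrative, not the paper's code.

```python
import numpy as np

def factored_is_estimate(sub_rewards, target_probs, behavior_probs):
    """Sketch of a decomposed importance sampling OPE estimate.

    sub_rewards:    (n, d) per-dimension reward components, r_i = sum_k r_ik
    target_probs:   (n, d) target-policy probability of each logged sub-action
    behavior_probs: (n, d) behavior-policy probability of each logged sub-action
    """
    per_dim_weights = target_probs / behavior_probs   # (n, d); no product over d
    return float((per_dim_weights * sub_rewards).sum(axis=1).mean())
```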
arXiv Detail & Related papers (2023-07-13T18:34:14Z) - Correcting for Interference in Experiments: A Case Study at Douyin [9.586075896428177]
Interference is a ubiquitous problem in experiments conducted on two-sided content marketplaces, such as Douyin (China's analog of TikTok).
We introduce a novel Monte-Carlo estimator, based on "Differences-in-Qs" (DQ) techniques, which achieves bias that is second-order in the treatment effect, while remaining sample-efficient to estimate.
We implement our estimator on Douyin's experimentation platform, and in the process develop DQ into a truly "plug-and-play" estimator for interference in real-world settings.
arXiv Detail & Related papers (2023-05-04T04:30:30Z) - Quantile Off-Policy Evaluation via Deep Conditional Generative Learning [21.448553360543478]
Off-Policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy.
We propose a doubly-robust inference procedure for quantile OPE in sequential decision making.
We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform.
arXiv Detail & Related papers (2022-12-29T22:01:43Z) - Neighborhood Adaptive Estimators for Causal Inference under Network Interference [152.4519491244279]
We consider the violation of the classical no-interference assumption, meaning that the treatment of one individual might affect the outcomes of another.
To make interference tractable, we consider a known network that describes how interference may travel.
We study estimators for the average direct treatment effect on the treated in such a setting.
arXiv Detail & Related papers (2022-12-07T14:53:47Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance.
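One possible form of such a bias-constrained objective is sketched below (an assumed form, not the paper's exact loss): the usual mean squared error is augmented with a penalty on the squared average error within each group of samples sharing the same true parameter value, which serves as an empirical surrogate for bias. Keeping the learned estimator near-unbiased is also what makes averaging several independent estimates of the same unknown worthwhile.

```python
import torch

def bias_constrained_loss(theta_hat, theta_true, group_ids, lam=1.0):
    """Assumed sketch of a bias-constrained training loss.

    theta_hat:  (n,) network estimates
    theta_true: (n,) ground-truth parameters used for training
    group_ids:  (n,) integer ids; samples in a group share the same true parameter
    """
    mse = ((theta_hat - theta_true) ** 2).mean()
    groups = group_ids.unique()
    bias_penalty = 0.0
    for g in groups:
        mask = group_ids == g
        err_g = (theta_hat[mask] - theta_true[mask]).mean()  # per-group average error, a bias proxy
        bias_penalty = bias_penalty + err_g ** 2
    return mse + lam * bias_penalty / len(groups)
```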
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Understanding the Under-Coverage Bias in Uncertainty Estimation [58.03725169462616]
Quantile regression tends to under-cover relative to the desired coverage level in practice.
We prove that quantile regression suffers from an inherent under-coverage bias.
Our theory reveals that this under-coverage bias stems from a certain high-dimensional parameter estimation error.
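The effect is easy to reproduce in a toy simulation (an assumed setup, not the paper's experiment): fit a linear 90% quantile regressor on a modest high-dimensional training set and check how often held-out responses fall below the fitted quantile; with d/n bounded away from zero, the empirical coverage tends to land below the nominal 0.90, consistent with the result above.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n_train, n_test, d = 200, 5000, 50       # d/n = 0.25, the regime of interest
beta = rng.normal(size=d) / np.sqrt(d)

def sample(n):
    X = rng.normal(size=(n, d))
    return X, X @ beta + rng.normal(size=n)

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

model = QuantileRegressor(quantile=0.9, alpha=0.0, solver="highs")
model.fit(X_tr, y_tr)

coverage = float((y_te <= model.predict(X_te)).mean())
print(f"empirical coverage: {coverage:.3f} (nominal 0.90)")
```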
arXiv Detail & Related papers (2021-06-10T06:11:55Z) - Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z) - Valid Causal Inference with (Some) Invalid Instruments [24.794879633855373]
We show how to perform consistent IV estimation despite violations of the exclusion assumption.
We achieve accurate estimates of conditional average treatment effects using an ensemble of deep network-based estimators.
arXiv Detail & Related papers (2020-06-19T21:09:26Z)