Kernel Conditional Moment Constraints for Confounding Robust Inference
- URL: http://arxiv.org/abs/2302.13348v2
- Date: Thu, 14 Sep 2023 17:31:59 GMT
- Title: Kernel Conditional Moment Constraints for Confounding Robust Inference
- Authors: Kei Ishikawa, Niao He
- Abstract summary: We study policy evaluation of offline contextual bandits subject to unobserved confounders.
We propose a general estimator that provides a sharp lower bound of the policy value.
- Score: 22.816690686310714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study policy evaluation of offline contextual bandits subject to
unobserved confounders. Sensitivity analysis methods are commonly used to
estimate the policy value under the worst-case confounding over a given
uncertainty set. However, existing work often resorts to some coarse relaxation
of the uncertainty set for the sake of tractability, leading to overly
conservative estimation of the policy value. In this paper, we propose a
general estimator that provides a sharp lower bound of the policy value. It can
be shown that our estimator contains the recently proposed sharp estimator by
Dorn and Guo (2022) as a special case, and our method enables a novel extension
of the classical marginal sensitivity model using f-divergence. To construct
our estimator, we leverage the kernel method to obtain a tractable
approximation to the conditional moment constraints, which traditional
non-sharp estimators failed to take into account. In the theoretical analysis,
we provide a condition for the choice of the kernel which guarantees no
specification error that biases the lower bound estimation. Furthermore, we
provide consistency guarantees of policy evaluation and learning. In the
experiments with synthetic and real-world data, we demonstrate the
effectiveness of the proposed method.
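As context for the abstract above, the following is a minimal sketch of the classical non-sharp lower bound under the marginal sensitivity model (MSM) that the paper sharpens, in the spirit of the box-constrained estimator of Zhao, Small, and Bhattacharya. It does not implement the paper's kernel-based conditional moment constraints; the function name, argument shapes, and synthetic data are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumption-laden): worst-case policy-value lower bound
# under MSM box constraints on the true inverse propensity weights.
import numpy as np

def msm_lower_bound(y, pi_target, pi_logging, gamma):
    """Non-sharp MSM lower bound on the target policy value.

    y          : observed rewards, shape (n,)
    pi_target  : target-policy probabilities pi(a_i | x_i), shape (n,)
    pi_logging : nominal logging propensities e(a_i | x_i), shape (n,)
    gamma      : MSM odds-ratio bound, gamma >= 1
    """
    # MSM box constraints on the true inverse propensity weight w_i:
    #   1 + (1/gamma) * (1/e_i - 1) <= w_i <= 1 + gamma * (1/e_i - 1)
    lo = 1.0 + (1.0 / gamma) * (1.0 / pi_logging - 1.0)
    hi = 1.0 + gamma * (1.0 / pi_logging - 1.0)

    order = np.argsort(y)
    y_s, c_s = y[order], pi_target[order]
    lo_s, hi_s = lo[order], hi[order]

    # The minimizer of the normalized weighted average has a threshold
    # structure in y: upper-bound weights below the threshold, lower-bound
    # weights above it, so scanning the n + 1 split points is exact.
    n = len(y)
    best = np.inf
    for k in range(n + 1):
        w = np.concatenate([hi_s[:k], lo_s[k:]])
        val = np.sum(w * c_s * y_s) / np.sum(w * c_s)
        best = min(best, val)
    return best

# Example usage with synthetic data (purely illustrative).
rng = np.random.default_rng(0)
n = 500
y = rng.normal(size=n)
pi_logging = rng.uniform(0.2, 0.8, size=n)
pi_target = rng.uniform(0.2, 0.8, size=n)
print(msm_lower_bound(y, pi_target, pi_logging, gamma=1.5))
```

The sharp estimator described in the abstract would additionally enforce (via a kernel approximation) the conditional moment constraints that the weights must satisfy given context and action, which the box-constrained relaxation above ignores and which makes the bound above overly conservative.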
Related papers
- A Convex Framework for Confounding Robust Inference [21.918894096307294]
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
We propose a general estimator that provides a sharp lower bound of the policy value using convex programming.
arXiv Detail & Related papers (2023-09-21T19:45:37Z) - Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z) - Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z) - Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z) - CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z) - Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation [38.31971190670345]
We investigate the potential of statistical bootstrapping to produce calibrated confidence intervals for the true value of the policy.
We show that it can yield accurate confidence intervals in a variety of conditions, including challenging continuous control environments and small data regimes.
arXiv Detail & Related papers (2020-07-27T14:49:22Z) - Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z) - Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates the bounds of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.