A Convex Framework for Confounding Robust Inference
- URL: http://arxiv.org/abs/2309.12450v2
- Date: Wed, 1 Nov 2023 17:25:53 GMT
- Title: A Convex Framework for Confounding Robust Inference
- Authors: Kei Ishikawa, Niao He, Takafumi Kanamori
- Abstract summary: We study policy evaluation of offline contextual bandits subject to unobserved confounders.
We propose a general estimator that provides a sharp lower bound of the policy value using convex programming.
- Score: 21.918894096307294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study policy evaluation of offline contextual bandits subject to
unobserved confounders. Sensitivity analysis methods are commonly used to
estimate the policy value under the worst-case confounding over a given
uncertainty set. However, existing work often resorts to some coarse relaxation
of the uncertainty set for the sake of tractability, leading to overly
conservative estimation of the policy value. In this paper, we propose a
general estimator that provides a sharp lower bound of the policy value using
convex programming. The generality of our estimator enables various extensions
such as sensitivity analysis with f-divergence, model selection with cross
validation and information criterion, and robust policy learning with the sharp
lower bound. Furthermore, our estimation method can be reformulated as an
empirical risk minimization problem thanks to the strong duality, which enables
us to provide strong theoretical guarantees of the proposed estimator using
techniques of the M-estimation.
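
As a rough illustration of the convex-programming idea in the abstract, the sketch below (not the authors' code) computes a worst-case lower bound on the policy value from logged contextual-bandit data under a marginal-sensitivity-style box on the inverse propensities, which reduces to a linear program. The synthetic data and names such as `gamma`, `e_logged`, and `pi_target` are assumptions made for the example; the paper's sharp estimator tightens the bound with additional constraints (e.g., conditional moments or f-divergence uncertainty sets), which are omitted here.

```python
# Minimal sketch (illustrative only): worst-case lower bound on the value of a
# target policy from logged contextual-bandit data, under a box uncertainty set
# on the inverse propensities (marginal-sensitivity style). Solved as an LP.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 500

# Synthetic logged data (assumed for illustration):
e_logged = rng.uniform(0.2, 0.8, size=n)   # behaviour-policy propensity of the logged action
pi_target = rng.uniform(0.0, 1.0, size=n)  # target-policy probability of the logged action
rewards = rng.normal(loc=1.0, scale=0.5, size=n)

gamma = 2.0  # sensitivity parameter: how much confounding may distort the odds

# Box on the true inverse propensity w_i around the nominal value 1 / e_logged[i].
w_low = 1.0 + (1.0 / gamma) * (1.0 / e_logged - 1.0)
w_high = 1.0 + gamma * (1.0 / e_logged - 1.0)

w = cp.Variable(n)
# Worst-case importance-weighted value over the uncertainty set, with a
# normalization constraint keeping the weights on the IPW scale.
value = cp.sum(cp.multiply(w, pi_target * rewards)) / n
constraints = [w >= w_low,
               w <= w_high,
               cp.sum(cp.multiply(w, pi_target)) / n == 1.0]
problem = cp.Problem(cp.Minimize(value), constraints)
problem.solve()
print("lower bound on the policy value:", problem.value)
```

Replacing the box constraints with an f-divergence ball over the weights keeps the problem convex, which is the kind of extension the abstract refers to.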
Related papers
- Predictive Performance Comparison of Decision Policies Under Confounding [32.21041697921289]
We propose a method to compare the predictive performance of decision policies under a variety of modern identification approaches.
Key to our method is the insight that there are regions of uncertainty that we can safely ignore in the policy comparison.
arXiv Detail & Related papers (2024-04-01T01:27:07Z)
- Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits [31.571978291138866]
We introduce a distributionally robust approach that enhances the reliability of offline policy evaluation in contextual bandits.
Our method aims to deliver robust policy evaluation results in the presence of discrepancies in both context and policy distribution.
arXiv Detail & Related papers (2024-01-21T00:42:06Z)
- Kernel Conditional Moment Constraints for Confounding Robust Inference [22.816690686310714]
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
We propose a general estimator that provides a sharp lower bound of the policy value.
arXiv Detail & Related papers (2023-02-26T16:44:13Z)
- Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation [54.72195809248172]
We present a new estimator built on a novel concept: retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z)
- Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z)
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average-reward setting with a variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or more logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)