Flexible and Efficient Contextual Bandits with Heterogeneous Treatment
Effect Oracle
- URL: http://arxiv.org/abs/2203.16668v1
- Date: Wed, 30 Mar 2022 20:43:43 GMT
- Authors: Aldo Gael Carranza, Sanath Kumar Krishnamurthy, Susan Athey
- Abstract summary: We design a statistically optimal and computationally efficient algorithm using heterogeneous treatment effect estimation oracles.
Our results provide the first universal reduction of contextual bandits to a general-purpose heterogeneous treatment effect estimation method.
We show that our approach is more robust to model misspecification than reward estimation methods based on squared error regression oracles.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many popular contextual bandit algorithms estimate reward models to inform
decision making. However, true rewards can contain action-independent
redundancies that are not relevant for decision making and only increase the
statistical complexity of accurate estimation. It is sufficient and more
data-efficient to estimate the simplest function that explains the reward
differences between actions, that is, the heterogeneous treatment effect,
commonly understood to be more structured and simpler than the reward.
Motivated by this observation, building on recent work on oracle-based
algorithms, we design a statistically optimal and computationally efficient
algorithm using heterogeneous treatment effect estimation oracles. Our results
provide the first universal reduction of contextual bandits to a
general-purpose heterogeneous treatment effect estimation method. We show that
our approach is more robust to model misspecification than reward estimation
methods based on squared error regression oracles. Experimentally, we show the
benefits of heterogeneous treatment effect estimation in contextual bandits
over reward estimation.
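The abstract's central observation, that an action-independent reward component is irrelevant to the decision, can be illustrated with a small sketch. This is not the paper's algorithm. It assumes a hypothetical two-action setting in which the expected reward decomposes as baseline(x) + action * treatment_effect(x); the function names and functional forms below are made up for illustration only.

```python
import math

def baseline(x):
    # Hypothetical action-independent component: complicated to estimate,
    # but it shifts the reward of both actions by the same amount.
    return math.sin(5.0 * x) + x ** 3

def treatment_effect(x):
    # Hypothetical heterogeneous treatment effect: reward difference between
    # action 1 and action 0. Deliberately simpler than the full reward.
    return 0.5 - x

def expected_reward(x, action):
    # Assumed decomposition of the expected reward (action is 0 or 1).
    return baseline(x) + action * treatment_effect(x)

def greedy_from_rewards(x):
    # Pick the action with the larger full expected reward (ties go to action 0).
    return max((0, 1), key=lambda a: expected_reward(x, a))

def greedy_from_effect(x):
    # Pick action 1 only when the treatment effect is strictly positive.
    return 1 if treatment_effect(x) > 0 else 0

# Compare the two decision rules on a fixed grid of contexts in [0, 1].
contexts = [i / 100 for i in range(101)]
agree = all(greedy_from_rewards(x) == greedy_from_effect(x) for x in contexts)
print("Policies agree on every context:", agree)
```

In this toy setting both rules select the same action everywhere, even though the second one never evaluates the baseline. That is the sense in which estimating only the treatment effect can be sufficient for decision making.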
Related papers
- Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction [6.909352249236339]
We propose a novel regression adjustment method designed for estimating distributional treatment effect parameters in randomized experiments.
Our approach incorporates pre-treatment covariates into a distributional regression framework, utilizing machine learning techniques to improve the precision of distributional treatment effect estimators.
arXiv Detail & Related papers (2024-07-22T20:28:29Z)
- Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery [0.0]
We focus on distributed estimation and support recovery for high-dimensional linear quantile regression.
We transform the original quantile regression problem into a least-squares optimization problem.
An efficient algorithm is developed, which enjoys high computation and communication efficiency.
arXiv Detail & Related papers (2024-05-13T08:32:22Z)
- Efficient adjustment for complex covariates: Gaining efficiency with DOPE [56.537164957672715]
We propose a framework that accommodates adjustment for any subset of information expressed by the covariates.
Based on our theoretical results, we propose the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of the average treatment effect (ATE).
Our results show that the DOPE provides an efficient and robust methodology for ATE estimation in various observational settings.
arXiv Detail & Related papers (2024-02-20T13:02:51Z)
- Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation [137.3520153445413]
A notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference.
We evaluate seven established baseline causal discovery methods including a newly proposed method based on GFlowNets.
The results of our study demonstrate that some of the algorithms studied are able to effectively capture a wide range of useful and diverse ATE modes.
arXiv Detail & Related papers (2023-07-11T02:58:10Z)
- B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding [51.74479522965712]
We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on hidden confounding.
We prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods.
arXiv Detail & Related papers (2023-04-20T18:07:19Z)
- Proximal Causal Learning of Conditional Average Treatment Effects [0.0]
We propose a tailored two-stage loss function for learning heterogeneous treatment effects.
Our proposed estimator can be implemented by off-the-shelf loss-minimizing machine learning methods.
arXiv Detail & Related papers (2023-01-26T02:56:36Z)
- Assessment of Treatment Effect Estimators for Heavy-Tailed Data [70.72363097550483]
A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance.
We provide a novel cross-validation-like methodology to address this challenge.
We evaluate our methodology across 709 RCTs implemented in the Amazon supply chain.
arXiv Detail & Related papers (2021-12-14T17:53:01Z)
- Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits [36.37578212532926]
We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance.
Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy.
We develop simple and efficient reward estimation procedures for demonstrations within a class of upper-confidence-based algorithms.
arXiv Detail & Related papers (2021-06-28T17:37:49Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- Almost-Matching-Exactly for Treatment Effect Estimation under Network Interference [73.23326654892963]
We propose a matching method that recovers direct treatment effects from randomized experiments where units are connected in an observed network.
Our method matches units almost exactly on counts of unique subgraphs within their neighborhood graphs.
arXiv Detail & Related papers (2020-03-02T15:21:20Z)