Leveraging Factored Action Spaces for Off-Policy Evaluation
- URL: http://arxiv.org/abs/2307.07014v1
- Date: Thu, 13 Jul 2023 18:34:14 GMT
- Title: Leveraging Factored Action Spaces for Off-Policy Evaluation
- Authors: Aaman Rebello (1), Shengpu Tang (2), Jenna Wiens (2), Sonali Parbhoo (1)
  ((1) Department of Engineering, Imperial College London, (2) Division of Computer Science & Engineering, University of Michigan)
- Abstract summary: Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions.
Existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces.
We propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-policy evaluation (OPE) aims to estimate the benefit of following a
counterfactual sequence of actions, given data collected from executed
sequences. However, existing OPE estimators often exhibit high bias and high
variance in problems involving large, combinatorial action spaces. We
investigate how to mitigate this issue using factored action spaces, i.e.,
expressing each action as a combination of independent sub-actions from smaller
action spaces. This approach facilitates a finer-grained analysis of how
actions differ in their effects. In this work, we propose a new family of
"decomposed" importance sampling (IS) estimators based on factored action
spaces. Given certain assumptions on the underlying problem structure, we prove
that the decomposed IS estimators have less variance than their original
non-decomposed versions, while preserving the property of zero bias. Through
simulations, we empirically verify our theoretical results, probing the
validity of various assumptions. Provided with a technique that can derive the
action space factorisation for a given problem, our work shows that OPE can be
improved "for free" by utilising this inherent problem structure.
Related papers
- Exogenous Matching: Learning Good Proposals for Tractable Counterfactual Estimation [1.9662978733004601]
We propose an importance sampling method for tractable and efficient estimation of counterfactual expressions.
By minimizing a common upper bound of counterfactual estimators, we transform the variance minimization problem into a conditional distribution learning problem.
We validate the theoretical results through experiments under various types and settings of Structural Causal Models (SCMs) and demonstrate the outperformance on counterfactual estimation tasks.
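As a generic illustration of why the choice of proposal drives IS variance (this toy does not implement the paper's method, which learns the proposal by minimising a variance upper bound), compare two hand-picked proposals for the same rare-event query:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Target: E_p[f(X)] with p = N(2, 1) standing in for a counterfactual
# distribution, and f a rare-event query under p.
def f(x):
    return (x > 3).astype(float)

def is_estimate(q_mean, q_std):
    """Importance-sampling estimate of E_p[f] with proposal N(q_mean, q_std^2)."""
    x = rng.normal(q_mean, q_std, n)
    log_w = -(x - 2) ** 2 / 2 + (x - q_mean) ** 2 / (2 * q_std ** 2) + np.log(q_std)
    vals = np.exp(log_w) * f(x)
    return vals.mean(), vals.std()

# A mismatched proposal vs. one concentrated where f is non-zero.
for q in [(0.0, 1.0), (3.0, 1.5)]:
    mean, std = is_estimate(*q)
    print(f"proposal N{q}: estimate = {mean:.4f}, per-sample std = {std:.3f}")
# True value: P(X > 3) for X ~ N(2, 1) is about 0.1587.
```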
arXiv Detail & Related papers (2024-10-17T03:08:28Z)
- Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation [137.3520153445413]
A notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference.
We evaluate seven established baseline causal discovery methods including a newly proposed method based on GFlowNets.
The results of our study demonstrate that some of the algorithms studied are able to effectively capture a wide range of useful and diverse ATE modes.
arXiv Detail & Related papers (2023-07-11T02:58:10Z)
- Doubly Robust Kernel Statistics for Testing Distributional Treatment Effects [18.791409397894835]
We build upon a previously introduced framework, Counterfactual Mean Embeddings, for representing causal distributions within Reproducing Kernel Hilbert Spaces (RKHS).
The improved estimators we propose are inspired by doubly robust estimators of the causal mean, using a similar form within the kernel space.
This leads to new permutation-based tests for distributional causal effects, using the proposed estimators as test statistics.
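A minimal sketch of the permutation-testing recipe with a plain kernel two-sample statistic (squared MMD with an RBF kernel), not the doubly robust estimator proposed in the paper; group labels are shuffled to sample the null distribution:

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_mmd2(x, y, bandwidth=1.0):
    """Squared MMD between samples x, y with an RBF kernel (biased estimate)."""
    z = np.concatenate([x, y])[:, None]
    k = np.exp(-((z - z.T) ** 2) / (2 * bandwidth ** 2))
    nx = len(x)
    return k[:nx, :nx].mean() + k[nx:, nx:].mean() - 2 * k[:nx, nx:].mean()

# Treated and control outcomes with a distributional (variance) difference
# that a difference-in-means test would miss.
treated = rng.normal(0.0, 2.0, 100)
control = rng.normal(0.0, 1.0, 100)

stat = rbf_mmd2(treated, control)

# Permutation test: reassign group labels at random to sample the null.
pooled = np.concatenate([treated, control])
n_perm, n_t = 1000, len(treated)
null = np.empty(n_perm)
for i in range(n_perm):
    idx = rng.permutation(len(pooled))
    null[i] = rbf_mmd2(pooled[idx[:n_t]], pooled[idx[n_t:]])

p_value = (1 + (null >= stat).sum()) / (1 + n_perm)
print(f"MMD^2 = {stat:.4f}, permutation p-value = {p_value:.3f}")
```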
arXiv Detail & Related papers (2022-12-09T15:32:19Z)
- Markovian Interference in Experiments [7.426870925611945]
We consider experiments in dynamical systems where interventions on some experimental units impact other units through a limiting constraint.
Despite outsize practical importance, the best estimators for this problem are largely heuristic in nature, and their bias is not well understood.
Off-policy estimators, while unbiased, apparently incur a large penalty in variance relative to state-of-the-art alternatives.
We introduce an on-policy estimator: the Differences-In-Q's (DQ) estimator.
arXiv Detail & Related papers (2022-06-06T05:53:36Z)
- Off-Policy Evaluation for Large Action Spaces via Embeddings [36.42838320396534]
Off-policy evaluation (OPE) in contextual bandits has seen rapid adoption in real-world systems.
Existing OPE estimators degrade severely when the number of actions is large.
We propose a new OPE estimator that leverages marginalized importance weights when action embeddings provide structure in the action space.
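A toy sketch of the marginalisation idea under the strongest version of the assumption, where each action maps deterministically to an embedding that fully mediates the reward (the setup and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n_actions, n = 1000, 50_000

# Many actions, but each deterministically maps to one of a few embedding
# categories that fully mediate the reward (the key assumption).
emb_of = rng.integers(0, 5, n_actions)           # action -> embedding id
emb_reward = np.array([0.1, 0.5, 0.9, 0.3, 0.7])

pi_b = rng.dirichlet(np.ones(n_actions))         # behaviour policy
pi_e = rng.dirichlet(np.ones(n_actions))         # evaluation policy

a = rng.choice(n_actions, size=n, p=pi_b)
r = emb_reward[emb_of[a]] + rng.normal(0, 0.1, n)

# Vanilla IS: per-action ratios can be enormous with 1000 actions.
w_action = pi_e[a] / pi_b[a]

# Marginalised IS: ratio of embedding marginals p(e | pi_e) / p(e | pi_b).
p_e_emb = np.bincount(emb_of, weights=pi_e, minlength=5)
p_b_emb = np.bincount(emb_of, weights=pi_b, minlength=5)
w_emb = p_e_emb[emb_of[a]] / p_b_emb[emb_of[a]]

true_value = (pi_e * emb_reward[emb_of]).sum()
print(f"true value     : {true_value:.3f}")
print(f"vanilla IS     : {(w_action * r).mean():.3f} (std {(w_action * r).std():.1f})")
print(f"marginalised IS: {(w_emb * r).mean():.3f} (std {(w_emb * r).std():.2f})")
```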
arXiv Detail & Related papers (2022-02-13T14:00:09Z)
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach [84.29777236590674]
We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available.
Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions.
arXiv Detail & Related papers (2021-03-25T17:59:19Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
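An illustrative sketch of the stratified setting, combining per-logger IS estimates by estimated inverse variance; this is a simple baseline, not the efficient estimator derived in the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n_actions = 10
pi_e = rng.dirichlet(np.ones(n_actions))         # evaluation policy
q = rng.normal(0, 1, n_actions)                  # mean reward per action

def is_stratum(pi_b, n):
    """Per-sample IS values for one logger's stratum of fixed size n."""
    a = rng.choice(n_actions, size=n, p=pi_b)
    r = q[a] + rng.normal(0, 0.5, n)
    return (pi_e[a] / pi_b[a]) * r

# Two loggers: one close to pi_e (low-variance stratum), one far from it.
strata = [is_stratum(0.9 * pi_e + 0.1 / n_actions, 5000),
          is_stratum(np.ones(n_actions) / n_actions, 5000)]

means = np.array([s.mean() for s in strata])
vars_ = np.array([s.var(ddof=1) / len(s) for s in strata])

naive = np.concatenate(strata).mean()            # pool all samples equally
weights = (1 / vars_) / (1 / vars_).sum()        # estimated inverse-variance weights
combined = (weights * means).sum()

print(f"true value: {pi_e @ q:.3f}")
print(f"naive pooling: {naive:.3f}, inverse-variance combination: {combined:.3f}")
```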
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Almost-Matching-Exactly for Treatment Effect Estimation under Network Interference [73.23326654892963]
We propose a matching method that recovers direct treatment effects from randomized experiments where units are connected in an observed network.
Our method matches units almost exactly on counts of unique subgraphs within their neighborhood graphs.
arXiv Detail & Related papers (2020-03-02T15:21:20Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
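A small sketch of the underlying variance-control idea, a score-function gradient with a baseline control variate; the paper's learned action-value critic is crudely stood in for here by the mean reward:

```python
import numpy as np

rng = np.random.default_rng(5)
n_actions, n = 5, 100_000

logits = rng.normal(0, 1, n_actions)
probs = np.exp(logits) / np.exp(logits).sum()    # softmax policy
q_true = np.array([1.0, 0.2, -0.3, 0.5, 0.0])    # true action values

a = rng.choice(n_actions, size=n, p=probs)
r = q_true[a] + rng.normal(0, 1, n)

# Score function for a softmax policy: grad log pi(a) = one_hot(a) - probs.
score = np.eye(n_actions)[a] - probs

# Plain REINFORCE gradient samples vs. baselined ones; subtracting a
# baseline leaves the mean unchanged (E[score] = 0) but shrinks variance.
g_plain = score * r[:, None]
g_base = score * (r - r.mean())[:, None]

print("plain    grad:", g_plain.mean(0).round(3), "| std:", g_plain.std(0).round(2))
print("baseline grad:", g_base.mean(0).round(3), "| std:", g_base.std(0).round(2))
```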
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.