Post Reinforcement Learning Inference
- URL: http://arxiv.org/abs/2302.08854v3
- Date: Sat, 11 May 2024 03:31:18 GMT
- Title: Post Reinforcement Learning Inference
- Authors: Vasilis Syrgkanis, Ruohan Zhan
- Abstract summary: We consider estimation and inference using data collected from reinforcement learning algorithms.
We propose a weighted Z-estimation approach with carefully designed adaptive weights to stabilize the time-varying variance.
Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.
- Score: 22.117487428829488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider estimation and inference using data collected from reinforcement learning algorithms. These algorithms, characterized by their adaptive experimentation, interact with individual units over multiple stages, dynamically adjusting their strategies based on previous interactions. Our goal is to evaluate a counterfactual policy post-data collection and estimate structural parameters, like dynamic treatment effects, which can be used for credit assignment and determining the effect of earlier actions on final outcomes. Such parameters of interest can be framed as solutions to moment equations, but not minimizers of a population loss function, leading to Z-estimation approaches for static data. However, in the adaptive data collection environment of reinforcement learning, where algorithms deploy nonstationary behavior policies, standard estimators do not achieve asymptotic normality due to the fluctuating variance. We propose a weighted Z-estimation approach with carefully designed adaptive weights to stabilize the time-varying estimation variance. We identify proper weighting schemes to restore the consistency and asymptotic normality of the weighted Z-estimators for target parameters, which allows for hypothesis testing and constructing uniform confidence regions. Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.
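To make the weighted Z-estimation recipe concrete, below is a minimal sketch for the simplest motivating case: estimating the mean outcome under treatment, E[Y(1)], from adaptively collected data with drifting assignment probabilities. The moment function, the square-root-propensity weighting rule, and all variable names are illustrative assumptions for this sketch, not the paper's exact scheme, which depends on the target parameter and application.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): weighted Z-estimation
# of theta* = E[Y(1)] from adaptively collected data.
rng = np.random.default_rng(0)
T = 10_000

# Nonstationary behavior policy: assignment probabilities e_t drift over time,
# e.g. a bandit algorithm concentrating on the better arm.
e = np.clip(np.linspace(0.5, 0.05, T) + rng.normal(0, 0.01, T), 0.02, 0.98)
a = rng.binomial(1, e)                                  # adaptive assignments
y = a * (1.0 + rng.normal(0, 1, T)) + (1 - a) * rng.normal(0, 1, T)

# Moment function m(Z_t, theta) = (a_t / e_t) * y_t - theta, whose root
# identifies theta*. Unweighted Z-estimation is unstable here because
# Var[(a/e) * y] blows up as the propensities e_t drift toward zero.
def m(theta):
    return a / e * y - theta

# Predictable adaptive weights; w_t = sqrt(e_t) is a common variance-
# stabilizing choice in the adaptive-inference literature (an assumption
# for this sketch, not the paper's derived scheme).
w = np.sqrt(e)

# Weighted Z-estimator: solve sum_t w_t * m(Z_t, theta) = 0. For this
# linear moment the root is available in closed form.
theta_hat = np.sum(w * (a / e) * y) / np.sum(w)

# Sandwich-style standard error for the weighted Z-estimator.
psi = w * m(theta_hat)
se = np.sqrt(np.sum(psi**2)) / np.sum(w)
print(f"theta_hat = {theta_hat:.3f} +/- {1.96 * se:.3f}  (truth: 1.0)")
```

The structural point the sketch illustrates is that the weights are predictable (computable from past data and the logged propensities), which is what lets a martingale central limit argument restore asymptotic normality for the weighted moment sum.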
Related papers
- Learning Complex Physical Regimes via Coverage-oriented Uncertainty Quantification: An application to the Critical Heat Flux [0.0]
Uncertainty quantification (UQ) should not be viewed as a safety assessment, but as a support to the learning task itself.
We focus on the Critical Heat Flux benchmark and dataset presented by the OECD/NEA Expert Group on Reactor Systems Multi-Physics.
We show that while post-hoc methods ensure statistical calibration, coverage-oriented learning effectively reshapes the model's representation to match the complex physical regimes.
arXiv Detail & Related papers (2026-02-25T09:04:15Z) - Uncertainty Quantification for Regression using Proper Scoring Rules [76.24649098854219]
We introduce a unified UQ framework for regression based on proper scoring rules, such as CRPS, logarithmic, squared error, and quadratic scores.
We derive closed-form expressions for the uncertainty measures under practical parametric assumptions and show how to estimate them using ensembles of models.
Our broad evaluation on synthetic and real-world regression datasets provides guidance for selecting reliable UQ measures.
arXiv Detail & Related papers (2025-09-30T17:52:12Z) - Instance-Dependent Continuous-Time Reinforcement Learning via Maximum Likelihood Estimation [27.232790785138427]
Continuous-time reinforcement learning (CTRL) provides a natural framework for sequential decision-making in dynamic environments.
While CTRL has shown growing empirical success, its ability to adapt to varying levels of problem difficulty remains poorly understood.
In this work, we investigate the instance-dependent behavior of CTRL and introduce a simple, model-based algorithm built on maximum likelihood estimation.
arXiv Detail & Related papers (2025-08-04T06:25:45Z) - Semiparametric Counterfactual Regression [2.356908851188234]
We propose a doubly robust-style estimator for counterfactual regression within a generalizable framework.
Our approach uses incremental interventions to enhance adaptability while maintaining compatibility with standard methods.
Our analysis shows that the proposed estimators can achieve $\sqrt{n}$-consistency and asymptotic normality for a broad class of problems.
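For readers unfamiliar with the doubly robust style of estimator this entry describes, here is a generic AIPW-style sketch for a counterfactual mean. The paper targets incremental interventions instead, so the models, names, and data below are illustrative stand-ins rather than the paper's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

# Hedged sketch of a doubly robust (AIPW-style) estimator for E[Y(1)];
# consistent if either the propensity or the outcome model is correct.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 3))
e = 1 / (1 + np.exp(-X[:, 0]))             # true treatment propensity
A = rng.binomial(1, e)
Y = X[:, 0] + 2.0 * A + rng.normal(size=n)  # so E[Y(1)] = 2

# Nuisance estimates: propensity model and treated-outcome regression.
e_hat = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
mu1_hat = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)

# AIPW: outcome-model plug-in plus an inverse-propensity residual correction.
psi = mu1_hat + A / e_hat * (Y - mu1_hat)
est, se = psi.mean(), psi.std(ddof=1) / np.sqrt(n)
print(f"E[Y(1)] ~= {est:.3f} +/- {1.96 * se:.3f}  (truth: 2.0)")
```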
arXiv Detail & Related papers (2025-04-03T15:32:26Z) - Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z) - Embedding generalization within the learning dynamics: An approach based-on sample path large deviation theory [0.0]
We consider an empirical risk perturbation based learning problem that exploits methods from a continuous-time perspective.
We provide an estimate in the small noise limit based on the Freidlin-Wentzell theory of large deviations.
We also present a computational algorithm that solves the corresponding variational problem, leading to optimal point estimates.
arXiv Detail & Related papers (2024-08-04T23:31:35Z) - C-Learner: Constrained Learning for Causal Inference and Semiparametric Statistics [5.395560682099634]
We propose a novel debiased estimator that achieves stable plug-in estimates with desirable properties.
Our constrained learning framework solves for the best plug-in estimator under the constraint that the first-order error with respect to the plugged-in quantity is zero.
Our estimator outperforms one-step estimation and targeting in challenging settings with limited overlap between treatment and control, and performs comparably otherwise.
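A hedged sketch of the constraint idea in this entry, in the simplest setting: a linear outcome model, known logging propensities, and the treated-mean functional. With the zero first-order-error constraint active, the plain plug-in estimate coincides with the usual one-step (AIPW) correction by construction. The KKT solve and all names are illustrative, not the C-Learner implementation.

```python
import numpy as np

# Constrained learning sketch: fit the treated-outcome model subject to the
# constraint that the inverse-propensity-weighted residual error is zero.
rng = np.random.default_rng(2)
n, d = 2000, 5
X = rng.normal(size=(n, d))
e = 1 / (1 + np.exp(-X[:, 0]))             # known propensities (an assumption here)
a = rng.binomial(1, e)
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + 2.0 * a + rng.normal(size=n)

Xt, yt = X[a == 1], y[a == 1]               # outcome model fit on treated units

# Constraint sum_i (a_i/e_i) * (y_i - x_i' beta) = 0, i.e. g' beta = b.
g = ((a / e)[:, None] * X).sum(axis=0)
b = float((a / e) @ y)

# KKT system for: min ||yt - Xt beta||^2  subject to  g' beta = b.
A_mat = Xt.T @ Xt
K = np.block([[A_mat, g[:, None]], [g[None, :], np.zeros((1, 1))]])
rhs = np.concatenate([Xt.T @ yt, [b]])
beta = np.linalg.solve(K, rhs)[:d]

# With the constraint active, the plug-in mean equals the AIPW value exactly.
print("constrained plug-in estimate of E[Y(1)]:", (X @ beta).mean())  # truth: 2.0
```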
arXiv Detail & Related papers (2024-05-15T16:38:28Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergent (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z) - Targeted Machine Learning for Average Causal Effect Estimation Using the Front-Door Functional [3.0232957374216953]
Evaluating the average causal effect (ACE) of a treatment on an outcome often involves overcoming the challenges posed by confounding factors in observational studies.
Here, we introduce novel estimation strategies for the front-door criterion based on targeted minimum loss-based estimation theory.
We demonstrate the applicability of these estimators to analyze the effect of early stage academic performance on future yearly income.
arXiv Detail & Related papers (2023-12-15T22:04:53Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
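As a stylized instance of the construction this entry describes, the sketch below builds a likelihood-ratio confidence sequence for the mean of unit-variance Gaussian data, using a predictable plug-in alternative. Validity at all times follows from Ville's inequality applied to the nonnegative likelihood-ratio martingale; the Gaussian model and grid are assumptions for illustration only.

```python
import numpy as np

# Running likelihood ratio: plug-in (predictable) density vs. each candidate
# theta. By Ville's inequality, {theta : LR_t(theta) < 1/alpha} is a valid
# confidence sequence simultaneously over all times t.
rng = np.random.default_rng(3)
theta_true, alpha, T = 1.0, 0.05, 500
grid = np.linspace(-2, 4, 601)             # candidate values of theta
log_lr = np.zeros_like(grid)
mean_so_far = 0.0                           # predictable plug-in (uses data up to t-1)

for t in range(T):
    x = theta_true + rng.normal()
    # log N(x; plug-in, 1) - log N(x; theta, 1), accumulated over time.
    log_lr += 0.5 * (x - grid) ** 2 - 0.5 * (x - mean_so_far) ** 2
    mean_so_far = (mean_so_far * t + x) / (t + 1)

conf_set = grid[log_lr < np.log(1 / alpha)]
print(f"after T={T}: CS = [{conf_set.min():.3f}, {conf_set.max():.3f}]")
```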
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - A Semiparametric Instrumented Difference-in-Differences Approach to
Policy Learning [2.1989182578668243]
We propose a general instrumented difference-in-differences (DiD) approach for learning the optimal treatment policy.
Specifically, we establish identification results using a binary instrumental variable (IV) when the parallel trends assumption fails to hold.
We also construct a Wald estimator, novel inverse probability weighting estimators, and a class of semiparametric efficient and multiply robust estimators.
arXiv Detail & Related papers (2023-10-14T09:38:32Z) - Statistically Efficient Variance Reduction with Double Policy Estimation
for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
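A small sketch of the reweighting mechanic this entry describes, in linear Fitted Q-Iteration: Bellman residuals are downweighted where the estimated target noise is large. The variance estimate below (clipped squared residuals) is a crude illustrative stand-in for VA-OPE's value-function variance estimator, and the random features do not form a real MDP.

```python
import numpy as np

# Variance-aware weighted FQI sketch (illustrative, not the VA-OPE algorithm).
rng = np.random.default_rng(4)
n, d, gamma = 4000, 8, 0.9
phi = rng.normal(size=(n, d))               # features of (s, a)
phi_next = rng.normal(size=(n, d))          # features of (s', pi(s')) under the target policy
r = phi[:, 0] + rng.normal(scale=1 + np.abs(phi[:, 1]), size=n)  # heteroskedastic rewards

w = np.ones(n)
theta = np.zeros(d)
for _ in range(50):                         # FQI iterations
    y = r + gamma * phi_next @ theta        # Bellman targets
    # Weighted least squares: theta = argmin sum_i w_i * (y_i - phi_i' theta)^2.
    A = phi.T @ (w[:, None] * phi)
    theta = np.linalg.solve(A + 1e-6 * np.eye(d), phi.T @ (w * y))
    # Re-estimate per-sample target noise from squared residuals and reweight
    # (a crude stand-in for an estimated value-function variance).
    resid2 = (y - phi @ theta) ** 2
    w = 1.0 / np.clip(resid2, 1e-2, None)

print("fitted value weights theta:", np.round(theta, 3))
```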
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z) - Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z) - CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z) - GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation of stationary values can still be achieved in important applications using only offline data.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
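A tabular sketch of the correction-ratio idea: on a tiny chain we can compute the target policy's stationary distribution exactly, form the ratio against the empirical data distribution, and reweight rewards. GenDICE itself estimates this ratio via an optimization objective without ever computing the stationary distribution, so the direct eigenvector solve below is purely illustrative.

```python
import numpy as np

# Stationary-vs-empirical distribution correction on a toy 4-state chain.
rng = np.random.default_rng(5)
S = 4
P = rng.dirichlet(np.ones(S), size=S)       # target policy's transition matrix
r = rng.uniform(size=S)                     # per-state rewards

# Stationary distribution of the target chain: eigenvector of P' at eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
d_pi = np.real(evecs[:, np.argmax(np.real(evals))])
d_pi = d_pi / d_pi.sum()

# Offline data drawn from a different (behavior) state distribution.
d_data = rng.dirichlet(np.ones(S))
samples = rng.choice(S, size=10_000, p=d_data)
d_hat = np.bincount(samples, minlength=S) / len(samples)

# Correction ratio between stationary and empirical distributions; averaging
# ratio-weighted rewards over the data recovers the stationary value.
tau = d_pi / d_hat
value_est = np.mean(tau[samples] * r[samples])
print(f"corrected estimate {value_est:.4f} vs. true stationary value {d_pi @ r:.4f}")
```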
arXiv Detail & Related papers (2020-02-21T00:27:52Z) - Double/Debiased Machine Learning for Dynamic Treatment Effects via g-Estimation [25.610534178373065]
We consider the estimation of treatment effects in settings when multiple treatments are assigned over time.
We propose an extension of the double/debiased machine learning framework to estimate the dynamic effects of treatments.
arXiv Detail & Related papers (2020-02-17T22:32:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.