Causal Reinforcement Learning: An Instrumental Variable Approach
- URL: http://arxiv.org/abs/2103.04021v1
- Date: Sat, 6 Mar 2021 03:57:46 GMT
- Title: Causal Reinforcement Learning: An Instrumental Variable Approach
- Authors: Jin Li and Ye Luo and Xiaowei Zhang
- Abstract summary: We show that the dynamic interaction between data generation and data analysis leads to a new type of bias -- reinforcement bias -- that exacerbates the endogeneity problem in standard data analysis.
A key contribution of the paper is the development of new techniques that allow for the analysis of the algorithms in general settings where the noise is time-dependent.
- Score: 8.881788084913147
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In the standard data analysis framework, data is first collected (once for
all), and then data analysis is carried out. With the advancement of digital
technology, decision-makers constantly analyze past data and generate new data
through the decisions they make. In this paper, we model this as a Markov
decision process and show that the dynamic interaction between data generation
and data analysis leads to a new type of bias -- reinforcement bias -- that
exacerbates the endogeneity problem in standard data analysis.
We propose a class of instrumental variable (IV)-based reinforcement learning
(RL) algorithms to correct for the bias and establish their asymptotic
properties by incorporating them into a two-timescale stochastic approximation
framework. A key contribution of the paper is the development of new techniques
that allow for the analysis of the algorithms in general settings where the noise
is time-dependent.
We use the techniques to derive sharper results on finite-time trajectory
stability bounds: with a polynomial rate, the entire future trajectory of the
iterates from the algorithm falls within a ball that is centered at the true
parameter and is shrinking at a (different) polynomial rate. We also use the
techniques to provide formulas for inference, which is rarely carried out for RL
algorithms. These formulas highlight how the strength of the IV and the degree
of the noise's time dependency affect the inference.
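The endogeneity problem and the IV correction mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a static linear model rather than a Markov decision process, and the data-generating process, the function names (`simulate`, `ols_slope`, `iv_slope`, `iv_stochastic_approximation`), and the step-size rule are all assumptions made for illustration. It only shows that least squares is biased when the regressor is correlated with the noise, and that both a batch IV estimator and a simple single-timescale stochastic-approximation recursion built on the instrument recover the true slope.

```python
import random


def simulate(n, beta=2.0, seed=0):
    """Draw n samples (z, x, y) with an endogenous regressor x.

    Hypothetical data-generating process, chosen for illustration only:
    the confounder u enters both x and the noise in y, so x is correlated
    with the noise, while the instrument z is not.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # instrument: moves x, independent of the noise
        u = rng.gauss(0.0, 1.0)  # unobserved confounder
        x = z + u                # endogenous regressor
        y = beta * x + u + rng.gauss(0.0, 1.0)
        data.append((z, x, y))
    return data


def ols_slope(data):
    """Least-squares slope without intercept: sum(x*y) / sum(x*x)."""
    sxy = sum(x * y for _, x, y in data)
    sxx = sum(x * x for _, x, _ in data)
    return sxy / sxx


def iv_slope(data):
    """Batch IV estimator for one regressor: sum(z*y) / sum(z*x)."""
    szy = sum(z * y for z, _, y in data)
    szx = sum(z * x for z, x, _ in data)
    return szy / szx


def iv_stochastic_approximation(data, offset=10.0):
    """Recursive IV estimate: theta += step * z * (y - x * theta).

    The recursion seeks the root of E[z * (y - x * theta)] = 0.
    The step size 1 / (t + offset) is an assumed choice, not the paper's.
    """
    theta = 0.0
    for t, (z, x, y) in enumerate(data, start=1):
        step = 1.0 / (t + offset)
        theta += step * z * (y - x * theta)
    return theta


if __name__ == "__main__":
    samples = simulate(20000, beta=2.0, seed=0)
    print("OLS slope:", ols_slope(samples))
    print("IV slope:", iv_slope(samples))
    print("IV stochastic approximation:", iv_stochastic_approximation(samples))
```

Under this assumed design, Var(x) = 2 and Cov(x, y) = 2β + 1, so the least-squares slope converges to β + 0.5 rather than β, while Cov(z, y) / Cov(z, x) = β, so the IV estimators target the true slope.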
Related papers
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
- An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models [20.314426291330278]
In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.).
This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MRP) for data modeling.
We reformulate the typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL), introducing a generalized temporal difference (TD) learning algorithm as a resolution.
arXiv Detail & Related papers (2024-04-23T21:02:58Z)
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression [19.047997113063147]
We introduce a notion called data-algorithm compatibility, which considers the generalization behavior of the entire data-dependent training trajectory.
We perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility in such a setting.
arXiv Detail & Related papers (2022-02-12T12:42:36Z)
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- A Priori Denoising Strategies for Sparse Identification of Nonlinear Dynamical Systems: A Comparative Study [68.8204255655161]
We investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements.
We show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point.
arXiv Detail & Related papers (2022-01-29T23:31:25Z)
- Scalable Intervention Target Estimation in Linear Models [52.60799340056917]
Current approaches to causal structure learning either work with known intervention targets or use hypothesis testing to discover the unknown intervention targets.
This paper proposes a scalable and efficient algorithm that consistently identifies all intervention targets.
The proposed algorithm can also be used to update a given observational Markov equivalence class into the interventional Markov equivalence class.
arXiv Detail & Related papers (2021-11-15T03:16:56Z)
- Dynamic Selection in Algorithmic Decision-making [9.172670955429906]
This paper identifies and addresses dynamic selection problems in online learning algorithms with endogenous data.
A novel bias (self-fulfilling bias) arises because the endogeneity of the data influences the choices of decisions.
We propose an instrumental-variable-based algorithm to correct for the bias.
arXiv Detail & Related papers (2021-08-28T01:41:37Z)
- Scalable Quasi-Bayesian Inference for Instrumental Variable Regression [40.33643110066981]
We present a scalable quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models.
Our approach does not require additional assumptions on the data generating process, and leads to a scalable approximate inference algorithm with time cost comparable to the corresponding point estimation methods.
arXiv Detail & Related papers (2021-06-16T12:52:19Z)