Instrumental Variable Value Iteration for Causal Offline Reinforcement
Learning
- URL: http://arxiv.org/abs/2102.09907v1
- Date: Fri, 19 Feb 2021 13:01:40 GMT
- Title: Instrumental Variable Value Iteration for Causal Offline Reinforcement
Learning
- Authors: Luofeng Liao, Zuyue Fu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang
- Abstract summary: In offline reinforcement learning (RL), an optimal policy is learned solely from previously collected observational data.
We study a confounded Markov decision process where the transition dynamics admit an additive nonlinear functional form.
We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of CMR.
- Score: 94.70124304098469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In offline reinforcement learning (RL), an optimal policy is learned
solely from previously collected observational data. However, in observational data,
actions are often confounded by unobserved variables. Instrumental variables
(IVs), in the context of RL, are variables whose influence on the state
variables is mediated entirely through the action. When a valid instrument is
present, we can recover the confounded transition dynamics from
observational data. We study a confounded Markov decision process where the
transition dynamics admit an additive nonlinear functional form. Using IVs, we
derive a conditional moment restriction (CMR) through which we can identify
transition dynamics based on observational data. We propose a provably
efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual
reformulation of CMR. To the best of our knowledge, this is the first provably
efficient algorithm for instrument-aided offline RL.
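For intuition about how an instrument undoes confounding in the transition dynamics, the sketch below works through the linear special case in hypothetical Python/NumPy: a naive regression of the next state on the state-action pair is biased by an unobserved confounder, while plain two-stage least squares using the instrument recovers the true coefficient. This is only an illustration of the identification idea behind the conditional moment restriction; it is not the paper's IVVI algorithm, which handles additive nonlinear dynamics through a primal-dual reformulation of the CMR.

```python
# Illustrative sketch only: a linear-Gaussian toy showing how an instrument
# recovers confounded transition dynamics via two-stage least squares (2SLS).
# All names and the data-generating process below are hypothetical; the paper's
# IVVI algorithm (nonlinear dynamics, primal-dual CMR) is not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

s = rng.normal(size=n)                               # current state
u = rng.normal(size=n)                               # unobserved confounder
z = rng.normal(size=n)                               # instrument: affects a, not s' directly
a = 0.8 * z + 1.0 * u + 0.1 * rng.normal(size=n)     # action, confounded by u
theta_s, theta_a = 0.5, 1.5                          # true transition coefficients
s_next = theta_s * s + theta_a * a + 2.0 * u + 0.1 * rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients of y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive regression of s' on (s, a) is biased because a is correlated with u.
naive = ols(np.column_stack([s, a]), s_next)

# Stage 1: project the action onto (s, z); Stage 2: regress s' on (s, a_hat).
a_hat = np.column_stack([s, z]) @ ols(np.column_stack([s, z]), a)
iv = ols(np.column_stack([s, a_hat]), s_next)

print("true  theta_a:", theta_a)
print("naive theta_a: %.3f  (confounded)" % naive[1])
print("2SLS  theta_a: %.3f  (close to the truth)" % iv[1])
# The recovered dynamics estimate could then drive fitted value iteration,
# which is the role the CMR-based estimator plays inside IVVI.
```

In the nonlinear case, a moment condition roughly of the form E[s' - f(s, a) | s, z] = 0 still identifies the dynamics f, but solving it requires the primal-dual machinery the abstract refers to rather than the two plain regressions above.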
Related papers
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv Detail & Related papers (2024-05-19T17:49:33Z)
- Learning Decision Policies with Instrumental Variables through Double Machine Learning [16.842233444365764]
A common issue in learning decision-making policies in data-rich settings is spurious correlations in the offline dataset.
We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions.
It outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
arXiv Detail & Related papers (2024-05-14T10:55:04Z)
- Regularized DeepIV with Model Selection [72.17508967124081]
Regularized DeepIV (RDIV) regression can converge to the least-norm IV solution.
Our method matches the current state-of-the-art convergence rate.
arXiv Detail & Related papers (2024-03-07T05:38:56Z)
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably predict at an early stage of training whether divergence will occur.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
- Causal Inference with Conditional Instruments using Deep Generative Models [21.771832598942677]
A standard IV is expected to be related to the treatment variable and independent of all other variables in the system.
The conditional IV (CIV) method has been proposed to allow a variable to serve as an instrument when conditioning on a set of variables.
We propose to learn the representations of a CIV and its conditioning set from data with latent confounders for average causal effect estimation.
arXiv Detail & Related papers (2022-11-29T14:31:54Z)
- Ancestral Instrument Method for Causal Inference without Complete Knowledge [0.0]
Unobserved confounding is the main obstacle to causal effect estimation from observational data.
Conditional IVs have been proposed to relax the requirement of standard IVs by conditioning on a set of observed variables.
We develop an algorithm for unbiased causal effect estimation with a given ancestral IV and observational data.
arXiv Detail & Related papers (2022-01-11T07:02:16Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- Learning Deep Features in Instrumental Variable Regression [42.085253974990046]
In IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment; and stage 2 performs linear regression from the treatment to the outcome, conditioned on the instrument.
We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear (a minimal sketch of this two-stage structure appears after this list).
arXiv Detail & Related papers (2020-10-14T15:14:49Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate offline observational data, which is often abundant in practice, to improve sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
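Several entries above (DML-IV, DFIV) revolve around two-stage IV regression. As referenced in the DFIV entry, the sketch below illustrates that two-stage structure, with fixed polynomial features standing in for DFIV's learned deep features; the data-generating process and all names are hypothetical, so this is an illustration of the technique rather than an implementation of any listed method.

```python
# Hedged sketch of two-stage IV regression with nonlinear features:
# stage 1 maps instrument features to treatment features, stage 2 maps the
# predicted treatment features to the outcome. Fixed polynomial features are
# used here purely for illustration; DFIV would learn them with neural nets.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

u = rng.normal(size=n)                        # unobserved confounder
z = rng.uniform(-2, 2, size=n)                # instrument
t = z + 0.5 * u + 0.1 * rng.normal(size=n)    # treatment, confounded by u
y = np.sin(t) + u + 0.1 * rng.normal(size=n)  # outcome with a nonlinear effect of t

def feats(x, degree=5):
    """Simple polynomial feature map (a stand-in for a learned network)."""
    return np.column_stack([x ** k for k in range(degree + 1)])

# Stage 1: linear regression from instrument features to treatment features.
Phi_z, Psi_t = feats(z), feats(t)
W1 = np.linalg.lstsq(Phi_z, Psi_t, rcond=None)[0]
Psi_hat = Phi_z @ W1                          # estimate of E[psi(t) | z]

# Stage 2: linear regression from predicted treatment features to the outcome.
w2 = np.linalg.lstsq(Psi_hat, y, rcond=None)[0]

# Compare the recovered structural function psi(t) @ w2 with the truth on a grid.
grid = np.linspace(-2, 2, 5)
print("t:         ", np.round(grid, 2))
print("true f(t): ", np.round(np.sin(grid), 2))
print("IV   f(t): ", np.round(feats(grid) @ w2, 2))
```

With learned rather than fixed features, the same two linear stages sit on top of neural feature maps; that is the direction the DFIV entry above describes.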