Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous
Unobserved Confounders
- URL: http://arxiv.org/abs/2302.00662v2
- Date: Fri, 22 Sep 2023 15:15:07 GMT
- Title: Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous
Unobserved Confounders
- Authors: David Bruns-Smith and Angela Zhou
- Abstract summary: We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders.
We provide sample complexity bounds, insights, and show effectiveness both in simulations and on real-world longitudinal healthcare data of treating sepsis.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning is important in domains such as medicine,
economics, and e-commerce where online experimentation is costly, dangerous or
unethical, and where the true model is unknown. However, most methods assume
all covariates used in the behavior policy's action decisions are observed.
Though this assumption, sequential ignorability/unconfoundedness, likely does
not hold in observational data, most of the data that accounts for selection
into treatment may be observed, motivating sensitivity analysis. We study
robust policy evaluation and policy optimization in the presence of
sequentially-exogenous unobserved confounders under a sensitivity model. We
propose and analyze orthogonalized robust fitted-Q-iteration that uses
closed-form solutions of the robust Bellman operator to derive a loss
minimization problem for the robust Q function, and adds a bias-correction to
quantile estimation. Our algorithm enjoys the computational ease of
fitted-Q-iteration and statistical improvements (reduced dependence on quantile
estimation error) from orthogonalization. We provide sample complexity bounds,
insights, and show effectiveness both in simulations and on real-world
longitudinal healthcare data of treating sepsis. In particular, our model of
sequential unobserved confounders yields an online Markov decision process,
rather than partially observed Markov decision process: we illustrate how this
can enable warm-starting optimistic reinforcement learning algorithms with
valid robust bounds from observational data.
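As a rough illustration of the robust backup described in the abstract, the sketch below runs a tabular robust fitted-Q-iteration in which the worst case under a Lambda-bounded sensitivity model is computed via the standard quantile/CVaR closed form, mixing the mean of sampled next-state values with their lower CVaR at level tau = 1/Lambda. This is a minimal stand-in under simplifying assumptions (tabular states, no orthogonalization, no bias-corrected quantile estimation), not the paper's estimator; the function names are illustrative.

```python
import numpy as np

def lower_cvar(values, tau):
    """Mean of the values at or below their empirical tau-quantile."""
    q = np.quantile(values, tau)
    return values[values <= q].mean()

def robust_fqi(transitions, n_states, n_actions, gamma, lam, n_iters=50):
    """Tabular robust fitted-Q-iteration sketch.

    transitions: iterable of (s, a, r, s_next) tuples from behavior data.
    lam: sensitivity parameter Lambda >= 1; lam == 1 recovers plain FQI.
    The robust backup mixes the mean of sampled next-state values with
    their lower CVaR at level tau = 1 / Lambda -- a simplified stand-in
    for the closed-form robust Bellman operator.
    """
    tau = 1.0 / lam
    grouped = {}  # (s, a) -> list of (r, s_next) samples
    for s, a, r, s_next in transitions:
        grouped.setdefault((s, a), []).append((r, s_next))
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        Q_new = np.zeros_like(Q)
        for (s, a), samples in grouped.items():
            rewards = np.array([r for r, _ in samples])
            v_next = np.array([Q[s2].max() for _, s2 in samples])
            # Adversarial reweighting shrinks the backup toward the
            # worst tau-fraction of next-state values.
            v_rob = tau * v_next.mean() + (1.0 - tau) * lower_cvar(v_next, tau)
            Q_new[s, a] = rewards.mean() + gamma * v_rob
        Q = Q_new
    return Q
```

With `lam = 1` the backup reduces to the standard empirical Bellman update, so the gap between the two Q functions measures the price of robustness.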
Related papers
- Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes [37.15580574143281]
This paper studies the sample complexity of offline reinforcement learning (RL) for distributionally robust linear Markov decision processes (MDPs), with an uncertainty set characterized by the total variation distance.
We develop a pessimistic model-based algorithm and establish its sample complexity bound under minimal data coverage assumptions.
arXiv Detail & Related papers (2024-03-19T17:48:42Z)
- Sensitivity-Aware Amortized Bayesian Inference [8.753065246797561]
Sensitivity analyses reveal the influence of various modeling choices on the outcomes of statistical analyses.
We propose sensitivity-aware amortized Bayesian inference (SA-ABI), a multifaceted approach to integrate sensitivity analyses into simulation-based inference with neural networks.
We demonstrate the effectiveness of our method in applied modeling problems, ranging from disease outbreak dynamics and global warming thresholds to human decision-making.
arXiv Detail & Related papers (2023-10-17T10:14:10Z)
- Beta quantile regression for robust estimation of uncertainty in the presence of outliers [1.6377726761463862]
Quantile Regression can be used to estimate aleatoric uncertainty in deep neural networks.
We propose a robust solution for quantile regression that incorporates concepts from robust divergence.
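The entry above builds on quantile regression, which fits conditional quantiles by minimizing the pinball (check) loss; the paper replaces that loss with a robust-divergence variant. The snippet below shows only the standard pinball loss and its defining property, that a constant minimizer recovers the empirical quantile. The data values are made up for illustration.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Check loss whose minimizer over constants is the tau-quantile."""
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# Minimizing over a constant predictor recovers the empirical quantile,
# which is why the median (tau = 0.5) shrugs off the outlier below.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
grid = np.linspace(0.0, 110.0, 2201)
best = grid[np.argmin([pinball_loss(y, c, 0.5) for c in grid])]
# best lands on the median 3.0, unaffected by the outlier at 100
```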
arXiv Detail & Related papers (2023-09-14T01:18:57Z)
- Convergence of uncertainty estimates in Ensemble and Bayesian sparse model discovery [4.446017969073817]
We show empirical success, in terms of accuracy and robustness to noise, with a bootstrapping-based sequential thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an exponential convergence rate of the error rate.
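The bootstrapping-based sequential thresholding least-squares procedure mentioned above can be sketched as: refit on bootstrap resamples, zero out small coefficients within each fit, and keep only variables whose inclusion frequency across resamples is high. The threshold, resample count, and frequency cutoff below are illustrative choices, not the paper's settings.

```python
import numpy as np

def stlsq(X, y, threshold, n_iters=10):
    """Sequentially thresholded least squares: fit, zero small coefficients, refit."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iters):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        big = ~small
        if big.any():
            coef[big] = np.linalg.lstsq(X[:, big], y, rcond=None)[0]
    return coef

def bootstrap_stlsq(X, y, threshold=0.1, n_boot=100, freq=0.5, rng=None):
    """Select variables whose bootstrap inclusion frequency is at least `freq`."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        counts += stlsq(X[idx], y[idx], threshold) != 0
    return counts / n_boot >= freq
```

On data generated from a sparse linear model, the spurious features' bootstrap coefficients concentrate near zero and fall below the threshold, so only the true support survives the frequency vote.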
arXiv Detail & Related papers (2023-01-30T04:07:59Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
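Pessimism in offline RL is commonly implemented by subtracting a count-based lower-confidence-bound penalty from each backup, so rarely observed state-action pairs are valued conservatively. The sketch below is a generic tabular version of that idea with an illustrative penalty `beta / sqrt(N(s, a))`; it is not the variance-reduced algorithm from the paper.

```python
import numpy as np

def pessimistic_q(transitions, n_states, n_actions, gamma, beta, n_iters=100):
    """Tabular pessimistic Q-iteration sketch with an LCB penalty."""
    counts = np.zeros((n_states, n_actions))
    r_sum = np.zeros((n_states, n_actions))
    next_counts = np.zeros((n_states, n_actions, n_states))
    for s, a, r, s2 in transitions:
        counts[s, a] += 1
        r_sum[s, a] += r
        next_counts[s, a, s2] += 1
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V = Q.max(axis=1)
        for s in range(n_states):
            for a in range(n_actions):
                if counts[s, a] == 0:
                    Q[s, a] = 0.0  # unseen pairs stay at the pessimistic floor
                    continue
                p_hat = next_counts[s, a] / counts[s, a]
                bonus = beta / np.sqrt(counts[s, a])  # LCB penalty
                Q[s, a] = max(r_sum[s, a] / counts[s, a] + gamma * p_hat @ V - bonus, 0.0)
    return Q
```

A well-covered action with mean reward 0.5 then outranks a once-seen action with mean reward 0.6, since the penalty shrinks with the visitation count.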
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Scalable Intervention Target Estimation in Linear Models [52.60799340056917]
Current approaches to causal structure learning either work with known intervention targets or use hypothesis testing to discover the unknown intervention targets.
This paper proposes a scalable and efficient algorithm that consistently identifies all intervention targets.
The proposed algorithm can be used to also update a given observational Markov equivalence class into the interventional Markov equivalence class.
arXiv Detail & Related papers (2021-11-15T03:16:56Z)
- Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
- Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design.
We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
arXiv Detail & Related papers (2021-05-17T08:36:18Z)
- Learning Stable Nonparametric Dynamical Systems with Gaussian Process Regression [9.126353101382607]
We learn a nonparametric Lyapunov function based on Gaussian process regression from data.
We prove that stabilization of the nominal model based on the nonparametric control Lyapunov function does not modify the behavior of the nominal model at training samples.
arXiv Detail & Related papers (2020-06-14T11:17:17Z)
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the role of the noise in its success is still unclear.
We show that heavy tails commonly arise in the parameters due to multiplicative noise.
A detailed analysis examines key factors, including step size and data, with similar results observed across state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
- Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.