Off-policy Evaluation in Doubly Inhomogeneous Environments
- URL: http://arxiv.org/abs/2306.08719v3
- Date: Fri, 8 Sep 2023 02:06:57 GMT
- Title: Off-policy Evaluation in Doubly Inhomogeneous Environments
- Authors: Zeyu Bian, Chengchun Shi, Zhengling Qi and Lan Wang
- Abstract summary: We develop a general OPE framework that consists of both model-based and model-free approaches.
This is the first paper that develops statistically sound OPE methods in offline RL with double inhomogeneities.
- Score: 29.434386775600498
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work aims to study off-policy evaluation (OPE) under scenarios where two
key reinforcement learning (RL) assumptions -- temporal stationarity and
individual homogeneity -- are both violated. To handle the "double
inhomogeneities", we propose a class of latent factor models for the reward and
observation transition functions, under which we develop a general OPE
framework that consists of both model-based and model-free approaches. To our
knowledge, this is the first paper that develops statistically sound OPE
methods in offline RL with double inhomogeneities. It contributes to a deeper
understanding of OPE in environments where standard RL assumptions are not
met, and provides several practical approaches in these settings. We establish
the theoretical properties of the proposed value estimators and empirically
show that our approach outperforms competing methods that ignore either
temporal nonstationarity or individual heterogeneity. Finally, we illustrate
our method on a data set from the Medical Information Mart for Intensive Care.
Related papers
- SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets [32.496818080222646]
We propose a new approach to model-based offline reinforcement learning.
We provide theoretical guarantees on model uncertainty and a performance bound for SeMOPO.
Experimental results show that our method substantially outperforms all baseline methods.
arXiv Detail & Related papers (2024-06-13T15:16:38Z)
- Composite Survival Analysis: Learning with Auxiliary Aggregated Baselines and Survival Scores [0.0]
Survival Analysis (SA) constitutes the default method for time-to-event modeling.
We show how to improve the training and inference of SA models by decoupling their full expression into (1) an aggregated baseline hazard, which captures the overall behavior of a given population, and (2) independently distributed survival scores, which model idiosyncratic probabilistic dynamics of its given members, in a fully parametric setting.
arXiv Detail & Related papers (2023-12-10T11:13:22Z)
- Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement [53.2171981279647]
We present a framework that encapsulates both the VP- and variance-exploding (VE)-based diffusion methods.
To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models.
We evaluate our model against several methods using a public benchmark to showcase the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-14T14:22:22Z)
- Distributionally Robust Causal Inference with Observational Data [4.8986598953553555]
We consider the estimation of average treatment effects in observational studies without the standard assumption of unconfoundedness.
We propose a new framework of robust causal inference under the general observational study setting with the possible existence of unobserved confounders.
arXiv Detail & Related papers (2022-10-15T16:02:33Z)
- Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with the finite-sample suboptimality guarantee of finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z)
- Reinforcement Learning with Heterogeneous Data: Estimation and Inference [84.72174994749305]
We introduce the K-Heterogeneous Markov Decision Process (K-Hetero MDP) to address sequential decision problems with population heterogeneity.
We propose the Auto-Clustered Policy Evaluation (ACPE) for estimating the value of a given policy, and the Auto-Clustered Policy Iteration (ACPI) for estimating the optimal policy in a given policy class.
We present simulations to support our theoretical findings, and we conduct an empirical study on the standard MIMIC-III dataset.
arXiv Detail & Related papers (2022-01-31T20:58:47Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with the state of the art in locomotion in terms of both sample efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z)
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z)