Distributionally Robust Model-Based Offline Reinforcement Learning with
Near-Optimal Sample Complexity
- URL: http://arxiv.org/abs/2208.05767v4
- Date: Thu, 28 Dec 2023 20:16:50 GMT
- Title: Distributionally Robust Model-Based Offline Reinforcement Learning with
Near-Optimal Sample Complexity
- Authors: Laixi Shi and Yuejie Chi
- Abstract summary: Offline reinforcement learning aims to learn to perform decision making from history data without active exploration.
Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
- Score: 39.886149789339335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper concerns the central issues of model robustness and sample
efficiency in offline reinforcement learning (RL), which aims to learn to
perform decision making from history data without active exploration. Due to
uncertainties and variabilities of the environment, it is critical to learn a
robust policy -- with as few samples as possible -- that performs well even
when the deployed environment deviates from the nominal one used to collect the
history dataset. We consider a distributionally robust formulation of offline
RL, focusing on tabular robust Markov decision processes with an uncertainty
set specified by the Kullback-Leibler divergence in both finite-horizon and
infinite-horizon settings. To combat sample scarcity, a model-based
algorithm that combines distributionally robust value iteration with the
principle of pessimism in the face of uncertainty is proposed, by penalizing
the robust value estimates with a carefully designed data-driven penalty term.
Under a mild and tailored assumption on the history dataset that measures
distribution shift without requiring full coverage of the state-action space,
we establish the finite-sample complexity of the proposed algorithms. We
further develop an information-theoretic lower bound, which suggests that
learning RMDPs is at least as hard as learning standard MDPs when the uncertainty
level is sufficiently small, and corroborates the tightness of our upper bound up
to polynomial factors of the (effective) horizon length for a range of
uncertainty levels. To the best of our knowledge, this provides the first provably
near-optimal robust offline RL algorithm that learns under model uncertainty
and partial coverage.
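
The sketch below illustrates the kind of procedure the abstract describes: finite-horizon distributionally robust value iteration over an empirical model estimated from offline counts, where the worst-case expectation over the KL uncertainty set is computed through its standard dual representation, sup_{lambda > 0} { -lambda * log E_{P-hat}[exp(-V/lambda)] - lambda * sigma }, and each robust value estimate is lowered by a data-driven pessimism penalty. This is a minimal illustrative sketch, not the authors' implementation: the function names, the generic Hoeffding-style penalty c * sqrt(H^2 * log(SAH) / N(s,a)), and the bounded numerical search over the dual variable are assumptions standing in for the paper's exact penalty design.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl_worst_case_expectation(v, p_hat, sigma):
    """Worst-case E_P[v] over the KL ball {P : KL(P || p_hat) <= sigma},
    via the standard dual  sup_{lam > 0} -lam * log E_{p_hat}[exp(-v/lam)] - lam * sigma.
    The bounded search over log(lam) is a numerical simplification."""
    def neg_dual(log_lam):
        lam = np.exp(log_lam)
        z = -v / lam
        m = z.max()  # log-sum-exp shift for numerical stability
        log_mgf = m + np.log(np.dot(p_hat, np.exp(z - m)))
        return lam * log_mgf + lam * sigma  # negative of the dual objective
    res = minimize_scalar(neg_dual, bounds=(-10.0, 10.0), method="bounded")
    return -res.fun

def pessimistic_robust_value_iteration(counts, rewards, sigma, H, c=1.0):
    """Finite-horizon robust value iteration on the empirical model built from
    offline transition counts, with a generic count-based pessimism penalty.
    counts[s, a, s'] are transition counts; rewards[s, a] lie in [0, 1]."""
    S, A, _ = counts.shape
    n = counts.sum(axis=2)                        # visit counts N(s, a)
    p_hat = counts / np.maximum(n[..., None], 1)  # empirical transition kernel
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = np.zeros((S, A))
        for s in range(S):
            for a in range(A):
                if n[s, a] == 0:
                    continue                      # unvisited pairs keep the trivial lower bound 0
                robust_next = kl_worst_case_expectation(V, p_hat[s, a], sigma)
                penalty = c * np.sqrt(H**2 * np.log(S * A * H) / n[s, a])
                Q[s, a] = max(rewards[s, a] + robust_next - penalty, 0.0)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V
```

Here the constant c stands in for the logarithmic confidence terms that a concrete data-driven penalty would specify; clipping at zero keeps the pessimistic robust value estimates within their valid range.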
Related papers
- Robust Offline Reinforcement Learning for Non-Markovian Decision Processes [48.9399496805422]
We study the learning problem of robust offline non-Markovian RL.
We introduce a novel dataset distillation and a lower confidence bound (LCB) design for robust values under different types of uncertainty sets.
By further introducing a novel type-I concentrability coefficient tailored for offline low-rank non-Markovian decision processes, we prove that our algorithm can find an $\epsilon$-optimal robust policy.
arXiv Detail & Related papers (2024-11-12T03:22:56Z) - Distributionally robust risk evaluation with an isotonic constraint [20.74502777102024]
Distributionally robust learning aims to control the worst-case statistical performance within an uncertainty set of candidate distributions.
We propose a shape-constrained approach to DRL, which incorporates prior information about the way in which the unknown target distribution differs from its estimate.
Empirical studies on both synthetic and real data examples demonstrate the improved accuracy of the proposed shape-constrained approach.
arXiv Detail & Related papers (2024-07-09T13:56:34Z) - Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm [14.517103323409307]
The sim-to-real gap represents the disparity between training and testing environments.
A promising approach to addressing this challenge is distributionally robust RL.
We tackle robust RL via interactive data collection and present an algorithm with a provable sample complexity guarantee.
arXiv Detail & Related papers (2024-04-04T16:40:22Z) - Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes [37.15580574143281]
This paper considers the sample complexity of distributionally robust linear Markov decision processes (MDPs), with an uncertainty set characterized by the total variation distance, using offline data.
We develop a pessimistic model-based algorithm and establish its sample complexity bound under minimal data coverage assumptions.
arXiv Detail & Related papers (2024-03-19T17:48:42Z) - The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model [61.87673435273466]
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice.
We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP.
arXiv Detail & Related papers (2023-05-26T02:32:03Z) - Offline Reinforcement Learning with Instrumental Variables in Confounded
Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z) - Pessimistic Q-Learning for Offline Reinforcement Learning: Towards
Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z) - Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement
Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.