Offline Reinforcement Learning with Realizability and Single-policy
Concentrability
- URL: http://arxiv.org/abs/2202.04634v2
- Date: Fri, 11 Feb 2022 17:35:04 GMT
- Title: Offline Reinforcement Learning with Realizability and Single-policy
Concentrability
- Authors: Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee
- Abstract summary: Sample-efficiency guarantees for offline reinforcement learning often rely on strong assumptions on both the function classes and the data coverage.
We analyze a simple algorithm based on the primal-dual formulation of MDPs, where the dual variables (discounted occupancy) are modeled using a density-ratio function against offline data.
- Score: 40.15976281104956
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Sample-efficiency guarantees for offline reinforcement learning (RL) often
rely on strong assumptions on both the function classes (e.g.,
Bellman-completeness) and the data coverage (e.g., all-policy concentrability).
Despite the recent efforts on relaxing these assumptions, existing works are
only able to relax one of the two factors, leaving the strong assumption on the
other factor intact. As an important open problem, can we achieve
sample-efficient offline RL with weak assumptions on both factors?
In this paper we answer the question in the positive. We analyze a simple
algorithm based on the primal-dual formulation of MDPs, where the dual
variables (discounted occupancy) are modeled using a density-ratio function
against offline data. With proper regularization, we show that the algorithm
enjoys polynomial sample complexity, under only realizability and single-policy
concentrability. We also provide alternative analyses based on different
assumptions to shed light on the nature of primal-dual algorithms for offline
RL.
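As context for the abstract, the following is a minimal sketch (not drawn from the paper itself; the symbols $d$, $w$, $v$, $\mu_0$, and $d^D$ are notation assumed here) of the linear-programming view of a discounted MDP and the density-ratio reparametrization of its dual variable that the abstract refers to.
% Primal LP over the discounted occupancy d, with initial distribution mu_0:
\begin{align*}
  \max_{d \ge 0}\ & \textstyle\sum_{s,a} d(s,a)\, r(s,a) \\
  \text{s.t.}\ & \textstyle\sum_{a} d(s,a) = (1-\gamma)\,\mu_0(s) + \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a') \quad \forall s.
\end{align*}
% Reparametrizing d(s,a) = w(s,a) d^D(s,a), where d^D is the offline data
% distribution and w is the density ratio, and introducing multipliers v(s),
% gives a saddle-point objective that can be estimated from offline samples:
\begin{align*}
  \mathcal{L}(w, v) = (1-\gamma)\, \mathbb{E}_{s \sim \mu_0}\big[ v(s) \big]
  + \mathbb{E}_{(s,a,r,s') \sim d^D}\big[ w(s,a)\,\big( r + \gamma\, v(s') - v(s) \big) \big].
\end{align*}
In this sketch, single-policy concentrability roughly corresponds to a bounded ratio $d^{\pi^*}/d^D$ for a single comparator policy $\pi^*$, and realizability to the function classes for $w$ and $v$ containing the relevant targets; the regularization mentioned in the abstract is added on top of such a saddle-point objective, whose exact form in the paper may differ.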
Related papers
- Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL).
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z) - Revisiting the Linear-Programming Framework for Offline RL with General
Function Approximation [24.577243536475233]
Offline reinforcement learning (RL) concerns pursuing an optimal policy for sequential decision-making from a pre-collected dataset.
Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators.
We revisit the linear-programming framework for offline RL, and advance the existing results in several aspects.
arXiv Detail & Related papers (2022-12-28T15:28:12Z) - Distributionally Robust Offline Reinforcement Learning with Linear
Function Approximation [16.128778192359327]
We learn an RL agent with the historical data obtained from the source environment and optimize it to perform well in the perturbed one.
We prove that our algorithm achieves suboptimality $O(\sqrt{K})$, with dependence on the linear function dimension $d$.
arXiv Detail & Related papers (2022-09-14T13:17:59Z) - Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes [99.26864533035454]
We study offline reinforcement learning (RL) in partially observable Markov decision processes.
We propose the Proxy variable Pessimistic Policy Optimization (P3O) algorithm.
P3O is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
arXiv Detail & Related papers (2022-05-26T19:13:55Z) - Offline Reinforcement Learning Under Value and Density-Ratio
Realizability: the Power of Gaps [15.277483173402128]
We provide guarantees for a pessimistic algorithm based on a version space formed by marginalized importance sampling.
Our work is the first to identify the utility and the novel mechanism of gap assumptions in offline reinforcement learning.
arXiv Detail & Related papers (2022-03-25T23:33:38Z) - Offline Reinforcement Learning: Fundamental Barriers for Value Function
Approximation [74.3002974673248]
We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data.
Offline RL is becoming increasingly relevant in practice because online data collection is poorly suited to safety-critical domains.
Our results show that sample-efficient offline reinforcement learning requires either restrictive coverage conditions or representation conditions that go beyond supervised learning.
arXiv Detail & Related papers (2021-11-21T23:22:37Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL).
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - What are the Statistical Limits of Offline RL with Linear Function
Approximation? [70.33301077240763]
Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of sequential decision-making strategies.
This work focuses on the basic question of what representational and distributional conditions are necessary to permit provably sample-efficient offline reinforcement learning.
arXiv Detail & Related papers (2020-10-22T17:32:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.