Distributionally Robust Offline Reinforcement Learning with Linear
Function Approximation
- URL: http://arxiv.org/abs/2209.06620v1
- Date: Wed, 14 Sep 2022 13:17:59 GMT
- Title: Distributionally Robust Offline Reinforcement Learning with Linear
Function Approximation
- Authors: Xiaoteng Ma, Zhipeng Liang, Li Xia, Jiheng Zhang, Jose Blanchet,
Mingwen Liu, Qianchuan Zhao and Zhengyuan Zhou
- Abstract summary: We learn an RL agent with the historical data obtained from the source environment and optimize it to perform well in the perturbed one.
We prove our algorithm achieves a suboptimality of $O(1/\sqrt{K})$ with a dependence on the linear function dimension $d$.
- Score: 16.128778192359327
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Among the reasons that hinder the application of reinforcement learning (RL)
to real-world problems, two factors are critical: limited data and the mismatch
between the testing environment and the training one. In this paper, we attempt
to address these issues simultaneously with the problem setup of
distributionally robust offline RL. Particularly, we learn an RL agent with the
historical data obtained from the source environment and optimize it to perform
well in the perturbed one. Moreover, we consider the linear function
approximation to apply the algorithm to large-scale problems. We prove that our
algorithm achieves a suboptimality of $O(1/\sqrt{K})$ with a dependence on the
linear function dimension $d$, which appears to be the first result with a
sample complexity guarantee in this setting. Diverse experiments are conducted to
demonstrate our theoretical findings, showing the superiority of our algorithm
over the non-robust one.
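
To make the setup above concrete, here is a minimal sketch, not the authors'
algorithm (whose estimator and uncertainty set may differ): distributionally
robust value iteration that takes the worst case over a KL ball around an
empirical transition model and fits Q linearly by ridge regression. All names
(robust_q_iteration, Phi, P_hat, rho) and the choice of a KL uncertainty set are
assumptions made for illustration.

```python
import numpy as np

def weighted_logsumexp(a, w):
    """Stable log(sum_i w_i * exp(a_i)) over entries with w_i > 0."""
    a, w = np.asarray(a, float), np.asarray(w, float)
    mask = w > 0
    m = np.max(a[mask])
    return m + np.log(np.sum(w[mask] * np.exp(a[mask] - m)))

def kl_worst_case_value(v, p0, rho, lam_grid):
    """Worst-case E_P[v] over {P : KL(P || p0) <= rho}, via the standard dual
    sup_{lam > 0} -lam * log E_{p0}[exp(-v / lam)] - lam * rho (grid search)."""
    return max(-lam * weighted_logsumexp(-v / lam, p0) - lam * rho
               for lam in lam_grid)

def robust_q_iteration(Phi, P_hat, R_hat, gamma=0.9, rho=0.1,
                       n_iters=100, reg=1e-3):
    """Phi:   (S*A, d) feature matrix, rows ordered by index s*A + a.
    P_hat: (S, A, S) empirical transition model from the offline data.
    R_hat: (S, A) empirical rewards. Returns theta with Q ~ Phi @ theta."""
    S, A, _ = P_hat.shape
    d = Phi.shape[1]
    lam_grid = np.geomspace(1e-2, 1e2, 30)           # grid for the dual variable
    gram_inv = np.linalg.inv(Phi.T @ Phi + reg * np.eye(d))
    theta = np.zeros(d)
    for _ in range(n_iters):
        V = (Phi @ theta).reshape(S, A).max(axis=1)  # greedy state values
        y = np.empty((S, A))                         # robust Bellman targets
        for s in range(S):
            for a in range(A):
                worst = kl_worst_case_value(V, P_hat[s, a], rho, lam_grid)
                y[s, a] = R_hat[s, a] + gamma * worst
        theta = gram_inv @ (Phi.T @ y.reshape(-1))   # ridge regression step
    return theta
```

With one-hot features (Phi equal to the identity of size S*A), the sketch
reduces to tabular robust value iteration, which is a convenient sanity check.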
Related papers
- Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z)
- On Sample-Efficient Offline Reinforcement Learning: Data Diversity,
Posterior Sampling, and Beyond [29.449446595110643]
We propose a notion of data diversity that subsumes the previous notions of coverage measures in offline RL.
Our proposed model-free PS-based algorithm for offline RL is novel, with sub-optimality bounds that are frequentist (i.e., worst-case) in nature.
arXiv Detail & Related papers (2024-01-06T20:52:04Z)
- Neural Network Approximation for Pessimistic Offline Reinforcement
Learning [17.756108291816908]
We present a non-asymptotic estimation error of pessimistic offline RL using general neural network approximation.
Our result shows that the estimation error consists of two parts: the first converges to zero at a desired rate on the sample size with partially controllable concentrability, and the second becomes negligible if the residual constraint is tight.
arXiv Detail & Related papers (2023-12-19T05:17:27Z)
- Bridging Distributionally Robust Learning and Offline RL: An Approach to
Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is the distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
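
For context on the distributionally robust learning (DRL) machinery invoked
above, the generic robust Bellman backup over a KL uncertainty ball and its
standard scalar dual can be written as below; this is a textbook identity shown
for illustration, and the uncertainty sets adopted in the cited paper may differ.

```latex
% Generic KL-ball robust Bellman operator and its standard scalar dual
% (illustrative; not necessarily the formulation used in the cited paper).
\[
  (\mathcal{T}_{\mathrm{rob}} V)(s,a)
    \;=\; r(s,a) \;+\; \gamma
      \inf_{P:\, D_{\mathrm{KL}}\left(P \,\|\, P_0(\cdot \mid s,a)\right) \le \rho}
      \mathbb{E}_{s' \sim P}\left[ V(s') \right]
\]
\[
  \;=\; r(s,a) \;+\; \gamma \sup_{\lambda \ge 0}
      \left\{ -\lambda \log \mathbb{E}_{s' \sim P_0(\cdot \mid s,a)}
      \left[ e^{-V(s')/\lambda} \right] \;-\; \lambda \rho \right\}.
\]
```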
arXiv Detail & Related papers (2023-10-27T19:19:30Z)
- Offline Reinforcement Learning with Differentiable Function
Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
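
As a generic illustration of the pessimistic fitted Q-learning idea mentioned
above (a schematic sketch, not the cited paper's estimator or penalty), the
snippet below runs fitted Q-iteration whose backups subtract an uncertainty
penalty before the greedy maximization; fit_q and penalty are hypothetical
plug-ins standing in for the chosen function class and pessimism term.

```python
def pessimistic_fqi(dataset, actions, fit_q, penalty,
                    gamma=0.99, n_iters=50, v_max=100.0):
    """dataset: iterable of (s, a, r, s_next) offline transitions.
    actions: finite action set used for the greedy backup.
    fit_q:   regression routine over the chosen function class,
             fit_q(inputs, targets) -> q_fn with q_fn(s, a) -> float.
    penalty: hypothetical uncertainty bonus, penalty(s, a) -> float >= 0.
    v_max:   assumed upper bound on values, used to clip the backup."""
    q_fn = lambda s, a: 0.0                       # start from the zero function
    for _ in range(n_iters):
        inputs, targets = [], []
        for (s, a, r, s_next) in dataset:
            # pessimistic greedy value: subtract the penalty before maximizing,
            # then clip to a valid value range (a common stabilizing choice)
            v_next = max(q_fn(s_next, b) - penalty(s_next, b) for b in actions)
            v_next = min(max(v_next, 0.0), v_max)
            inputs.append((s, a))
            targets.append(r + gamma * v_next)
        q_fn = fit_q(inputs, targets)             # regress the Bellman targets
    return q_fn
```

Subtracting the penalty inside the maximization is what makes the value
estimate pessimistic with respect to poorly covered state-action pairs.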
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Offline Reinforcement Learning with Realizability and Single-policy
Concentrability [40.15976281104956]
Sample-efficiency guarantees for offline reinforcement learning often rely on strong assumptions on both the function classes and the data coverage.
We analyze a simple algorithm based on the primal-dual formulation of MDPs, where the dual variables are modeled using a density-ratio function against offline data.
arXiv Detail & Related papers (2022-02-09T18:51:24Z)
- Offline Reinforcement Learning: Fundamental Barriers for Value Function
Approximation [74.3002974673248]
We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data.
Offline RL is becoming increasingly relevant in practice because it avoids costly online data collection and is well suited to safety-critical domains.
Our results show that sample-efficient offline reinforcement learning requires either restrictive coverage conditions or representation conditions that go beyond supervised learning.
arXiv Detail & Related papers (2021-11-21T23:22:37Z)
- False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL).
arXiv Detail & Related papers (2021-10-24T15:34:03Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based
Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- An Online Method for A Class of Distributionally Robust Optimization
with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
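
As background for the DRO objective referenced above, here is a generic
minibatch sketch under a KL-regularized formulation with a fixed temperature,
not the cited paper's method: per-example gradients are reweighted by a softmax
of their losses, which is the gradient of $\lambda \log E[\exp(\ell/\lambda)]$
estimated on the batch. The names (dro_minibatch_step, loss_and_grad, lam) are
assumptions made for illustration.

```python
import numpy as np

def dro_minibatch_step(theta, batch, loss_and_grad, lr=0.1, lam=1.0):
    """One SGD-style step on the KL-regularized DRO objective
    lam * log E[exp(loss / lam)], using minibatch softmax weights.
    loss_and_grad(theta, example) -> (scalar loss, gradient array)."""
    losses, grads = zip(*(loss_and_grad(theta, ex) for ex in batch))
    losses = np.array(losses)
    w = np.exp((losses - losses.max()) / lam)   # softmax over batch losses
    w /= w.sum()                                # up-weights high-loss examples
    g = sum(wi * gi for wi, gi in zip(w, grads))
    return theta - lr * g
```

Normalizing the weights over the batch makes this a biased estimate of the
population gradient, a common trade-off in practical online DRO methods.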
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.