On the Role of Discount Factor in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2206.03383v1
- Date: Tue, 7 Jun 2022 15:22:42 GMT
- Title: On the Role of Discount Factor in Offline Reinforcement Learning
- Authors: Hao Hu, Yiqin Yang, Qianchuan Zhao, Chongjie Zhang
- Abstract summary: The discount factor, $\gamma$, plays a vital role in improving online RL sample efficiency and estimation accuracy.
This paper examines two distinct effects of $\gamma$ in offline RL with theoretical analysis.
The results show that the discount factor plays an essential role in the performance of offline RL algorithms.
- Score: 25.647624787936028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) enables effective learning from
previously collected data without exploration, which shows great promise in
real-world applications when exploration is expensive or even infeasible. The
discount factor, $\gamma$, plays a vital role in improving online RL sample
efficiency and estimation accuracy, but the role of the discount factor in
offline RL is not well explored. This paper examines two distinct effects of
$\gamma$ in offline RL with theoretical analysis, namely the regularization
effect and the pessimism effect. On the one hand, $\gamma$ acts as a regularizer
that trades off optimality against sample efficiency on top of existing offline
techniques. On the other hand, a lower guidance $\gamma$ can also be viewed as a
form of pessimism, where we optimize the policy's performance under the worst
possible models. We empirically verify the above theoretical observations with
tabular MDPs and standard D4RL tasks. The results show that the discount factor
plays an essential role in the performance of offline RL algorithms, both in
small-data regimes on top of existing offline methods and in large-data regimes
without other forms of conservatism.
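Two hedged illustrations of these effects follow; neither reproduces the paper's
own theorems or code. First, a standard discount-gap bound (not specific to this
paper) makes the regularization trade-off concrete: for any policy $\pi$ with
rewards in $[0, R_{\max}]$, planning with a lower guidance discount $\gamma' < \gamma$
changes its value by at most

$$0 \le V^{\pi}_{\gamma}(s) - V^{\pi}_{\gamma'}(s) = \mathbb{E}_{\pi}\Big[\sum_{t \ge 0} (\gamma^{t} - \gamma'^{t})\, r_t\Big] \le \frac{(\gamma - \gamma')\, R_{\max}}{(1-\gamma)(1-\gamma')},$$

so shrinking the effective horizon $1/(1-\gamma')$ costs a bounded amount of
optimality while reducing the horizon over which estimation errors compound.
Second, a minimal tabular sketch shows where the guidance discount enters offline
planning; the certainty-equivalence setup and all function names here are
illustrative assumptions, not the authors' implementation:

```python
"""Illustrative sketch only: certainty-equivalence offline planning on a
tabular MDP, with the guidance discount gamma as the sole conservatism knob."""
import numpy as np

def estimate_mdp(dataset, n_states, n_actions):
    """Empirical transition and reward models from (s, a, r, s') tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in dataset:
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r
    visits = counts.sum(axis=2, keepdims=True)
    # Unvisited (s, a) pairs fall back to a uniform transition and zero reward.
    P_hat = np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / n_states)
    R_hat = reward_sum / np.maximum(visits[..., 0], 1)
    return P_hat, R_hat

def plan(P_hat, R_hat, gamma, n_iters=1000, tol=1e-8):
    """Value iteration; a smaller gamma shortens the effective horizon
    1/(1 - gamma), limiting how far model errors can propagate."""
    n_states, n_actions, _ = P_hat.shape
    V = np.zeros(n_states)
    for _ in range(n_iters):
        Q = R_hat + gamma * (P_hat @ V)      # shape (n_states, n_actions)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    Q = R_hat + gamma * (P_hat @ V)          # Q under the converged values
    return Q.argmax(axis=1), V               # greedy policy and state values

# Usage (hypothetical data): plan with a guidance discount below the
# evaluation discount, e.g. gamma=0.9 while evaluating at gamma=0.99.
# P_hat, R_hat = estimate_mdp(dataset, n_states=10, n_actions=2)
# policy, values = plan(P_hat, R_hat, gamma=0.9)
```

In this sketch, sweeping `gamma` below the evaluation discount is the only form
of conservatism, which loosely mirrors the "large data regimes without other
conservatisms" setting studied empirically in the paper.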
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
- UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning [10.593924216046977]
We first theoretically analyze the overestimation phenomenon caused by the MSE loss and provide a theoretical upper bound on the overestimation error.
Finally, we propose an offline RL algorithm based on the underestimated operator and a diffusion policy model.
arXiv Detail & Related papers (2024-06-05T14:37:42Z)
- CROP: Conservative Reward for Model-based Offline Policy Optimization [15.121328040092264]
This paper proposes a novel model-based offline RL algorithm, Conservative Reward for model-based Offline Policy optimization (CROP).
To achieve a conservative reward estimation, CROP simultaneously minimizes the estimation error and the reward of random actions.
Notably, CROP establishes an innovative connection between offline and online RL, highlighting that offline RL problems can be tackled by adopting online RL techniques.
arXiv Detail & Related papers (2023-10-26T08:45:23Z)
- The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning [25.647624787936028]
We propose a novel algorithm, Provable Data Sharing (PDS), to utilize reward-free data for offline reinforcement learning.
PDS significantly improves the performance of offline RL algorithms with reward-free data.
arXiv Detail & Related papers (2023-02-27T03:35:02Z)
- RORL: Robust Offline Reinforcement Learning via Conservative Smoothing [72.8062448549897]
Offline reinforcement learning (RL) can exploit the massive amount of offline data for complex decision-making tasks.
Current offline RL algorithms are generally designed to be conservative for value estimation and action selection.
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
arXiv Detail & Related papers (2022-06-06T18:07:41Z)
- Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is the overfitting issue, which leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
- Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold.
This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z)
- MOReL: Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
arXiv Detail & Related papers (2020-05-12T17:52:43Z)