The Least Restriction for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2107.01757v1
- Date: Mon, 5 Jul 2021 01:50:40 GMT
- Title: The Least Restriction for Offline Reinforcement Learning
- Authors: Zizhou Su
- Abstract summary: We propose a new offline reinforcement learning framework, the Least Restriction (LR).
The LR treats action selection as sampling from a probability distribution.
It is able to learn robustly from different offline datasets, including random and suboptimal demonstrations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many practical applications of reinforcement learning (RL) constrain the
agent to learn from a fixed offline dataset of logged interactions, which has
already been gathered, without offering further possibility for data
collection. However, commonly used off-policy RL algorithms, such as the Deep Q
Network and the Deep Deterministic Policy Gradient, are incapable of learning
without data correlated to the distribution under the current policy, making
them ineffective for this offline setting. As a first step towards useful
offline RL algorithms, we analyze the source of instability in standard
off-policy RL algorithms: bootstrapping error. The key to avoiding this error
is ensuring that the actions used by the agent do not fall outside the fixed
offline dataset. Based on this analysis, we propose a new offline RL
framework, the Least Restriction (LR). The LR treats action selection as
sampling from a probability distribution. It imposes only a mild limit on
action selection, which not only keeps actions within the offline dataset but
also removes the unreasonable restrictions of earlier approaches (e.g.
Batch-Constrained Deep Q-Learning). We further demonstrate that the LR learns
robustly from different offline datasets, including random and suboptimal
demonstrations, on a range of practical control tasks.
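The abstract does not spell out the precise form of the LR's limit on action selection, so the sketch below only illustrates the general idea it describes: bootstrap targets are computed from sampled candidate actions, and candidates that an estimated behavior-density model places outside the dataset's support are discarded. The `QNet` architecture, the `policy.sample` and `behavior_logprob` interfaces, and the threshold `eps` are illustrative assumptions, not the paper's method.
```python
# Minimal sketch (assumptions, not the paper's exact algorithm): a Q-learning
# backup that bootstraps only from candidate actions whose estimated
# behavior-policy log-density exceeds a small threshold, keeping the bootstrap
# target inside the support of the offline dataset.
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def constrained_backup(q_net, policy, behavior_logprob, batch,
                       gamma=0.99, n_samples=10, eps=-6.0):
    """Bootstrap target that uses only in-support candidate actions.

    `policy.sample(obs, n)` and `behavior_logprob(obs, act)` are assumed
    interfaces: the former samples candidate actions, the latter is a learned
    density model of the dataset's behavior policy (e.g. a VAE or a flow).
    """
    obs2, rew, done = batch["next_obs"], batch["reward"], batch["done"]
    acts = policy.sample(obs2, n_samples)               # (n_samples, B, act_dim)
    obs2_rep = obs2.unsqueeze(0).expand(n_samples, -1, -1)
    q_vals = q_net(obs2_rep, acts)                      # (n_samples, B)
    # Mask candidates the density model deems out of the dataset's support.
    in_support = behavior_logprob(obs2_rep, acts) > eps
    q_vals = torch.where(in_support, q_vals, torch.full_like(q_vals, -1e6))
    target_q = q_vals.max(dim=0).values
    return rew + gamma * (1.0 - done) * target_q
```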
Related papers
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced
Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
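As a concrete (hedged) illustration of the resampling idea, the sketch below draws transition indices with probabilities proportional to a shifted version of each transition's episode return; the shift constant and the proportional weighting are assumptions rather than the paper's exact scheme.
```python
# Hedged sketch of return-based data rebalancing: resample transitions with
# weights derived from their episode's return, which shifts probability mass
# toward higher-return trajectories without changing the dataset's support.
import numpy as np

def rebalanced_indices(episode_returns, episode_of_transition, rng=None):
    """episode_returns: (num_episodes,) total return of each episode.
    episode_of_transition: (N,) episode index of each transition.
    Returns N transition indices drawn with return-proportional weights."""
    if rng is None:
        rng = np.random.default_rng()
    ret = np.asarray(episode_returns, dtype=np.float64)
    # Shift returns to be strictly positive so they can act as weights.
    w_ep = ret - ret.min() + 1e-3
    w = w_ep[np.asarray(episode_of_transition)]
    p = w / w.sum()
    n = len(episode_of_transition)
    return rng.choice(n, size=n, replace=True, p=p)
```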
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in
Offline RL [28.563015766188478]
We introduce an offline reinforcement learning algorithm that explicitly clones a behavior policy to constrain value learning.
We show state-of-the-art performance on several datasets within the D4RL and Robomimic benchmarks.
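The summary only names the mechanism (an explicitly cloned behavior policy used to constrain value learning); below is a minimal sketch of the cloning step with a Gaussian policy head. The network sizes, and how the cloned policy would then be used in the value backup, are assumptions rather than details from the paper.
```python
# Hedged sketch: explicit behavioral cloning of the dataset's behavior policy.
# The cloned policy (or its log-density) can then be used to restrict which
# actions the value function is trained and bootstrapped on.
import torch
import torch.nn as nn

class GaussianBC(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def log_prob(self, obs, act):
        h = self.trunk(obs)
        std = self.log_std(h).clamp(-5.0, 2.0).exp()
        return torch.distributions.Normal(self.mu(h), std).log_prob(act).sum(-1)

def bc_loss(model, obs, act):
    # Maximum-likelihood cloning on logged (obs, act) pairs.
    return -model.log_prob(obs, act).mean()
```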
arXiv Detail & Related papers (2022-06-01T18:04:43Z)
- Don't Change the Algorithm, Change the Data: Exploratory Data for
Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
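A minimal sketch of the relabeling step in this kind of data-centric pipeline; the function names and the `reward_fn` interface are assumptions.
```python
# Hedged sketch: exploration data collected without rewards is relabeled with
# the downstream task reward, then handed to any standard offline RL learner.
def relabel_dataset(transitions, reward_fn):
    """transitions: iterable of dicts with 'obs', 'action', 'next_obs'.
    reward_fn(obs, action, next_obs) computes the downstream task reward."""
    relabeled = []
    for t in transitions:
        t = dict(t)  # copy so the raw exploration data stays untouched
        t["reward"] = reward_fn(t["obs"], t["action"], t["next_obs"])
        relabeled.append(t)
    return relabeled

# Usage sketch; collect_exploration_data and train_offline_rl stand in for an
# unsupervised exploration method and any off-the-shelf offline RL algorithm.
# dataset = relabel_dataset(collect_exploration_data(env), reward_fn=task_reward)
# policy = train_offline_rl(dataset)
```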
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution
Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
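A schematic way to write the stationary-distribution optimization this family of methods considers (generic notation, assumed here rather than copied from the paper): optimize over state-action stationary distributions regularized toward the dataset distribution, subject to Bellman flow constraints, then read off the correction ratios.
```latex
% Schematic sketch (assumed notation): d is a candidate stationary
% distribution, d^D the dataset distribution, D_f an f-divergence.
\begin{aligned}
\max_{d \ge 0}\quad & \mathbb{E}_{(s,a)\sim d}\big[r(s,a)\big]
    \;-\; \alpha\, D_f\!\big(d \,\|\, d^{D}\big) \\
\text{s.t.}\quad & \sum_{a} d(s,a)
    \;=\; (1-\gamma)\,p_0(s)
    \;+\; \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a')
    \quad \forall s .
\end{aligned}
```
The learned correction $w(s,a) = d^{*}(s,a)/d^{D}(s,a)$ can then weight the dataset when extracting the final policy.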
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy constraint that reduces divergence from the behavior data and a value constraint that discourages overly optimistic estimates.
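A hedged sketch of how two such penalties could be written as separate loss terms; `policy.sample_with_logprob` and `behavior_logprob` are assumed interfaces, and the specific penalty forms and weights are illustrative, not the paper's.
```python
# Hedged sketch: a value constraint on the critic (push down Q-values of
# policy actions relative to dataset actions) and a policy constraint on the
# actor (penalize divergence from the logged behavior).
import torch.nn.functional as F

def critic_loss(q_net, target_q, obs, act, policy_act, lam_value=1.0):
    td_loss = F.mse_loss(q_net(obs, act), target_q)
    # Value constraint: discourage optimism on actions outside the data.
    pessimism = (q_net(obs, policy_act) - q_net(obs, act)).clamp(min=0).mean()
    return td_loss + lam_value * pessimism

def actor_loss(q_net, policy, behavior_logprob, obs, lam_policy=1.0):
    act, logp = policy.sample_with_logprob(obs)
    # Policy constraint: keep the learned policy close to the behavior data.
    divergence = (logp - behavior_logprob(obs, act)).mean()
    return -q_net(obs, act).mean() + lam_policy * divergence
```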
arXiv Detail & Related papers (2021-02-18T08:54:14Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by training them with rewards artificially penalized by the uncertainty of the dynamics.
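A minimal sketch of an uncertainty-penalized reward of this kind; here the penalty uses disagreement across an ensemble of learned dynamics models, which is an assumption (the paper's own uncertainty estimator may differ), and `lam` is a tunable coefficient.
```python
# Hedged sketch: penalize model-predicted rewards by a dynamics-uncertainty
# estimate, r_tilde = r - lam * u(s, a), with u taken here as the largest
# per-dimension standard deviation across an ensemble of dynamics models.
import numpy as np

def penalized_reward(reward, next_state_preds, lam=1.0):
    """reward: (B,) model-predicted rewards.
    next_state_preds: (ensemble_size, B, state_dim) next-state predictions."""
    u = np.std(next_state_preds, axis=0).max(axis=-1)   # (B,)
    return reward - lam * u
```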
arXiv Detail & Related papers (2020-05-27T08:46:41Z)