Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced
Datasets
- URL: http://arxiv.org/abs/2310.04413v2
- Date: Thu, 12 Oct 2023 00:52:38 GMT
- Title: Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced
Datasets
- Authors: Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, Abhishek Bhandwaldar,
Akash Srivastava, Joni Pajarinen, Romain Laroche, Abhishek Gupta, Pulkit
Agrawal
- Abstract summary: Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
- Score: 53.8218145723718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline policy learning is aimed at learning decision-making policies using
existing datasets of trajectories without collecting additional data. The
primary motivation for using reinforcement learning (RL) instead of supervised
learning techniques such as behavior cloning is to find a policy that achieves
a higher average return than the trajectories constituting the dataset.
However, we empirically find that when a dataset is dominated by suboptimal
trajectories, state-of-the-art offline RL algorithms do not substantially
improve over the average return of trajectories in the dataset. We argue this
is due to an assumption made by current offline RL algorithms of staying close
to the trajectories in the dataset. If the dataset primarily consists of
sub-optimal trajectories, this assumption forces the policy to mimic the
suboptimal actions. We overcome this issue by proposing a sampling strategy
that enables the policy to be constrained only to "good data" rather than all
actions in the dataset (i.e., uniform sampling). We present a realization of
the sampling strategy and an algorithm that can be used as a plug-and-play
module in standard offline RL algorithms. Our evaluation demonstrates
significant performance gains on 72 imbalanced datasets, on the D4RL benchmark, and
across three different offline RL algorithms. Code is available at
https://github.com/Improbable-AI/dw-offline-rl.
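As a rough sketch of what such a plug-and-play, non-uniform sampling module could look like, the Python snippet below weights transitions by the return of the trajectory they belong to and feeds the resulting minibatches to an otherwise unchanged offline RL learner. The return-softmax weighting and the dataset/agent.update interface are illustrative assumptions; the paper's actual weighting scheme is the one implemented in the repository linked above.

    import numpy as np

    def return_weighted_probs(trajectory_returns, temperature=1.0):
        # Softmax over trajectory returns -> per-trajectory sampling probabilities.
        # (Illustrative weighting; the paper's actual scheme may differ.)
        r = np.asarray(trajectory_returns, dtype=float)
        w = np.exp((r - r.max()) / temperature)
        return w / w.sum()

    def sample_batch(transitions, traj_ids, traj_probs, batch_size, rng):
        # Each transition inherits the sampling probability of its trajectory.
        p = traj_probs[np.asarray(traj_ids)]
        p = p / p.sum()
        idx = rng.choice(len(transitions), size=batch_size, p=p)
        return [transitions[i] for i in idx]

    # Usage with any standard offline RL learner (e.g. CQL, IQL, TD3+BC):
    #   probs = return_weighted_probs(returns_per_trajectory, temperature=5.0)
    #   rng = np.random.default_rng(0)
    #   for _ in range(num_gradient_steps):
    #       batch = sample_batch(dataset.transitions, dataset.traj_ids, probs, 256, rng)
    #       agent.update(batch)   # the offline RL update itself is unchanged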
Related papers
- Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory
Weighting [29.21380944341589]
We show that state-of-the-art offline RL algorithms are overly restrained by low-return trajectories and fail to exploit trajectories to the fullest.
The proposed trajectory-reweighted sampling strategy can be combined with any offline RL algorithm.
We empirically show that while CQL, IQL, and TD3+BC alone achieve only a part of the potential policy improvement, the same algorithms combined with the reweighted sampling strategy exploit the dataset fully.
arXiv Detail & Related papers (2023-06-22T17:58:02Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
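For intuition, a minimal sketch of return-based rebalancing in this spirit is given below: trajectories are duplicated in proportion to their normalized return before the usual uniform sampling, so the support of the dataset is unchanged while high-return data is drawn more often. The trajectory dictionaries and the duplication rule are assumptions for illustration, not necessarily the exact ReD recipe.

    import numpy as np

    def rebalance_by_return(trajectories, max_copies=10):
        # Duplicate each trajectory in proportion to its normalized return,
        # keeping at least one copy so the dataset's support is unchanged.
        # (Illustrative rule; the exact ReD recipe may differ.)
        returns = np.array([sum(t["rewards"]) for t in trajectories], dtype=float)
        norm = (returns - returns.min()) / (returns.max() - returns.min() + 1e-8)
        copies = 1 + np.floor(norm * (max_copies - 1)).astype(int)
        rebalanced = []
        for traj, n in zip(trajectories, copies):
            rebalanced.extend([traj] * n)
        return rebalanced   # then sample uniformly from the rebalanced dataset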
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
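In pseudocode, this recipe is a three-stage pipeline, sketched below; unsupervised_explorer, reward_fn, and offline_rl_algo are placeholder callables for illustration and are not part of any ExORL API.

    def exorl_pipeline(env, reward_fn, unsupervised_explorer, offline_rl_algo, num_steps):
        # 1) Collect data with reward-free, unsupervised exploration.
        transitions = unsupervised_explorer.collect(env, num_steps)

        # 2) Relabel the collected transitions with the downstream task reward.
        relabeled = [
            {**t, "reward": reward_fn(t["obs"], t["action"], t["next_obs"])}
            for t in transitions
        ]

        # 3) Train any off-policy / offline RL algorithm on the relabeled data.
        return offline_rl_algo.train(relabeled)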
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
- Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning [4.819336169151637]
Offline Reinforcement Learning can learn policies from a given dataset without interacting with the environment.
We show how dataset characteristics influence the performance of Offline RL algorithms for discrete action environments.
For datasets with high trajectory quality (TQ), Behavior Cloning outperforms or performs similarly to the best Offline RL algorithms.
arXiv Detail & Related papers (2021-11-08T18:48:43Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
- POPO: Pessimistic Offline Policy Optimization [6.122342691982727]
We study why off-policy RL methods fail to learn in the offline setting from the value function perspective.
We propose Pessimistic Offline Policy Optimization (POPO), which learns a pessimistic value function to get a strong policy.
We find that POPO performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-12-26T06:24:34Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them to rewards artificially penalized by the uncertainty of the dynamics.
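The core modification can be summarized in a few lines: the reward used for policy optimization under the learned model is the predicted reward minus a penalty proportional to the model's uncertainty, r_tilde(s, a) = r_hat(s, a) - lambda * u(s, a). The ensemble-disagreement estimate of u(s, a) below is one common choice and is an illustrative assumption, not the only estimator MOPO considers.

    import numpy as np

    def penalized_reward(reward_pred, next_state_preds, penalty_coef=1.0):
        # MOPO-style reward: r_tilde = r_hat - lambda * u(s, a).
        # next_state_preds has shape (ensemble_size, state_dim); u(s, a) is taken
        # here as the largest deviation of any ensemble member from the ensemble
        # mean (an illustrative uncertainty estimate).
        preds = np.asarray(next_state_preds, dtype=float)
        disagreement = np.linalg.norm(preds - preds.mean(axis=0), axis=-1).max()
        return reward_pred - penalty_coef * disagreement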
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.