Data-Efficient Pipeline for Offline Reinforcement Learning with Limited
Data
- URL: http://arxiv.org/abs/2210.08642v1
- Date: Sun, 16 Oct 2022 21:24:53 GMT
- Title: Data-Efficient Pipeline for Offline Reinforcement Learning with Limited
Data
- Authors: Allen Nie, Yannis Flet-Berliac, Deon R. Jordan, William Steenbergen,
Emma Brunskill
- Abstract summary: Offline reinforcement learning can be used to improve future performance by leveraging historical data.
We introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy.
We show it can have substantial impacts when the dataset is small.
- Score: 28.846826115837825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning (RL) can be used to improve future performance
by leveraging historical data. There exist many different algorithms for
offline RL, and it is well recognized that these algorithms, and their
hyperparameter settings, can lead to decision policies with substantially
differing performance. This prompts the need for pipelines that allow
practitioners to systematically perform algorithm-hyperparameter selection for
their setting. Critically, in most real-world settings, this pipeline must only
involve the use of historical data. Inspired by statistical model selection
methods for supervised learning, we introduce a task- and method-agnostic
pipeline for automatically training, comparing, selecting, and deploying the
best policy when the provided dataset is limited in size. In particular, our
work highlights the importance of performing multiple data splits to produce
more reliable algorithm-hyperparameter selection. While this is a common
approach in supervised learning, to our knowledge, this has not been discussed
in detail in the offline RL setting. We show it can have substantial impacts
when the dataset is small. Compared to alternate approaches, our proposed
pipeline outputs higher-performing deployed policies from a broad range of
offline policy learning algorithms and across various simulation domains in
healthcare, education, and robotics. This work contributes toward the
development of a general-purpose meta-algorithm for automatic
algorithm-hyperparameter selection for offline RL.
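The core of the proposed pipeline, repeated data splits combined with off-policy evaluation (OPE) to compare algorithm-hyperparameter candidates, can be illustrated with a minimal sketch. This is not the authors' implementation: train_policy, ope_estimate, the candidate list, and the split settings are hypothetical placeholders for a user-supplied offline RL trainer and OPE estimator.

```python
# Minimal sketch of algorithm-hyperparameter selection via repeated data splits.
# train_policy and ope_estimate are user-supplied callables (placeholders here).
import random

def select_and_deploy(dataset, candidates, train_policy, ope_estimate,
                      n_splits=10, train_frac=0.8, seed=0):
    """dataset: list of logged trajectories.
    candidates: list of (algorithm, hyperparameters) pairs to compare."""
    rng = random.Random(seed)
    scores = [[] for _ in candidates]
    for _ in range(n_splits):
        data = dataset[:]
        rng.shuffle(data)
        cut = int(train_frac * len(data))
        train_set, valid_set = data[:cut], data[cut:]
        for i, cand in enumerate(candidates):
            policy = train_policy(cand, train_set)             # offline RL training
            scores[i].append(ope_estimate(policy, valid_set))  # OPE on held-out split
    # Select the candidate with the best average held-out OPE estimate,
    # then retrain it on the full dataset before deployment.
    best = max(range(len(candidates)), key=lambda i: sum(scores[i]) / len(scores[i]))
    return train_policy(candidates[best], dataset), candidates[best]
```

Averaging held-out OPE scores over several random splits, rather than trusting a single split, is the ingredient the abstract highlights as especially important when the dataset is small.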
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches (a minimal offline-data mixing sketch appears after this list).
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies [6.303272140868826]
Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces.
Current deep RL algorithms require a tremendous amount of environment interactions for learning.
Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data.
arXiv Detail & Related papers (2022-12-15T20:36:10Z)
- Launchpad: Learning to Schedule Using Offline and Online RL Methods [9.488752723308954]
Existing RL schedulers overlook the importance of learning from historical data and improving upon custom policies.
Offline reinforcement learning presents the prospect of policy optimization from pre-recorded datasets without online environment interaction.
These methods address the challenges concerning the cost of data collection and safety, particularly pertinent to real-world applications of RL.
arXiv Detail & Related papers (2022-12-01T16:40:11Z)
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? [86.43517734716606]
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing previously collected experience, without any online interaction.
Behavioral cloning (BC) algorithms mimic a subset of the dataset via supervised learning.
We show that policies trained on sufficiently noisy suboptimal data can attain better performance than even BC algorithms with expert data.
arXiv Detail & Related papers (2022-04-12T08:25:34Z)
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL, analogous to the relatively well-understood workflows for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
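For the "Efficient Online Reinforcement Learning with Offline Data" entry above, one common way to let an off-policy learner use logged data during online training is to draw each update batch partly from the offline dataset and partly from the online replay buffer. The sketch below is purely illustrative; the 50/50 ratio and the list-based buffers are assumptions, not details taken from that paper.

```python
# Illustrative sketch: mix offline (logged) and online (replay) transitions
# in each training batch for a standard off-policy update.
import random

def sample_mixed_batch(offline_data, online_buffer, batch_size=256, offline_frac=0.5):
    """offline_data and online_buffer are lists of transitions."""
    n_offline = min(int(batch_size * offline_frac), len(offline_data))
    n_online = min(batch_size - n_offline, len(online_buffer))
    batch = random.sample(offline_data, n_offline) + random.sample(online_buffer, n_online)
    random.shuffle(batch)
    return batch  # feed to any existing off-policy update step
```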