Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies
- URL: http://arxiv.org/abs/2212.08131v2
- Date: Wed, 22 Nov 2023 02:35:37 GMT
- Title: Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies
- Authors: Shivakanth Sujit, Pedro H. M. Braga, Jorg Bornschein, Samira Ebrahimi Kahou
- Abstract summary: Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces.
Current deep RL algorithms require a tremendous amount of environment interactions for learning.
Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data.
- Score: 6.303272140868826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) has shown great promise with algorithms learning
in environments with large state and action spaces purely from scalar reward
signals. A crucial challenge for current deep RL algorithms is that they
require a tremendous amount of environment interactions for learning. This can
be infeasible in situations where such interactions are expensive; such as in
robotics. Offline RL algorithms try to address this issue by bootstrapping the
learning process from existing logged data without needing to interact with the
environment from the very beginning. While online RL algorithms are typically
evaluated as a function of the number of environment interactions, there exists
no single established protocol for evaluating offline RL methods. In this paper,
we propose a sequential approach to evaluate offline RL algorithms as a
function of the training set size and thus by their data efficiency. Sequential
evaluation provides valuable insights into the data efficiency of the learning
process and the robustness of algorithms to distribution changes in the dataset
while also harmonizing the visualization of the offline and online learning
phases. Our approach is generally applicable and easy to implement. We compare
several existing offline RL algorithms using this approach and present insights
from a variety of tasks and offline datasets.
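The core idea of the proposed protocol is to evaluate an offline RL algorithm on progressively larger portions of the logged dataset and to report performance as a function of the amount of data seen. The sketch below illustrates one possible reading of this idea; the `make_agent`, `agent.train`, and `evaluate_policy` interfaces and the prefix-based data schedule are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a sequential evaluation loop for offline RL, assuming a
# hypothetical agent/evaluation interface. Performance is recorded as a
# function of training set size, which is what makes offline results
# comparable to online learning curves.
import numpy as np

def sequential_evaluation(dataset, make_agent, evaluate_policy,
                          num_checkpoints=10, train_steps_per_checkpoint=10_000):
    """Return a list of (dataset size, evaluation score) pairs."""
    n = len(dataset)
    # Reveal the logged data in increasingly large prefixes.
    sizes = np.linspace(n // num_checkpoints, n, num_checkpoints, dtype=int)
    agent = make_agent()
    results = []
    for size in sizes:
        # Train only on the transitions available so far; this exposes both
        # data efficiency and robustness to distribution changes as more of
        # the dataset becomes available.
        prefix = dataset[:int(size)]
        agent.train(prefix, steps=train_steps_per_checkpoint)
        score = evaluate_policy(agent)
        results.append((int(size), score))
    return results
```

Plotting the returned pairs yields a learning curve over dataset size that can be read side by side with an online algorithm's curve over environment interactions.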
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Understanding the performance gap between online and offline alignment algorithms [63.137832242488926]
We show that offline algorithms train the policy to become good at pairwise classification, while online algorithms are good at generation.
This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process.
Our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
arXiv Detail & Related papers (2024-05-14T09:12:30Z)
- Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data [28.846826115837825]
Offline reinforcement learning can be used to improve future performance by leveraging historical data.
We introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy.
We show it can have substantial impacts when the dataset is small.
arXiv Detail & Related papers (2022-10-16T21:24:53Z)
- A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems [0.0]
Reinforcement learning (RL) has experienced a dramatic increase in popularity.
There is still a wide range of domains inaccessible to RL due to the high cost and danger of interacting with the environment.
Offline RL is a paradigm that learns exclusively from static datasets of previously collected interactions.
arXiv Detail & Related papers (2022-03-02T20:05:11Z)
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL, analogous to the relatively well-understood workflows for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization [10.243908145832394]
We study the offline meta-reinforcement learning (OMRL) problem, a paradigm which enables reinforcement learning (RL) algorithms to quickly adapt to unseen tasks.
This problem is still not fully understood, for which two major challenges need to be addressed.
We provide analysis and insight showing that some simple design choices can yield substantial improvements over recent approaches.
arXiv Detail & Related papers (2020-10-02T17:13:39Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)