Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations
- URL: http://arxiv.org/abs/2206.04779v3
- Date: Thu, 6 Jul 2023 16:46:11 GMT
- Title: Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations
- Authors: Cong Lu, Philip J. Ball, Tim G. J. Rudner, Jack Parker-Holder, Michael A. Osborne, Yee Whye Teh
- Abstract summary: Offline reinforcement learning from visual observations with continuous action spaces remains under-explored.
We show that modifications to two popular vision-based online reinforcement learning algorithms suffice to outperform existing offline RL methods.
- Score: 58.758928936316785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning has shown great promise in leveraging large
pre-collected datasets for policy learning, allowing agents to forgo
often-expensive online data collection. However, offline reinforcement learning
from visual observations with continuous action spaces remains under-explored,
with a limited understanding of the key challenges in this complex domain. In
this paper, we establish simple baselines for continuous control in the visual
domain and introduce a suite of benchmarking tasks for offline reinforcement
learning from visual observations designed to better represent the data
distributions present in real-world offline RL problems and guided by a set of
desiderata for offline RL from visual observations, including robustness to
visual distractions and visually identifiable changes in dynamics. Using this
suite of benchmarking tasks, we show that simple modifications to two popular
vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2,
suffice to outperform existing offline RL methods and establish competitive
baselines for continuous control in the visual domain. We rigorously evaluate
these algorithms and perform an empirical evaluation of the differences between
state-of-the-art model-based and model-free offline RL methods for continuous
control from visual observations. All code and data used in this evaluation are
open-sourced to facilitate progress in this domain.
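The abstract does not spell out what the "simple modifications" are. As a hedged illustration of the kind of change typically needed to run an online actor-critic agent such as DrQ-v2 on a fixed dataset, the sketch below adds a behavioral-cloning regularizer to the actor update, in the spirit of TD3+BC. The network sizes, the `bc_weight` coefficient, and the use of a flat latent observation are assumptions for illustration, not the authors' exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical illustration: adapting an online actor-critic update to offline
# data by adding a behavioral-cloning term (in the spirit of TD3+BC). This is
# NOT necessarily the paper's recipe; shapes and coefficients are assumptions.

obs_dim, act_dim, batch = 50, 6, 32            # assumed latent-obs and action sizes
actor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(), nn.Linear(256, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def offline_actor_loss(obs, dataset_actions, bc_weight=2.5):
    """Deterministic policy-gradient loss plus a BC regularizer that keeps the
    policy close to actions actually present in the offline dataset."""
    pi = actor(obs)
    q = critic(torch.cat([obs, pi], dim=-1))
    # Normalizing by |Q| (as in TD3+BC) keeps the two terms on a similar scale.
    lam = bc_weight / q.abs().mean().detach()
    return -(lam * q).mean() + F.mse_loss(pi, dataset_actions)

# One illustrative gradient step on a random stand-in for an offline batch.
obs = torch.randn(batch, obs_dim)
acts = torch.rand(batch, act_dim) * 2 - 1
loss = offline_actor_loss(obs, acts)
actor_opt.zero_grad(); loss.backward(); actor_opt.step()
```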
Related papers
- DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors [13.700885996266457]
Learning from previously collected data via behavioral cloning or offline reinforcement learning (RL) is a powerful recipe for scaling generalist agents.
We present the DeepMind Control Visual Benchmark (DMC-VB), a dataset collected in the DeepMind Control Suite to evaluate the robustness of offline RL agents.
Accompanying our dataset, we propose three benchmarks to evaluate representation learning methods for pretraining, and carry out experiments on several recently proposed methods.
arXiv Detail & Related papers (2024-09-26T23:07:01Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
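The summary leaves the mechanism abstract. Below is a minimal, assumption-laden sketch of a CPC-style InfoNCE objective that pulls together embeddings of transitions drawn from the same episode, which is one way a learned latent could come to track gradual between-episode change; the encoder architecture, pairing scheme, and temperature are hypothetical, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged sketch of a CPC/InfoNCE-style objective: embed transitions so that
# pairs from the same episode score higher than pairs from other episodes.
# All sizes and the pairing scheme are illustrative assumptions.

transition_dim, latent_dim, batch = 20, 16, 64
encoder = nn.Sequential(nn.Linear(transition_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))

def infonce_loss(anchor_transitions, positive_transitions, temperature=0.1):
    """anchor[i] and positive[i] come from the same episode; every other
    positive[j] serves as a negative for anchor[i]."""
    za = F.normalize(encoder(anchor_transitions), dim=-1)
    zp = F.normalize(encoder(positive_transitions), dim=-1)
    logits = za @ zp.t() / temperature           # (batch, batch) similarity matrix
    labels = torch.arange(za.size(0))            # matching index is the positive
    return F.cross_entropy(logits, labels)

loss = infonce_loss(torch.randn(batch, transition_dim), torch.randn(batch, transition_dim))
loss.backward()
```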
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- VIBR: Learning View-Invariant Value Functions for Robust Visual Control [3.2307366446033945]
VIBR (View-Invariant Bellman Residuals) is a method that combines multi-view training and invariant prediction to reduce the out-of-distribution gap for RL-based visuomotor control.
We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation.
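The blurb names the ingredients (multi-view training, invariant prediction) without detail. The rough sketch below shows one generic way to encourage view invariance by penalizing disagreement between value estimates computed on two views of the same state; it is a guess at the flavor of the idea under stated assumptions, not VIBR itself, and the weighting and augmentation are hypothetical.

```python
import torch
import torch.nn as nn

# Rough, assumption-heavy sketch (not VIBR itself): compute a value estimate on
# two views/augmentations of the same state and penalize their disagreement
# alongside the usual TD error.

obs_dim, batch = 64, 32
q_net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, 1))

def invariance_regularized_td_loss(view_a, view_b, td_target, weight=0.5):
    q_a, q_b = q_net(view_a), q_net(view_b)
    td_loss = ((q_a - td_target) ** 2).mean()
    invariance_loss = ((q_a - q_b) ** 2).mean()   # the two views should agree
    return td_loss + weight * invariance_loss

obs = torch.randn(batch, obs_dim)
loss = invariance_regularized_td_loss(obs + 0.01 * torch.randn_like(obs),
                                      obs + 0.01 * torch.randn_like(obs),
                                      torch.randn(batch, 1))
loss.backward()
```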
arXiv Detail & Related papers (2023-06-14T14:37:34Z)
- Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
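As a concrete but hypothetical illustration of the pipeline such a finding suggests, the sketch below pretrains a state encoder with a simple autoencoding objective on logged observations and then freezes it for downstream policy learning. The paper studies a range of unsupervised objectives; none of the sizes or choices here are taken from it.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: unsupervised pretraining of an encoder on offline
# observations, followed by freezing it for policy learning. Dimensions are
# assumptions; the actual study compares several unsupervised objectives.

obs_dim, repr_dim = 100, 32
encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, repr_dim))
decoder = nn.Sequential(nn.Linear(repr_dim, 128), nn.ReLU(), nn.Linear(128, obs_dim))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

offline_obs = torch.randn(1024, obs_dim)        # stand-in for a logged dataset
for _ in range(10):                             # pretraining loop (truncated)
    idx = torch.randint(0, offline_obs.size(0), (64,))
    batch = offline_obs[idx]
    recon = decoder(encoder(batch))
    loss = ((recon - batch) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

for p in encoder.parameters():                  # freeze for downstream policy learning
    p.requires_grad_(False)
```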
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
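For readers unfamiliar with the bound being referenced, latent-state world models typically optimize a sequential-VAE evidence lower bound of the following generic form (reward terms omitted); this is a standard formulation, and the paper's exact bound may differ in its details.

```latex
\log p(o_{1:T} \mid a_{1:T}) \;\ge\;
\mathbb{E}_{q(s_{1:T} \mid o_{1:T}, a_{1:T})}
\Big[ \sum_{t=1}^{T} \log p(o_t \mid s_t)
 \;-\; \mathrm{KL}\big( q(s_t \mid s_{t-1}, a_{t-1}, o_t) \,\|\, p(s_t \mid s_{t-1}, a_{t-1}) \big) \Big]
```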
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the usual RL assumption of cheap online interaction is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
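One common way to realize such a continuous space of temporally extended primitives is a trajectory-level VAE: encode a short window of states and actions into a latent z and train a z-conditioned policy to reconstruct the actions. The sketch below illustrates that generic recipe under assumed dimensions and architecture; it is not OPAL's exact model.

```python
import torch
import torch.nn as nn

# Hedged sketch of primitive extraction from offline trajectories: a GRU
# encoder maps a state-action window to a latent z, and a z-conditioned policy
# reconstructs the actions. Window length and sizes are assumptions.

obs_dim, act_dim, window, z_dim = 30, 8, 10, 8
encoder = nn.GRU(input_size=obs_dim + act_dim, hidden_size=64, batch_first=True)
to_latent = nn.Linear(64, 2 * z_dim)                         # mean and log-variance
primitive_policy = nn.Sequential(nn.Linear(obs_dim + z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))

def primitive_vae_loss(obs_seq, act_seq):
    """obs_seq: (B, window, obs_dim); act_seq: (B, window, act_dim)."""
    _, h = encoder(torch.cat([obs_seq, act_seq], dim=-1))
    mu, logvar = to_latent(h[-1]).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterization
    z_tiled = z.unsqueeze(1).expand(-1, obs_seq.size(1), -1)
    pred_act = primitive_policy(torch.cat([obs_seq, z_tiled], dim=-1))
    recon = ((pred_act - act_seq) ** 2).mean()
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp())).mean()
    return recon + 1e-2 * kl

loss = primitive_vae_loss(torch.randn(16, window, obs_dim), torch.randn(16, window, act_dim))
loss.backward()
```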
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences arising from its use.