S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement
Learning
- URL: http://arxiv.org/abs/2103.06326v1
- Date: Wed, 10 Mar 2021 20:13:21 GMT
- Title: S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement
Learning
- Authors: Samarth Sinha, Animesh Garg
- Abstract summary: Offline reinforcement learning proposes to learn policies from large collected datasets without interaction.
Current algorithms overfit to the dataset they are trained on and generalize poorly out-of-distribution when deployed in the environment.
We propose a Surprisingly Simple Self-Supervision algorithm (S4RL) which utilizes data augmentations from states to learn value functions that are better at generalizing and extrapolating when deployed in the environment.
- Score: 28.947071041811586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning proposes to learn policies from large
collected datasets without interaction. These algorithms have made it possible
to learn useful skills from data that can then be transferred to the
environment, making it feasible to deploy the trained policies in real-world
settings where interactions may be costly or dangerous, such as self-driving.
However, current algorithms overfit to the dataset they are trained on and
exhibit poor out-of-distribution (OOD) generalization to the environment when
deployed. We propose a Surprisingly Simple Self-Supervision algorithm (S4RL),
which utilizes data augmentations from states to learn value functions that are
better at generalizing and extrapolating when deployed in the environment. We
investigate different data augmentation techniques that help learn a value
function that can extrapolate to OOD data, and how to combine data
augmentations and offline RL algorithms to learn a policy. We experimentally
show that using S4RL significantly improves the state-of-the-art on most
benchmark offline reinforcement learning tasks on the popular D4RL datasets,
despite being simple and easy to implement.
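The abstract does not spell out the training mechanics, but the core idea, perturbing states with simple augmentations and using the augmented copies when fitting the value function, can be illustrated with a short sketch. The code below is an assumption-laden illustration in PyTorch, not the authors' implementation: the Gaussian-noise augmentation, the q_net/target_q_net/policy modules, and the averaging over augmented copies are placeholders chosen for the example.

import torch
import torch.nn.functional as F

def augment_state(state, sigma=0.003):
    # Hypothetical augmentation: add small zero-mean Gaussian noise to the state.
    # S4RL studies several augmentations; this sketch shows only the simplest one.
    return state + sigma * torch.randn_like(state)

def bellman_target(target_q_net, policy, next_state, reward, done, gamma=0.99, n_aug=4):
    # Bellman target averaged over several augmented copies of the next state,
    # which smooths the learned value function around states in the dataset.
    with torch.no_grad():
        q_values = []
        for _ in range(n_aug):
            s_aug = augment_state(next_state)
            q_values.append(target_q_net(s_aug, policy(s_aug)))
        q_next = torch.stack(q_values).mean(dim=0)
    return reward + gamma * (1.0 - done) * q_next

def critic_loss(q_net, target, state, action, n_aug=4):
    # The TD error is also computed on augmented copies of the current state.
    losses = [F.mse_loss(q_net(augment_state(state), action), target) for _ in range(n_aug)]
    return torch.stack(losses).mean()

In practice such an augmented critic update would be dropped into an existing offline RL learner (e.g. a CQL-style agent) in place of its standard TD update; the choice of augmentation and the number of augmented copies are hyperparameters.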
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas (extracting skills from unlabeled prior data and reusing them for online exploration) compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- AD4RL: Autonomous Driving Benchmarks for Offline Reinforcement Learning with Value-based Dataset [2.66269503676104]
This paper provides autonomous driving datasets and benchmarks for offline reinforcement learning research.
We provide 19 datasets, including real-world human drivers' datasets, and seven popular offline reinforcement learning algorithms in three realistic driving scenarios.
arXiv Detail & Related papers (2024-04-03T03:36:35Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the proposed sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning [23.638418776700522]
We propose a new model named the causal decision transformer for recommender systems (CDT4Rec).
CDT4Rec is an offline reinforcement learning system that can learn from a dataset rather than from online interaction.
To demonstrate the feasibility and superiority of our model, we have conducted experiments on six real-world offline datasets and one online simulator.
arXiv Detail & Related papers (2023-04-17T00:05:52Z)
- Real World Offline Reinforcement Learning with Realistic Data Source [33.7474988142367]
Offline reinforcement learning (ORL) holds great promise for robot learning due to its ability to learn from arbitrary pre-generated experience.
Current ORL benchmarks are almost entirely in simulation and utilize contrived datasets like replay buffers of online RL agents or sub-optimal trajectories.
In this work (Real-ORL), we posit that data collected from safe operations of closely related tasks are more practical data sources for real-world robot learning.
arXiv Detail & Related papers (2022-10-12T17:57:05Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies, a guide-policy and an exploration-policy, to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning [4.819336169151637]
Offline Reinforcement Learning can learn policies from a given dataset without interacting with the environment.
We show how dataset characteristics influence the performance of Offline RL algorithms for discrete action environments.
For datasets with high trajectory quality (TQ), Behavior Cloning outperforms or performs similarly to the best Offline RL algorithms.
arXiv Detail & Related papers (2021-11-08T18:48:43Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)