Look Beneath the Surface: Exploiting Fundamental Symmetry for
Sample-Efficient Offline RL
- URL: http://arxiv.org/abs/2306.04220v6
- Date: Mon, 13 Nov 2023 15:12:45 GMT
- Title: Look Beneath the Surface: Exploiting Fundamental Symmetry for
Sample-Efficient Offline RL
- Authors: Peng Cheng, Xianyuan Zhan, Zhihao Wu, Wenjia Zhang, Shoucheng Song,
Han Wang, Youfang Lin, Li Jiang
- Abstract summary: Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets.
However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets.
We provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets.
- Score: 29.885978495034703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) offers an appealing approach to
real-world tasks by learning policies from pre-collected datasets without
interacting with the environment. However, the performance of existing offline
RL algorithms heavily depends on the scale and state-action space coverage of
datasets. Real-world data collection is often expensive and uncontrollable,
leading to small and narrowly covered datasets and posing significant
challenges for practical deployments of offline RL. In this paper, we provide a
new insight that leveraging the fundamental symmetry of system dynamics can
substantially enhance offline RL performance under small datasets.
Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced
Dynamics Model (TDM), which establishes consistency between a pair of forward
and reverse latent dynamics. TDM provides both well-behaved representations for
small datasets and a new reliability measure for OOD samples based on
compliance with the T-symmetry. These can be readily used to construct a new
offline RL algorithm (TSRL) with less conservative policy constraints and a
reliable latent space data augmentation procedure. Based on extensive
experiments, we find that TSRL achieves strong performance on small benchmark
datasets with as few as 1% of the original samples, significantly
outperforming recent offline RL algorithms in terms of data efficiency and
generalizability. Code is available at: https://github.com/pcheng2/TSRL
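
The abstract describes TDM and the T-symmetry reliability measure only at a high level. Below is a minimal PyTorch sketch, under stated assumptions, of how a paired forward/reverse latent dynamics model with a T-symmetry consistency term and a per-sample compliance score could be wired together; the class name, network sizes, and exact loss form are illustrative guesses rather than the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch (NOT the authors' implementation) of a T-symmetry
# enforced dynamics model: a forward model predicts the next latent
# state, a reverse model predicts the previous one, and a consistency
# term ties the two together. Architecture choices are assumptions.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class TSymmetryDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=16):
        super().__init__()
        self.encoder = mlp(state_dim, latent_dim)                      # s -> z
        self.decoder = mlp(latent_dim, state_dim)                      # z -> s
        self.forward_dyn = mlp(latent_dim + action_dim, latent_dim)    # (z_t, a_t) -> z_{t+1}
        self.reverse_dyn = mlp(latent_dim + action_dim, latent_dim)    # (z_{t+1}, a_t) -> z_t

    def losses(self, s, a, s_next):
        z, z_next = self.encoder(s), self.encoder(s_next)
        recon = ((self.decoder(z) - s) ** 2).mean()
        z_next_hat = self.forward_dyn(torch.cat([z, a], dim=-1))
        z_prev_hat = self.reverse_dyn(torch.cat([z_next, a], dim=-1))
        fwd = ((z_next_hat - z_next) ** 2).mean()
        rev = ((z_prev_hat - z) ** 2).mean()
        # T-symmetry consistency: running the reverse model on the
        # forward prediction should recover the current latent state.
        t_sym = ((self.reverse_dyn(torch.cat([z_next_hat, a], dim=-1)) - z) ** 2).mean()
        return recon + fwd + rev + t_sym

    @torch.no_grad()
    def t_symmetry_score(self, s, a, s_next):
        """Per-sample T-symmetry compliance; larger values suggest a
        less reliable (more out-of-distribution) transition."""
        z = self.encoder(s)
        z_next_hat = self.forward_dyn(torch.cat([z, a], dim=-1))
        z_prev_hat = self.reverse_dyn(torch.cat([z_next_hat, a], dim=-1))
        return ((z_prev_hat - z) ** 2).mean(dim=-1)
```

In a TSRL-style pipeline, such a compliance score could in principle be used to filter or down-weight latent-space augmentations and to relax policy constraints on T-symmetry-consistent samples; the abstract does not specify exactly how the released code does this.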
Related papers
- Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization [1.631115063641726]
We propose a framework that enhances PPO algorithms by incorporating a diffusion model to generate high-quality virtual trajectories for offline datasets.
Our contributions are threefold: we explore the potential of diffusion models in RL, particularly for offline datasets, extend the application of online RL to offline environments, and experimentally validate the performance improvements of PPO with diffusion models.
arXiv Detail & Related papers (2024-09-02T19:10:32Z)
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
arXiv Detail & Related papers (2023-10-27T19:19:30Z)
Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning [66.43003402281659]
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
arXiv Detail & Related papers (2023-05-17T15:17:23Z)
Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold.
This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z)
Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm that learns policies from data using a form of critic-regularized regression (CRR); see the sketch after this list.
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
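
For the Critic Regularized Regression entry above, here is a hedged sketch of an exp-advantage CRR-style policy update: behaviour cloning on dataset actions, reweighted by the exponentiated advantage under a learned critic. The `policy.sample`/`policy.log_prob` and `critic(s, a)` interfaces, the temperature `beta`, and the weight clipping are assumptions for illustration, not the paper's exact implementation.

```python
# Sketch of a CRR-style weighted behaviour-cloning loss (exp-advantage
# variant). Assumes: policy.sample(s) draws actions, policy.log_prob(s, a)
# returns per-sample log-likelihoods, critic(s, a) returns Q-values.
import torch


def crr_policy_loss(policy, critic, s, a, beta=1.0, n_samples=4):
    with torch.no_grad():
        # Advantage estimate: Q(s, a) minus the mean Q over actions
        # sampled from the current policy.
        q_data = critic(s, a)
        q_pi = torch.stack(
            [critic(s, policy.sample(s)) for _ in range(n_samples)], dim=0
        ).mean(dim=0)
        weight = torch.clamp(torch.exp((q_data - q_pi) / beta), max=20.0)
    # Regress the policy toward dataset actions, weighted by advantage.
    return -(weight * policy.log_prob(s, a)).mean()
```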
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.