Understanding and Addressing the Pitfalls of Bisimulation-based
Representations in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2310.17139v1
- Date: Thu, 26 Oct 2023 04:20:55 GMT
- Title: Understanding and Addressing the Pitfalls of Bisimulation-based
Representations in Offline Reinforcement Learning
- Authors: Hongyu Zang, Xin Li, Leiji Zhang, Yang Liu, Baigui Sun, Riashat Islam,
Remi Tachet des Combes, Romain Laroche
- Abstract summary: We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks.
Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle.
We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR.
- Score: 34.66035026036424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While bisimulation-based approaches hold promise for learning robust state
representations for Reinforcement Learning (RL) tasks, their efficacy in
offline RL tasks has not been up to par. In some instances, they have even
significantly underperformed alternative methods. We aim to understand
why bisimulation methods succeed in online settings, but falter in offline
tasks. Our analysis reveals that missing transitions in the dataset are
particularly harmful to the bisimulation principle, leading to ineffective
estimation. We also shed light on the critical role of reward scaling in
bounding the scale of bisimulation measurements and of the value error they
induce. Based on these findings, we propose to apply the expectile operator for
representation learning to our offline RL setting, which helps to prevent
overfitting to incomplete data. Meanwhile, by introducing an appropriate reward
scaling strategy, we avoid the risk of feature collapse in representation
space. We implement these recommendations on two state-of-the-art
bisimulation-based algorithms, MICo and SimSR, and demonstrate performance
gains on two benchmark suites: D4RL and Visual D4RL. Code is provided at
https://github.com/zanghyu/Offline_Bisimulation.
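The abstract's two recommendations lend themselves to a short illustration. The PyTorch sketch below combines an expectile-weighted, bisimulation-style representation loss with reward scaling; it is a minimal illustration only, not the authors' MICo/SimSR implementation, and the encoder interface, the L2 feature distance, the expectile level tau, and the reward scale are all assumptions made for the example.

```python
# Minimal sketch (not the authors' MICo/SimSR code): an expectile-weighted
# bisimulation-style representation loss with reward scaling, in PyTorch.
# `encoder`, `target_encoder`, `tau`, and `reward_scale` are illustrative.
import torch

def expectile_weight(diff: torch.Tensor, tau: float) -> torch.Tensor:
    """Asymmetric weight |tau - 1{diff < 0}| used in expectile regression."""
    return (tau - (diff < 0).float()).abs()

def bisim_expectile_loss(encoder, target_encoder, s, s_next, r,
                         gamma=0.99, tau=0.7, reward_scale=0.1):
    # Pair each state in the batch with a randomly permuted partner,
    # a common trick for estimating pairwise bisimulation distances.
    perm = torch.randperm(s.shape[0])
    z_i, z_j = encoder(s), encoder(s[perm])
    with torch.no_grad():
        zt_i, zt_j = target_encoder(s_next), target_encoder(s_next[perm])
        # Scaling rewards keeps the bisimulation target (and the value error
        # it induces) bounded, which helps avoid feature collapse.
        r_dist = (reward_scale * (r - r[perm])).abs()
        target = r_dist + gamma * torch.norm(zt_i - zt_j, dim=-1)
    online = torch.norm(z_i - z_j, dim=-1)
    diff = target - online
    # The expectile operator penalizes over- and under-estimation of the
    # target asymmetrically, guarding against transitions that are missing
    # from the offline dataset.
    return (expectile_weight(diff, tau) * diff.pow(2)).mean()
```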
Related papers
- Align Your Intents: Offline Imitation Learning via Optimal Transport [3.1728695158666396]
We show that an imitating agent can still learn the desired behavior merely from observing the expert.
In our method, AILOT, we use a special representation of states in the form of intents that incorporates pairwise spatial distances within the data.
We report that AILOT outperforms state-of-the-art offline imitation learning algorithms on D4RL benchmarks.
arXiv Detail & Related papers (2024-02-20T14:24:00Z) - Reasoning with Latent Diffusion in Offline Reinforcement Learning [11.349356866928547]
Offline reinforcement learning holds promise as a means to learn high-reward policies from a static dataset.
A key challenge in offline RL lies in effectively stitching together portions of suboptimal trajectories from the static dataset.
We propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills.
arXiv Detail & Related papers (2023-09-12T20:58:21Z) - Dual Generator Offline Reinforcement Learning [90.05278061564198]
In offline RL, constraining the learned policy to remain close to the data is essential.
In practice, GAN-based offline RL methods have not performed as well as alternative approaches.
We show that having two generators not only enables an effective GAN-based offline RL method, but also approximates a support constraint.
arXiv Detail & Related papers (2022-11-02T20:25:18Z) - Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between the learned policy and the dataset.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
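A minimal sketch of the return-based rebalancing idea described above (not the ReD authors' code); the helper name, the return shift, and the small positive floor are illustrative assumptions.

```python
# Minimal sketch (not the ReD authors' code): resample episodes with
# probability proportional to their shifted episodic return.  Because every
# episode keeps a non-zero probability, the support of the data distribution
# is unchanged; only its density is rebalanced toward high-return data.
import numpy as np

def return_based_probs(episode_returns: np.ndarray) -> np.ndarray:
    shifted = episode_returns - episode_returns.min() + 1e-3  # keep all weights > 0
    return shifted / shifted.sum()

# Illustrative usage: draw episode indices for each minibatch.
rng = np.random.default_rng(0)
returns = np.array([12.0, -3.5, 40.2, 7.1])
probs = return_based_probs(returns)
episode_indices = rng.choice(len(returns), size=8, p=probs)
```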
arXiv Detail & Related papers (2022-10-17T16:34:01Z) - Near-optimal Offline Reinforcement Learning with Linear Representation:
Leveraging Variance Information with Pessimism [65.46524775457928]
Offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z) - Offline Reinforcement Learning with Value-based Episodic Memory [19.12430651038357]
Offline reinforcement learning (RL) shows promise for applying RL to real-world problems.
We propose Expectile V-Learning (EVL), which smoothly interpolates between optimal value learning and behavior cloning.
We present a new offline method called Value-based Episodic Memory (VEM).
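A minimal sketch of an expectile value-learning loss in the spirit of EVL (not the authors' implementation); the network interfaces, the TD-target form, and the expectile level are assumptions made for illustration.

```python
# Minimal sketch (not the EVL authors' code): an expectile TD loss for a state
# value function.  With tau = 0.5 this is ordinary regression onto the behavior
# policy's returns; as tau -> 1 it approaches the in-sample maximum, i.e.
# optimal value learning.  Network interfaces and shapes are illustrative.
import torch

def expectile_v_loss(v, v_target, s, r, s_next, done, gamma=0.99, tau=0.8):
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * v_target(s_next).squeeze(-1)
    diff = target - v(s).squeeze(-1)
    weight = (tau - (diff < 0).float()).abs()  # asymmetric expectile weight
    return (weight * diff.pow(2)).mean()
```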
arXiv Detail & Related papers (2021-10-19T08:20:11Z) - Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
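A minimal sketch of the uncertainty-weighting idea (not the UWAC authors' code, which uses dropout-based uncertainty); here the uncertainty is the disagreement of an ensemble of target critics, and the weighting rule and clipping constant are illustrative.

```python
# Minimal sketch (not the UWAC authors' code, which uses dropout-based
# uncertainty): down-weight the TD loss by the disagreement of an ensemble of
# target critics, so out-of-distribution state-action pairs contribute less.
import torch

def uncertainty_weighted_td_loss(q, target_qs, s, a, r, s_next, a_next, done,
                                 gamma=0.99, beta=1.0):
    with torch.no_grad():
        samples = torch.stack([tq(s_next, a_next) for tq in target_qs])
        target = r + gamma * (1.0 - done) * samples.mean(dim=0)
        # Larger disagreement (variance) => smaller weight on this pair.
        weight = torch.clamp(beta / (samples.var(dim=0) + 1e-6), max=1.5)
    td_error = q(s, a) - target
    return (weight * td_error.pow(2)).mean()
```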
arXiv Detail & Related papers (2021-05-17T20:16:46Z) - Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold.
This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z) - Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy constraint that reduces divergence from the behavior policy, and a value constraint that discourages overly optimistic estimates.
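A minimal sketch of an actor objective with the two penalties described above (not the CDC authors' exact formulation); the squared-distance policy penalty, the ReLU-style value penalty, and the coefficients alpha and beta are illustrative assumptions.

```python
# Minimal sketch (not the CDC authors' exact formulation): an actor objective
# with two penalties - one keeping the policy close to dataset actions, one
# discouraging Q-estimates that are optimistic relative to dataset actions.
import torch

def doubly_constrained_actor_loss(policy, q, s, a_data, alpha=1.0, beta=1.0):
    a_pi = policy(s)                          # actions from the learned policy
    q_pi = q(s, a_pi).squeeze(-1)
    policy_penalty = (a_pi - a_data).pow(2).sum(dim=-1)           # stay near the data
    value_penalty = torch.relu(q_pi - q(s, a_data).squeeze(-1).detach())  # curb optimism
    return (-q_pi + alpha * policy_penalty + beta * value_penalty).mean()
```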
arXiv Detail & Related papers (2021-02-18T08:54:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.