Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models
- URL: http://arxiv.org/abs/2305.11340v1
- Date: Thu, 18 May 2023 23:23:08 GMT
- Title: Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models
- Authors: Wenhao Ding, Tong Che, Ding Zhao, Marco Pavone
- Abstract summary: We show that current reward-conditioned reinforcement learning approaches are fundamentally limited.
We propose a novel set of inductive biases for RCRL inspired by Bayes' theorem.
We show that BR-RCRL achieves state-of-the-art performance on the Gym-Mujoco and Atari offline RL benchmarks.
- Score: 46.24690220893344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, reward-conditioned reinforcement learning (RCRL) has gained
popularity due to its simplicity, flexibility, and off-policy nature. However,
we will show that current RCRL approaches are fundamentally limited and fail to
address two critical challenges of RCRL -- improving generalization on high
reward-to-go (RTG) inputs, and avoiding out-of-distribution (OOD) RTG queries
during testing time. To address these challenges when training vanilla RCRL
architectures, we propose Bayesian Reparameterized RCRL (BR-RCRL), a novel set
of inductive biases for RCRL inspired by Bayes' theorem. BR-RCRL removes a core
obstacle preventing vanilla RCRL from generalizing on high RTG inputs -- the
model's tendency to treat different RTG inputs as independent values, which we
term "RTG Independence". BR-RCRL also allows us to design an
accompanying adaptive inference method, which maximizes total returns while
avoiding OOD queries that yield unpredictable behaviors in vanilla RCRL
methods. We show that BR-RCRL achieves state-of-the-art performance on the
Gym-Mujoco and Atari offline RL benchmarks, improving upon vanilla RCRL by up
to 11%.
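The abstract does not spell out BR-RCRL's concrete architecture or training objective (the title indicates it builds on energy-based models), so the following is only a minimal, hypothetical sketch of the general reparameterization idea described above: instead of fitting p(a | s, RTG) directly and treating each RTG value as an independent conditioning input, factor the conditional via Bayes' rule as p(a | s, R) ∝ p(R | s, a) · p(a | s) over a discretized RTG grid, and at test time condition only on RTG values that the model itself assigns non-negligible probability. All class names, layer sizes, bin counts, and the feasibility threshold below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch (not the authors' code): a reward-conditioned policy
# factored via Bayes' rule, p(a|s,R) ∝ p(R|s,a) * p(a|s), over discretized RTG bins.
import torch
import torch.nn as nn

N_ACTIONS, STATE_DIM, N_RTG_BINS = 4, 8, 21   # toy sizes (assumed)

class BayesFactoredPolicy(nn.Module):
    """Models p(a|s) and p(R|s,a) instead of p(a|s,R) directly,
    so different RTG bins share the same networks rather than acting
    as independent conditioning inputs."""
    def __init__(self):
        super().__init__()
        # Behavior prior p(a|s)
        self.prior = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
        # Return-to-go model p(R|s,a) over the discretized RTG grid
        self.rtg_head = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS * N_RTG_BINS))

    def log_joint(self, s):
        # log p(a|s) + log p(R|s,a) for every (action, RTG-bin) pair
        log_prior = torch.log_softmax(self.prior(s), dim=-1)           # [B, A]
        rtg_logits = self.rtg_head(s).view(-1, N_ACTIONS, N_RTG_BINS)
        log_rtg = torch.log_softmax(rtg_logits, dim=-1)                # [B, A, R]
        return log_prior.unsqueeze(-1) + log_rtg                       # [B, A, R]

    def act(self, s, rtg_bin):
        # Bayes' rule: p(a|s,R) ∝ exp(log p(a|s) + log p(R|s,a))
        logits = self.log_joint(s)[:, :, rtg_bin]
        return torch.distributions.Categorical(logits=logits).sample()

    def adaptive_rtg_bin(self, s):
        # Illustrative "adaptive inference": condition on the highest RTG bin
        # that the model itself still considers plausible, instead of an
        # arbitrary (possibly out-of-distribution) target return.
        log_p_rtg = self.log_joint(s).logsumexp(dim=1)                 # log p(R|s), [B, R]
        feasible = log_p_rtg[0].exp() > 0.01                           # assumed threshold
        return int(torch.nonzero(feasible).max())

if __name__ == "__main__":
    policy = BayesFactoredPolicy()
    state = torch.randn(1, STATE_DIM)
    rtg_bin = policy.adaptive_rtg_bin(state)
    action = policy.act(state, rtg_bin)
    print(f"conditioning on RTG bin {rtg_bin}, sampled action {action.item()}")
```

Training such a sketch would amount to maximizing log p(a | s) + log p(R | s, a) on offline (state, action, RTG-bin) tuples; because every RTG bin is scored by a shared return model rather than an independent conditioning pathway, high RTG inputs are no longer treated as unrelated values, and the adaptive bin selection avoids querying returns the model has never seen.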
Related papers
- RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation [40.84214941048131]
RICE is an innovative refining scheme for reinforcement learning.
It incorporates explanation methods to break through the training bottlenecks.
We evaluate RICE in various popular RL environments and real-world applications.
arXiv Detail & Related papers (2024-05-05T22:06:42Z)
- ReRoGCRL: Representation-based Robustness in Goal-Conditioned Reinforcement Learning [29.868059421372244]
Goal-Conditioned Reinforcement Learning (GCRL) has gained attention, but its algorithmic robustness against adversarial perturbations remains unexplored.
We first propose the Semi-Contrastive Representation attack, inspired by the adversarial contrastive attack.
We then introduce Adversarial Representation Tactics, which combines Semi-Contrastive Adversarial Augmentation with a Sensitivity-Aware Regularizer.
arXiv Detail & Related papers (2023-12-12T16:05:55Z)
- RORL: Robust Offline Reinforcement Learning via Conservative Smoothing [72.8062448549897]
Offline reinforcement learning can exploit massive amounts of offline data for complex decision-making tasks.
Current offline RL algorithms are generally designed to be conservative in value estimation and action selection.
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
arXiv Detail & Related papers (2022-06-06T18:07:41Z)
- When does return-conditioned supervised learning work for offline reinforcement learning? [51.899892382786526]
We study the capabilities and limitations of return-conditioned supervised learning (RCSL).
We find that RCSL returns the optimal policy under a set of assumptions stronger than those needed for the more traditional dynamic programming-based algorithms.
arXiv Detail & Related papers (2022-06-02T15:05:42Z)
- Robust Reinforcement Learning as a Stackelberg Game via Adaptively-Regularized Adversarial Training [43.97565851415018]
Robust reinforcement learning (RL) focuses on improving performance under model errors or adversarial attacks.
Most of the existing literature models robust adversarial RL (RARL) as a zero-sum simultaneous game with Nash equilibrium as the solution concept.
We introduce a novel hierarchical formulation of robust RL - a general-sum Stackelberg game model called RRL-Stack.
arXiv Detail & Related papers (2022-02-19T03:44:05Z)
- Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model changes in the environment in a principled manner.
We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z)
- A Simple Reward-free Approach to Constrained Reinforcement Learning [33.813302183231556]
This paper bridges reward-free RL and constrained RL. In particular, we propose a simple meta-algorithm such that, given any reward-free RL oracle, the approachability and constrained RL problems can be solved directly with negligible overhead in sample complexity.
arXiv Detail & Related papers (2021-07-12T06:27:30Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that were not present during training.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)