Bayesian Distributional Policy Gradients
- URL: http://arxiv.org/abs/2103.11265v2
- Date: Tue, 23 Mar 2021 04:54:15 GMT
- Title: Bayesian Distributional Policy Gradients
- Authors: Luchen Li, A. Aldo Faisal
- Abstract summary: Distributional Reinforcement Learning maintains the entire probability distribution of the reward-to-go, i.e. the return.
Bayesian Distributional Policy Gradients (BDPG) uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributional Reinforcement Learning (RL) maintains the entire probability
distribution of the reward-to-go, i.e. the return, providing more learning
signals that account for the uncertainty associated with policy performance,
which may be beneficial for trading off exploration and exploitation and policy
learning in general. Previous works in distributional RL focused mainly on
computing the state-action-return distributions; here we model the state-return
distributions. This enables us to translate successful conventional RL
algorithms that are based on state values into distributional RL. We formulate
the distributional Bellman operation as an inference-based auto-encoding
process that minimises Wasserstein metrics between target/model return
distributions. The proposed algorithm, BDPG (Bayesian Distributional Policy
Gradients), uses adversarial training in joint-contrastive learning to estimate
a variational posterior from the returns. Moreover, we can now interpret the
return prediction uncertainty as an information gain, which allows us to obtain a
new curiosity measure that helps BDPG steer exploration actively and
efficiently. We demonstrate in a suite of Atari 2600 games and MuJoCo tasks,
including well known hard-exploration challenges, how BDPG learns generally
faster and with higher asymptotic performance than reference distributional RL
algorithms.
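To make the abstract's moving parts concrete, below is a minimal PyTorch-style sketch of how an adversarially trained, auto-encoding return model with an information-gain bonus could be wired together. It is our own illustration of the ideas in the abstract, not the authors' implementation: the module sizes, the binary cross-entropy adversarial losses, and the absolute-error stand-in for the Wasserstein term are all assumptions.
```python
# Hypothetical sketch of a BDPG-style return model (not the authors' code): an encoder/decoder
# pair auto-encodes bootstrapped return targets, a discriminator provides the joint-contrastive
# adversarial signal for the variational posterior, and its logit is reused as a curiosity bonus.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """q(z | s, G): variational posterior over a latent code given state and return."""
    def __init__(self, s_dim, z_dim=8, h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + 1, h), nn.ReLU(), nn.Linear(h, 2 * z_dim))
    def forward(self, s, g):
        mu, log_std = self.net(torch.cat([s, g], dim=-1)).chunk(2, dim=-1)
        return mu + log_std.clamp(-5, 2).exp() * torch.randn_like(mu)   # reparameterised sample

class Decoder(nn.Module):
    """p(G | s, z): generates a return sample for a state from a latent code."""
    def __init__(self, s_dim, z_dim=8, h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + z_dim, h), nn.ReLU(), nn.Linear(h, 1))
    def forward(self, s, z):
        return self.net(torch.cat([s, z], dim=-1))

class Discriminator(nn.Module):
    """T(s, G, z): separates posterior pairs from prior/generated pairs (joint-contrastive game)."""
    def __init__(self, s_dim, z_dim=8, h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + 1 + z_dim, h), nn.ReLU(), nn.Linear(h, 1))
    def forward(self, s, g, z):
        return self.net(torch.cat([s, g, z], dim=-1))

def bdpg_style_losses(enc, dec, disc, s, bellman_target):
    """s: (batch, s_dim); bellman_target: (batch, 1) bootstrapped returns r + gamma * G'.
    Returns (discriminator loss, encoder/decoder loss, per-state curiosity bonus)."""
    z_post = enc(s, bellman_target)                  # z ~ q(z | s, G)
    z_prior = torch.randn_like(z_post)               # z ~ p(z)
    g_recon = dec(s, z_post)                         # reconstruction of the target return
    g_gen = dec(s, z_prior)                          # return generated from the prior

    # Discriminator: tell (s, G, z ~ q) apart from (s, G ~ p(.|s, z), z ~ p).
    d_real = disc(s, bellman_target, z_post.detach())
    d_fake = disc(s, g_gen.detach(), z_prior)
    disc_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))

    # Encoder/decoder: reconstruct the target (an L1 stand-in for a Wasserstein term here)
    # and fool the discriminator; step these with an optimizer separate from the discriminator's.
    recon_loss = (g_recon - bellman_target).abs().mean()
    adv_loss = F.binary_cross_entropy_with_logits(disc(s, g_gen, z_prior), torch.ones_like(d_fake))
    gen_loss = recon_loss + adv_loss

    # The discriminator logit estimates a log density ratio between posterior and prior pairs;
    # we read it as a rough information-gain signal that can be added to the reward as curiosity.
    info_gain_bonus = d_real.detach().squeeze(-1)
    return disc_loss, gen_loss, info_gain_bonus
```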
Related papers
- A Distributional Analogue to the Successor Representation [54.99439648059807]
This paper contributes a new approach for distributional reinforcement learning.
It elucidates a clean separation of transition structure and reward in the learning process.
As an illustration, we show that it enables zero-shot risk-sensitive policy evaluation.
arXiv Detail & Related papers (2024-02-13T15:35:24Z)
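The entry above emphasises separating transition structure from reward. The toy sketch below is our own tabular illustration of why that separation buys zero-shot, risk-sensitive evaluation; it is not the paper's operator-based construction.
```python
# Toy tabular illustration (not the paper's method): cache discounted occupancy samples of a
# fixed policy once, then evaluate any reward vector afterwards, including a risk measure,
# with no further environment interaction.
import numpy as np

def discounted_occupancy(states, n_states, gamma=0.99):
    """Discounted visitation vector of one trajectory: sum_t gamma^t * one_hot(s_t)."""
    occ = np.zeros(n_states)
    for t, s in enumerate(states):
        occ[s] += gamma ** t
    return occ

def zero_shot_evaluation(trajectories, reward_vec, n_states, gamma=0.99, alpha=0.1):
    """Each cached occupancy vector dotted with a new reward vector gives one return sample,
    so the mean and a CVaR at level alpha come for free, without new rollouts."""
    returns = np.array([discounted_occupancy(tr, n_states, gamma) @ reward_vec
                        for tr in trajectories])
    cvar = returns[returns <= np.quantile(returns, alpha)].mean()
    return returns.mean(), cvar

# Example: 200 random-walk trajectories on 3 states, then two different reward functions.
rng = np.random.default_rng(0)
trajs = [[int(s) for s in rng.integers(0, 3, size=30)] for _ in range(200)]
print(zero_shot_evaluation(trajs, np.array([0.0, 1.0, -1.0]), n_states=3))
print(zero_shot_evaluation(trajs, np.array([1.0, 0.0, 0.0]), n_states=3))
```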
- Bag of Policies for Distributional Deep Exploration [7.522221438479138]
Bag of Policies (BoP) is built on top of any return distribution estimator by maintaining a population of its copies.
During training, each episode is controlled by only one of the heads and the collected state-action pairs are used to update all heads off-policy.
BoP results in greater robustness and speed during learning as demonstrated by our experimental results on ALE Atari games.
arXiv Detail & Related papers (2023-08-03T13:43:03Z)
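A rough sketch of the ensemble mechanics the Bag of Policies entry describes, under our own assumptions about head architecture and loss (quantile heads trained against a shared replay target): one head is drawn to control each episode, and every head is updated off-policy from the same batch.
```python
# Schematic of a bag of return-distribution heads (our reading, not the authors' code).
import random
import torch
import torch.nn as nn

class QuantileHead(nn.Module):
    """One member of the bag: maps a state to N return quantiles per action."""
    def __init__(self, s_dim, n_actions, n_quantiles=32, h=64):
        super().__init__()
        self.n_actions, self.n_quantiles = n_actions, n_quantiles
        self.net = nn.Sequential(nn.Linear(s_dim, h), nn.ReLU(),
                                 nn.Linear(h, n_actions * n_quantiles))
    def forward(self, s):
        return self.net(s).view(-1, self.n_actions, self.n_quantiles)

class BagOfPolicies:
    def __init__(self, s_dim, n_actions, n_heads=8):
        self.heads = [QuantileHead(s_dim, n_actions) for _ in range(n_heads)]
        self.opts = [torch.optim.Adam(h.parameters(), lr=1e-3) for h in self.heads]
        self.active = 0                                  # head in charge of the current episode

    def begin_episode(self):
        self.active = random.randrange(len(self.heads))

    def act(self, s):
        with torch.no_grad():
            q = self.heads[self.active](s).mean(dim=-1)  # expected return per action
        return int(q.argmax(dim=-1))

    def update_all_heads(self, s, a, target_quantiles, kappa=1.0):
        """Off-policy update of every head (toward a shared Bellman target here, for brevity)."""
        for head, opt in zip(self.heads, self.opts):
            pred = head(s)[torch.arange(s.shape[0]), a]                 # (batch, N)
            diff = target_quantiles.unsqueeze(1) - pred.unsqueeze(2)    # pairwise TD errors
            tau = (torch.arange(pred.shape[1]) + 0.5) / pred.shape[1]
            huber = torch.where(diff.abs() <= kappa, 0.5 * diff ** 2,
                                kappa * (diff.abs() - 0.5 * kappa))
            loss = (torch.abs(tau.view(1, -1, 1) - (diff.detach() < 0).float()) * huber / kappa).mean()
            opt.zero_grad(); loss.backward(); opt.step()
```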
- Distributional Reinforcement Learning with Dual Expectile-Quantile Regression [51.87411935256015]
The quantile regression approach to distributional RL provides a flexible and effective way of learning arbitrary return distributions.
We show that distributional guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean estimate.
Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns.
arXiv Detail & Related papers (2023-05-26T12:30:05Z)
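The joint expectile/quantile idea can be illustrated with the two asymmetric losses side by side; the weighting and the toy fitting loop below are our assumptions, not the paper's formulation.
```python
# Minimal sketch of a joint expectile/quantile objective for return-distribution learning.
import torch

def quantile_loss(pred, target, tau):
    """Pinball loss: asymmetric L1, minimised by the tau-quantile of the target distribution."""
    u = target - pred
    return (torch.abs(tau - (u < 0).float()) * u.abs()).mean()

def expectile_loss(pred, target, tau):
    """Asymmetric L2, minimised by the tau-expectile; the L2 form gives smoother gradients."""
    u = target - pred
    return (torch.abs(tau - (u < 0).float()) * u ** 2).mean()

def joint_loss(q_pred, e_pred, target_samples, taus, weight=1.0):
    """Learn quantile and expectile estimates of the same return distribution side by side."""
    lq = sum(quantile_loss(q_pred[:, i], target_samples, t) for i, t in enumerate(taus))
    le = sum(expectile_loss(e_pred[:, i], target_samples, t) for i, t in enumerate(taus))
    return lq + weight * le

# Toy usage: fit both sets of statistics of a skewed "return" sample by gradient descent.
target = torch.distributions.Exponential(1.0).sample((4096,))
taus = [0.1, 0.5, 0.9]
q = torch.zeros(1, len(taus), requires_grad=True)
e = torch.zeros(1, len(taus), requires_grad=True)
opt = torch.optim.SGD([q, e], lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    joint_loss(q, e, target, taus).backward()
    opt.step()
print(q.data, e.data)   # quantiles and expectiles of the exponential sample
```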
- Policy Evaluation in Distributional LQR [70.63903506291383]
We provide a closed-form expression of the distribution of the random return.
We show that this distribution can be approximated by a finite number of random variables.
Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR.
arXiv Detail & Related papers (2023-03-23T20:27:40Z)
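As a simulation-based stand-in for the closed-form and finite-approximation results summarised above, the sketch below samples the random discounted cost of a linear system under a fixed gain and compares two gains by a CVaR of the cost tail; the dynamics, gains, and risk level are illustrative.
```python
# Monte-Carlo illustration of the object studied above: the *random* cumulative quadratic cost
# of x' = A x + B u + w under the fixed linear policy u = -K x, approximated with finitely
# many terms (a truncated horizon) and finitely many samples.
import numpy as np

def lqr_return_samples(A, B, K, Q, R, gamma=0.95, horizon=200, n_samples=5000,
                       noise_std=0.1, x0=None, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    returns = np.empty(n_samples)
    for i in range(n_samples):
        x = np.ones(n) if x0 is None else x0.copy()
        g, discount = 0.0, 1.0
        for _ in range(horizon):                      # truncation = finite approximation
            u = -K @ x
            g += discount * (x @ Q @ x + u @ R @ u)
            x = A @ x + B @ u + noise_std * rng.standard_normal(n)
            discount *= gamma
        returns[i] = g
    return returns

# 1-D example: risk-averse comparison of two stabilising gains via CVaR of the cost tail.
A, B, Q, R = np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]]), np.array([[0.1]])
for K in (np.array([[0.5]]), np.array([[0.9]])):
    r = lqr_return_samples(A, B, K, Q, R)
    cvar = r[r >= np.quantile(r, 0.95)].mean()        # cost tail: higher is worse
    print(f"K={K.item():.1f}  mean={r.mean():.3f}  CVaR95={cvar:.3f}")
```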
- How Does Return Distribution in Distributional Reinforcement Learning Help Optimization? [10.149055921090572]
We investigate the optimization advantages of distributional RL within the Neural Fitted Z-Iteration (Neural FZI) framework.
We show that distributional RL has desirable smoothness characteristics and hence enjoys stable gradients.
Our research findings illuminate how the return distribution in distributional RL algorithms helps the optimization.
arXiv Detail & Related papers (2022-09-29T02:18:31Z)
- Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning [20.680417111485305]
We introduce multi-sample target values (MTV) for distributional RL, as a principled replacement for single-sample target value estimation.
The improved distributional estimates lend themselves to UCB-based exploration.
We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult tasks such as Humanoid control.
arXiv Detail & Related papers (2022-02-06T03:27:05Z)
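A sketch of what a multi-sample target and a UCB-style score could look like in a quantile setting; the resampling scheme and the mean-plus-spread bonus are our assumptions rather than the paper's exact procedure.
```python
# Illustrative construction: build the distributional Bellman target from several samples of
# the target critic instead of a single one, and score actions optimistically from the spread.
import torch

def multi_sample_targets(target_quantiles, reward, done, gamma=0.99, n_samples=8):
    """target_quantiles: (batch, N) quantile estimates of Z(s', a') from the target critic.
    Returns (batch, n_samples) bootstrapped return samples r + gamma * Z', Z' resampled n_samples times."""
    batch, n_q = target_quantiles.shape
    idx = torch.randint(n_q, (batch, n_samples))            # resample the target support
    z_next = target_quantiles.gather(1, idx)                # (batch, n_samples)
    return reward.unsqueeze(1) + gamma * (1.0 - done).unsqueeze(1) * z_next

def ucb_action_values(quantiles_per_action, beta=0.5):
    """quantiles_per_action: (batch, n_actions, N). Mean + beta * std as an optimistic score."""
    return quantiles_per_action.mean(dim=-1) + beta * quantiles_per_action.std(dim=-1)

# Toy usage with random tensors standing in for critic outputs.
tq = torch.randn(32, 16).sort(dim=-1).values
targets = multi_sample_targets(tq, reward=torch.randn(32), done=torch.zeros(32))
print(targets.shape)                                        # torch.Size([32, 8])
print(ucb_action_values(torch.randn(32, 4, 16)).shape)      # torch.Size([32, 4])
```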
- Robustness and risk management via distributional dynamic programming [13.173307471333619]
We introduce a new class of distributional operators, together with a practical DP algorithm for policy evaluation.
Our approach reformulates the problem through an augmented state space where each state is split into a worst-case substate and a best-case substate.
We derive distributional operators and DP algorithms solving a new control task.
arXiv Detail & Related papers (2021-12-28T12:12:57Z)
- Distributional Reinforcement Learning for Multi-Dimensional Reward Functions [91.88969237680669]
We introduce Multi-Dimensional Distributional DQN (MD3QN) to model the joint return distribution from multiple reward sources.
As a by-product of joint distribution modeling, MD3QN can capture the randomness in returns for each source of reward.
In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions.
arXiv Detail & Related papers (2021-10-26T11:24:23Z)
- The Benefits of Being Categorical Distributional: Uncertainty-aware Regularized Exploration in Reinforcement Learning [18.525166928667876]
By applying a return density function decomposition technique, we attribute the potential superiority of distributional RL to a derived distribution-matching regularization.
This previously unexplored regularization in the distributional RL context aims to capture additional return distribution information beyond the expectation alone.
Experiments substantiate the importance of this uncertainty-aware regularization for the empirical benefits of distributional RL over classical RL.
arXiv Detail & Related papers (2021-10-07T03:14:46Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
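The moment-matching view above can be made concrete with a kernel MMD between learned return particles and target particles; the mixture-of-Gaussians kernel, bandwidths, and the toy fitting loop are illustrative choices, not the paper's exact setup.
```python
# Sketch: represent a return distribution by a finite set of learned particles and drive them
# toward Bellman target particles by minimising a kernel MMD, which implicitly matches moments.
import torch

def gaussian_kernel(x, y, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Mixture-of-Gaussians kernel between two particle sets of shape (batch, N) and (batch, M)."""
    d2 = (x.unsqueeze(2) - y.unsqueeze(1)) ** 2              # (batch, N, M) pairwise squared gaps
    return sum(torch.exp(-d2 / (2.0 * h ** 2)) for h in bandwidths)

def mmd2(pred_particles, target_particles):
    """Squared MMD between predicted and target return particles (biased estimator)."""
    kxx = gaussian_kernel(pred_particles, pred_particles).mean(dim=(1, 2))
    kyy = gaussian_kernel(target_particles, target_particles).mean(dim=(1, 2))
    kxy = gaussian_kernel(pred_particles, target_particles).mean(dim=(1, 2))
    return (kxx + kyy - 2.0 * kxy).mean()

# Toy usage: pull 30 learnable particles toward samples of a bimodal "return" distribution.
target = torch.cat([torch.randn(64, 15) - 3.0, torch.randn(64, 15) + 3.0], dim=1)
particles = (0.1 * torch.randn(64, 30)).requires_grad_(True)
opt = torch.optim.Adam([particles], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    mmd2(particles, target).backward()
    opt.step()
print(particles.mean().item(), particles.std().item())      # particles drift toward the two modes
```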
- Implicit Distributional Reinforcement Learning [61.166030238490634]
The implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
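A rough sketch of the two ingredients named in the IDAC entry, under our own assumptions about architecture and training signal: a deep generator network that maps (state, action, noise) to return samples, and a semi-implicit actor whose Gaussian parameters are conditioned on injected noise. The sorted-sample loss is a generic quantile-style coupling, not necessarily the paper's exact objective.
```python
# Hypothetical sketch of DGN-critic and semi-implicit-actor components (not the authors' code).
import torch
import torch.nn as nn

class DGNCritic(nn.Module):
    """G(s, a, eps): each noise draw yields one return sample for the state-action pair."""
    def __init__(self, s_dim, a_dim, noise_dim=8, h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim + noise_dim, h), nn.ReLU(), nn.Linear(h, 1))
        self.noise_dim = noise_dim
    def sample_returns(self, s, a, n=16):
        s = s.unsqueeze(1).expand(-1, n, -1)
        a = a.unsqueeze(1).expand(-1, n, -1)
        eps = torch.randn(s.shape[0], n, self.noise_dim)
        return self.net(torch.cat([s, a, eps], dim=-1)).squeeze(-1)     # (batch, n)

class SemiImplicitActor(nn.Module):
    """pi(a | s): a Gaussian whose mean and scale depend on injected noise xi, then tanh-squashed."""
    def __init__(self, s_dim, a_dim, noise_dim=8, h=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + noise_dim, h), nn.ReLU(), nn.Linear(h, 2 * a_dim))
        self.noise_dim = noise_dim
    def forward(self, s):
        xi = torch.randn(s.shape[0], self.noise_dim)
        mu, log_std = self.net(torch.cat([s, xi], dim=-1)).chunk(2, dim=-1)
        return torch.tanh(mu + log_std.clamp(-5, 2).exp() * torch.randn_like(mu))

def sorted_sample_loss(critic, target_samples, s, a):
    """Match sorted generated return samples to sorted Bellman target samples of the same size."""
    pred = critic.sample_returns(s, a).sort(dim=-1).values
    target = target_samples.sort(dim=-1).values.detach()
    return (pred - target).abs().mean()
```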
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.