Conjugated Discrete Distributions for Distributional Reinforcement
Learning
- URL: http://arxiv.org/abs/2112.07424v1
- Date: Tue, 14 Dec 2021 14:14:49 GMT
- Title: Conjugated Discrete Distributions for Distributional Reinforcement
Learning
- Authors: Bj\"orn Lindenberg, Jonas Nordqvist, Karl-Olof Lindahl
- Abstract summary: We show that one of the most successful methods may not yield an optimal policy if we have a non-deterministic process.
We argue that distributional reinforcement learning lends itself to remedy this situation completely.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work we continue to build upon recent advances in reinforcement
learning for finite Markov processes. A common approach among previous existing
algorithms, both single-actor and distributed, is to either clip rewards or to
apply a transformation method on Q-functions to handle a large variety of
magnitudes in real discounted returns. We theoretically show that one of the
most successful methods may not yield an optimal policy if we have a
non-deterministic process. As a solution, we argue that distributional
reinforcement learning lends itself to remedy this situation completely. By the
introduction of a conjugated distributional operator we may handle a large
class of transformations for real returns with guaranteed theoretical
convergence. We propose an approximating single-actor algorithm based on this
operator that trains agents directly on unaltered rewards using a proper
distributional metric given by the Cram\'er distance. To evaluate its
performance in a stochastic setting we train agents on a suite of 55 Atari 2600
games using sticky-actions and obtain state-of-the-art performance compared to
other well-known algorithms in the Dopamine framework.
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolleds and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Federated Learning for Heterogeneous Bandits with Unobserved Contexts [0.0]
We study the problem of federated multi-arm contextual bandits with unknown contexts.
We propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions.
arXiv Detail & Related papers (2023-03-29T22:06:24Z) - Cooperative Distribution Alignment via JSD Upper Bound [7.071749623370137]
Unsupervised distribution alignment estimates a transformation that maps two or more source distributions to a shared aligned distribution.
This task has many applications including generative modeling, unsupervised domain adaptation, and socially aware learning.
We propose to unify and generalize previous flow-based approaches under a single non-adversarial framework.
arXiv Detail & Related papers (2022-07-05T20:09:03Z) - Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS)
Unlike previous work, our formulation allows to learn dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z) - Bayesian Distributional Policy Gradients [2.28438857884398]
Distributional Reinforcement Learning maintains the entire probability distribution of the reward-to-go, i.e. the return.
Bayesian Distributional Policy Gradients (BDPG) uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns.
arXiv Detail & Related papers (2021-03-20T23:42:50Z) - Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z) - Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the
Predictive Uncertainties [12.068153197381575]
We propose a novel variational family that allows for retaining covariances between latent processes while achieving fast convergence.
We provide an efficient implementation of our new approach and apply it to several benchmark datasets.
It yields excellent results and strikes a better balance between accuracy and calibrated uncertainty estimates than its state-of-the-art alternatives.
arXiv Detail & Related papers (2020-05-22T11:10:59Z) - A Distributional Analysis of Sampling-Based Reinforcement Learning
Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.