A Distributional Analysis of Sampling-Based Reinforcement Learning
Algorithms
- URL: http://arxiv.org/abs/2003.12239v1
- Date: Fri, 27 Mar 2020 05:13:29 GMT
- Title: A Distributional Analysis of Sampling-Based Reinforcement Learning
Algorithms
- Authors: Philip Amortila, Doina Precup, Prakash Panangaden, Marc G. Bellemare
- Abstract summary: We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
- Score: 67.67377846416106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a distributional approach to theoretical analyses of reinforcement
learning algorithms for constant step-sizes. We demonstrate its effectiveness
by presenting simple and unified proofs of convergence for a variety of
commonly-used methods. We show that value-based methods such as TD($\lambda$)
and $Q$-Learning have update rules which are contractive in the space of
distributions of functions, thus establishing their exponentially fast
convergence to a stationary distribution. We demonstrate that the stationary
distribution obtained by any algorithm whose target is an expected Bellman
update has a mean which is equal to the true value function. Furthermore, we
establish that the distributions concentrate around their mean as the step-size
shrinks. We further analyse the optimistic policy iteration algorithm, for
which the contraction property does not hold, and formulate a probabilistic
policy improvement property which entails the convergence of the algorithm.
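To make these claims concrete, here is a minimal NumPy sketch on a toy three-state Markov reward process of our own construction (illustrative only, not the paper's code): run constant step-size TD(0) many times, treat the final iterates as samples from the stationary distribution, and check that the sample mean tracks the true value function while the spread shrinks with the step-size.

```python
import numpy as np

# Toy Markov reward process (our own example, not from the paper):
# a fixed policy is already folded into the transition matrix.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])     # fixed-policy transition matrix
r = np.array([1.0, 0.0, -1.0])     # per-state rewards
gamma = 0.9

# Classical value function: V = (I - gamma * P)^{-1} r
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

def td0(alpha, n_steps, rng):
    """Constant step-size TD(0) along one trajectory; returns the final iterate."""
    V, s = np.zeros(3), 0
    for _ in range(n_steps):
        s_next = rng.choice(3, p=P[s])
        V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])  # sampled Bellman target
        s = s_next
    return V

# Sample the (approximate) stationary distribution of the iterates: its mean
# should sit near V_true, and its spread should shrink with the step-size.
rng = np.random.default_rng(0)
for alpha in (0.1, 0.01):
    samples = np.stack([td0(alpha, 20_000, rng) for _ in range(100)])
    print(f"alpha={alpha}: |mean - V_true| = {np.abs(samples.mean(0) - V_true).max():.3f}, "
          f"spread = {samples.std(0).max():.3f}")
```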
Related papers
- On Policy Evaluation Algorithms in Distributional Reinforcement Learning [0.0]
We introduce a novel class of algorithms to efficiently approximate the unknown return distributions in policy evaluation problems from distributional reinforcement learning (DRL).
For a plain instance of our proposed class of algorithms, we prove error bounds in both the Wasserstein and Kolmogorov-Smirnov distances.
For return distributions with probability density functions, the algorithms yield approximations of these densities; error bounds are given in the supremum norm.
arXiv Detail & Related papers (2024-07-19T10:06:01Z)
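For the entry above, here is a particle-based sketch of the distributional Bellman iteration that such policy-evaluation methods approximate, together with an empirical 1-Wasserstein distance; the two-state chain and the plain resampling scheme are our own illustrative choices, not the paper's algorithm.

```python
import numpy as np

# Represent each state's return distribution by a bag of particles and apply
# the distributional Bellman operator Z(s) =d r(s) + gamma * Z(S') by resampling.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # toy fixed-policy transition matrix
r = np.array([1.0, -1.0])
gamma, n_particles = 0.9, 512
rng = np.random.default_rng(1)

Z = np.zeros((2, n_particles))      # particles for each state's return distribution
for _ in range(300):
    Z_new = np.empty_like(Z)
    for s in range(2):
        s_next = rng.choice(2, p=P[s], size=n_particles)
        idx = rng.integers(n_particles, size=n_particles)
        Z_new[s] = r[s] + gamma * Z[s_next, idx]   # resampled Bellman target
    Z = Z_new

def wasserstein1(x, y):
    """Empirical 1-Wasserstein distance between two equally sized 1-D samples."""
    return np.abs(np.sort(x) - np.sort(y)).mean()

# Sanity check: the particle means should recover the classical value function.
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)
print(Z.mean(axis=1), V_true, wasserstein1(Z[0], Z[1]))
```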
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
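For the entry above, a sketch of consensus ADMM on a distributed ridge-regression task shows the splitting such schemes build on; this is the plain deterministic-optimization variant on synthetic data shards, not the paper's MCMC sampler.

```python
import numpy as np

# Consensus ADMM: each worker keeps a local variable x_i, and a global z is
# driven to agree with all of them via scaled dual variables u_i.
rng = np.random.default_rng(2)
n_workers, d, lam, rho = 4, 5, 0.1, 1.0
w_true = rng.normal(size=d)

def make_shard(n=50):
    A = rng.normal(size=(n, d))
    return A, A @ w_true + 0.1 * rng.normal(size=n)

shards = [make_shard() for _ in range(n_workers)]
x = np.zeros((n_workers, d))   # local primal variables
u = np.zeros((n_workers, d))   # scaled dual variables
z = np.zeros(d)                # global consensus variable

for _ in range(100):
    for i, (A, b) in enumerate(shards):
        # x-update: each worker solves its local regularized least squares.
        x[i] = np.linalg.solve(A.T @ A + rho * np.eye(d),
                               A.T @ b + rho * (z - u[i]))
    # z-update: L2-regularized average of the workers' variables.
    z = rho * (x + u).sum(axis=0) / (lam + rho * n_workers)
    u += x - z                  # dual ascent on the consensus constraint

print("consensus error:", np.linalg.norm(z - w_true))
```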
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
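For the entry above, a miniature of what "algorithm unrolling" means: K gradient-descent iterations treated as a K-layer network whose per-layer step sizes are the trainable parameters. The least-squares objective and the crude finite-difference tuning are our illustrative assumptions; SURF itself operates in the federated setting.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(20, 5)) / np.sqrt(20)
b = rng.normal(size=20)
K = 8                                # number of unrolled "layers"

def unrolled_gd(theta):
    """K layers, each one gradient step on f(x) = 0.5 * ||Ax - b||^2."""
    x = np.zeros(5)
    for k in range(K):
        x = x - theta[k] * (A.T @ (A @ x - b))   # layer k = one GD step
    return x

def loss(theta):
    x = unrolled_gd(theta)
    return 0.5 * np.sum((A @ x - b) ** 2)

# Tune the step sizes by simple central finite differences (illustrative only).
theta = np.full(K, 0.1)
print("loss before:", loss(theta))
for _ in range(300):
    grad = np.zeros(K)
    for k in range(K):
        e = np.zeros(K)
        e[k] = 1e-5
        grad[k] = (loss(theta + e) - loss(theta - e)) / 2e-5
    theta -= 0.05 * grad
print("loss after tuning step sizes:", loss(theta))
```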
- Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning [39.07307690074323]
We consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time environment.
Accurate return predictions have proven useful for determining optimal policies for risk-sensitive control, state representations, multiagent coordination, and more.
We propose a tractable algorithm for approximately solving the distributional HJB equation based on a JKO scheme, which can be implemented in an online control algorithm.
arXiv Detail & Related papers (2022-05-24T16:33:54Z)
- Conjugated Discrete Distributions for Distributional Reinforcement Learning [0.0]
We show that one of the most successful methods may fail to yield an optimal policy when the process is non-deterministic.
We argue that distributional reinforcement learning lends itself to remedying this situation completely.
arXiv Detail & Related papers (2021-12-14T14:14:49Z)
- A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number and frequency of required communication rounds compared to existing methods, without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
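For the entry above, a sketch of the homogeneous distributed setting: each machine samples the same population objective (here logistic regression, our choice), and a plain averaged Newton step uses one communication round per iteration. This illustrates the setup only, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_machines, n_local = 5, 4, 200
w_true = rng.normal(size=d)

def sample_batch():
    """Fresh samples from the common population objective (logistic model)."""
    X = rng.normal(size=(n_local, d))
    y = (rng.random(n_local) < 1 / (1 + np.exp(-X @ w_true))).astype(float)
    return X, y

def local_grad_hess(w, X, y):
    """Gradient and Hessian of the local logistic loss."""
    p = 1 / (1 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y), (X * (p * (1 - p))[:, None]).T @ X / len(y)

w = np.zeros(d)
for _ in range(10):   # one communication round per outer iteration
    stats = [local_grad_hess(w, *sample_batch()) for _ in range(n_machines)]
    g = np.mean([s[0] for s in stats], axis=0)
    H = np.mean([s[1] for s in stats], axis=0)
    w -= np.linalg.solve(H + 1e-6 * np.eye(d), g)   # averaged Newton step

print("parameter error:", np.linalg.norm(w - w_true))
```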
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
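For the entry above, the core quantity behind moment matching in isolation: a Gaussian-kernel MMD between predicted return particles and their distributional Bellman targets. The particle values below are synthetic, and the network and training loop are omitted.

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Squared MMD with a Gaussian kernel between 1-D sample sets x and y."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(5)
z_pred = rng.normal(0.0, 1.0, size=64)     # predicted particles for Z(s, a)
reward, gamma = 1.0, 0.9
z_next = rng.normal(0.5, 1.0, size=64)     # particles for Z(s', a')
z_target = reward + gamma * z_next          # distributional Bellman target
print("MMD^2 to be minimized:", mmd2(z_pred, z_target))
```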
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose an implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
- Study of Diffusion Normalized Least Mean M-estimate Algorithms [0.8749675983608171]
This work proposes diffusion normalized least mean M-estimate algorithms based on the modified Huber function.
We analyze the transient, steady-state and stability behaviors of the algorithms in a unified framework.
Simulations in various impulsive noise scenarios show that the proposed algorithms are superior to some existing diffusion algorithms.
arXiv Detail & Related papers (2020-04-20T00:28:41Z)
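For the entry above, a generic adapt-then-combine diffusion NLMS sketch in which the error is passed through a Huber-type clipping to resist impulsive noise; the ring network, step size, and plain clipped influence function are our illustrative choices, not the paper's modified Huber design.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n_nodes, n_iters, mu, eps = 8, 5, 2000, 0.5, 1e-6
w_true = rng.normal(size=d)

# Row-stochastic combination matrix over a ring: each node averages itself
# and its two neighbours.
C = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    for j in (i - 1, i, i + 1):
        C[i, j % n_nodes] = 1 / 3

def clipped(e, delta=1.0):
    """Huber-type influence function: identity for small errors, clipped beyond delta."""
    return np.clip(e, -delta, delta)

W = np.zeros((n_nodes, d))
for _ in range(n_iters):
    psi = np.empty_like(W)
    for k in range(n_nodes):
        u = rng.normal(size=d)                   # regressor at node k
        v = 0.1 * rng.standard_t(df=1.5)         # heavy-tailed (impulsive) noise
        e = (u @ w_true + v) - u @ W[k]
        psi[k] = W[k] + mu * clipped(e) * u / (u @ u + eps)   # adapt step
    W = C @ psi                                   # combine with neighbours

print("per-node error:", np.linalg.norm(W - w_true, axis=1))
```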