Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time
Reinforcement Learning
- URL: http://arxiv.org/abs/2205.12184v1
- Date: Tue, 24 May 2022 16:33:54 GMT
- Title: Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time
Reinforcement Learning
- Authors: Harley Wiltzer and David Meger and Marc G. Bellemare
- Abstract summary: We consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time environment.
Accurate return predictions have proven useful for risk-sensitive control, learning state representations, multiagent coordination, and more.
We propose a tractable algorithm for approximately solving the distributional HJB based on a JKO scheme, which can be implemented in an online control algorithm.
- Score: 39.07307690074323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous-time reinforcement learning offers an appealing formalism for
describing control problems in which the passage of time is not naturally
divided into discrete increments. Here we consider the problem of predicting
the distribution of returns obtained by an agent interacting in a
continuous-time, stochastic environment. Accurate return predictions have
proven useful for determining optimal policies for risk-sensitive control,
learning state representations, multiagent coordination, and more. We begin by
establishing the distributional analogue of the Hamilton-Jacobi-Bellman (HJB)
equation for Itô diffusions and the broader class of Feller-Dynkin processes.
We then specialize this equation to the setting in which the return
distribution is approximated by $N$ uniformly-weighted particles, a common
design choice in distributional algorithms. Our derivation highlights
additional terms due to statistical diffusivity which arise from the proper
handling of distributions in the continuous-time setting. Based on this, we
propose a tractable algorithm for approximately solving the distributional HJB
based on a JKO scheme, which can be implemented in an online control algorithm.
We demonstrate the effectiveness of such an algorithm in a synthetic control
problem.
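As background for the distributional analogue derived in the paper, two standard objects it builds on can be stated explicitly; the notation below ($\mu$, $\sigma$, $\beta$, $r$, $V$, $F$, $\tau$, $W_2$) is standard and assumed here rather than taken from the paper. For an Itô diffusion $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t$ with reward rate $r$ and discount rate $\beta > 0$, the classical (expected-value) HJB equation for policy evaluation reads
$ \beta V(x) = r(x) + \mu(x)^\top \nabla V(x) + \tfrac{1}{2}\,\mathrm{tr}\!\big(\sigma(x)\sigma(x)^\top \nabla^2 V(x)\big), $
and a JKO scheme advances a probability measure $\rho_k$ by the Wasserstein proximal step
$ \rho_{k+1} \in \arg\min_{\rho}\; \Big\{ F(\rho) + \tfrac{1}{2\tau}\, W_2^2(\rho, \rho_k) \Big\}, $
where $F$ is the driving functional, $\tau$ the step size, and $W_2$ the 2-Wasserstein distance. The paper's contribution is the distributional generalisation of the first equation and a particle-based instantiation of the second; neither is reproduced here.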
Related papers
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
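For reference, the consensus-ADMM splitting that such distributed schemes typically build on can be written as follows, in its deterministic optimisation form (the sampling variant adds stochasticity, which is not reproduced here); the local objectives $f_i$, consensus variable $z$, scaled duals $u_i$, and penalty $\rho$ are assumed notation, not the paper's:
$ x_i^{k+1} = \arg\min_{x}\; f_i(x) + \tfrac{\rho}{2}\,\| x - z^k + u_i^k \|^2, $
$ z^{k+1} = \tfrac{1}{m}\sum_{i=1}^{m} \big( x_i^{k+1} + u_i^k \big), $
$ u_i^{k+1} = u_i^k + x_i^{k+1} - z^{k+1}. $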
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints [0.0]
We propose a distributed upper confidence bound (UCB) algorithm, related-UCB.
Our algorithm constructs a pruned action set during each round to ensure the constraints are met.
We empirically validate the performance of our algorithm on synthetic data and the real-world MovieLens-100K dataset.
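As a point of reference, the single-agent UCB index underlying such algorithms takes the standard form below; the stage-wise constraint pruning and distributed aggregation are specific to the paper and are only gestured at through the feasible set $\mathcal{A}_t$ (all notation assumed):
$ a_t = \arg\max_{a \in \mathcal{A}_t} \Big( \hat{\mu}_a(t) + \sqrt{ \tfrac{2 \ln t}{N_a(t)} } \Big), $
where $\hat{\mu}_a(t)$ is the empirical mean reward of arm $a$, $N_a(t)$ its pull count, and $\mathcal{A}_t$ the pruned action set at round $t$.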
arXiv Detail & Related papers (2024-01-21T18:43:55Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
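For context, standard annealed importance sampling moves samples along a sequence of intermediate distributions and accumulates incremental importance weights; the constant-rate schedule proposed in the paper changes how the $\beta_k$ are spaced and is not reproduced here. With a tractable base $p_0$, target $p_K$, and a geometric path (notation assumed)
$ \pi_k(x) \propto p_0(x)^{1-\beta_k}\, p_K(x)^{\beta_k}, \qquad 0 = \beta_0 < \beta_1 < \dots < \beta_K = 1, $
a sample $x_0 \sim p_0$ is moved by Markov kernels leaving each $\pi_k$ invariant, and carries the weight
$ w = \prod_{k=1}^{K} \frac{\pi_k(x_{k-1})}{\pi_{k-1}(x_{k-1})}. $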
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Optimal scheduling of entropy regulariser for continuous-time
linear-quadratic reinforcement learning [9.779769486156631]
Here, the agent interacts with the environment by generating noisy controls distributed according to the optimal relaxed policy.
This exploration-exploitation trade-off is determined by the strength of entropy regularisation.
We prove that the regret, for both learning algorithms, is of the order $\mathcal{O}(\sqrt{N})$ (up to a logarithmic factor) over $N$ episodes, matching the best known result from the literature.
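For orientation, one common form of the entropy-regularised (exploratory) relaxed-control objective in continuous-time RL is shown below; the specific linear-quadratic structure and the scheduling of the regularisation strength studied in the paper are not reproduced, and the notation is assumed:
$ \sup_{\pi}\; \mathbb{E}\Big[ \int_0^T \!\! \int_{\mathcal{A}} \big( r(t, X_t, a) - \lambda \ln \pi_t(a) \big)\, \pi_t(a)\, da\, dt \;+\; g(X_T) \Big], $
where $\pi_t$ is the relaxed (distribution-valued) control at time $t$ and $\lambda > 0$ is the entropy-regularisation strength governing the exploration-exploitation trade-off.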
arXiv Detail & Related papers (2022-08-08T23:36:40Z) - Distributed Stochastic Bandit Learning with Context Distributions [0.0]
We study the problem of distributed multi-arm contextual bandit with unknown contexts.
In our model, an adversary chooses a distribution over the set of possible contexts; the agents observe only this context distribution, while the exact context remains unknown to them.
Our goal is to develop a distributed algorithm that selects a sequence of optimal actions to maximize the cumulative reward.
arXiv Detail & Related papers (2022-07-28T22:00:11Z) - Decentralized Local Stochastic Extra-Gradient for Variational
Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains whose problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers fully decentralized computation settings.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
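For reference, the classical (centralised, deterministic) extra-gradient step for a VI with operator $F$ and feasible set $\mathcal{Z}$ is given below; the decentralised, local-update, stochastic variant analysed in the paper builds on this template (notation assumed). The VI asks for $z^\star \in \mathcal{Z}$ with $\langle F(z^\star), z - z^\star \rangle \ge 0$ for all $z \in \mathcal{Z}$, and the extra-gradient method iterates
$ \bar{z}^{k} = \mathrm{proj}_{\mathcal{Z}}\big( z^k - \gamma F(z^k) \big), \qquad z^{k+1} = \mathrm{proj}_{\mathcal{Z}}\big( z^k - \gamma F(\bar{z}^{k}) \big). $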
arXiv Detail & Related papers (2021-06-15T17:45:51Z) - Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose the implicit distributional actor-critic (IDAC), built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z) - A Distributional Analysis of Sampling-Based Reinforcement Learning
Algorithms [67.67377846416106]
We present a distributional approach to the theoretical analysis of reinforcement learning algorithms with constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
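For concreteness, the tabular update rules whose random iterates such distributional analyses study take the standard form below, shown with a constant step-size $\alpha$ (TD is shown for the $\lambda = 0$ case; all notation is assumed rather than taken from the paper):
$ V(s_t) \leftarrow V(s_t) + \alpha \big( r_t + \gamma V(s_{t+1}) - V(s_t) \big), $
$ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big( r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big). $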
arXiv Detail & Related papers (2020-03-27T05:13:29Z)