Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time
Reinforcement Learning
- URL: http://arxiv.org/abs/2205.12184v1
- Date: Tue, 24 May 2022 16:33:54 GMT
- Title: Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time
Reinforcement Learning
- Authors: Harley Wiltzer and David Meger and Marc G. Bellemare
- Abstract summary: We consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time environment.
Accurate return predictions have proven useful for risk-sensitive control, learning state representations, multiagent coordination, and more.
We propose a tractable algorithm for approximately solving the distributional HJB based on a JKO scheme, which can be implemented in an online control algorithm.
- Score: 39.07307690074323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous-time reinforcement learning offers an appealing formalism for
describing control problems in which the passage of time is not naturally
divided into discrete increments. Here we consider the problem of predicting
the distribution of returns obtained by an agent interacting in a
continuous-time, stochastic environment. Accurate return predictions have
proven useful for determining optimal policies for risk-sensitive control,
learning state representations, multiagent coordination, and more. We begin by
establishing the distributional analogue of the Hamilton-Jacobi-Bellman (HJB)
equation for Itô diffusions and the broader class of Feller-Dynkin processes.
We then specialize this equation to the setting in which the return
distribution is approximated by $N$ uniformly-weighted particles, a common
design choice in distributional algorithms. Our derivation highlights
additional terms due to statistical diffusivity which arise from the proper
handling of distributions in the continuous-time setting. Based on this, we
propose a tractable algorithm for approximately solving the distributional HJB
based on a JKO scheme, which can be implemented in an online control algorithm.
We demonstrate the effectiveness of such an algorithm in a synthetic control
problem.
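As background for the distributional analogue derived in the paper, two standard objects it builds on can be stated explicitly; the notation below ($\mu$, $\sigma$, $\beta$, $r$, $V$, $F$, $\tau$, $W_2$) is standard and assumed here rather than taken from the paper. For an Itô diffusion $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t$ with reward rate $r$ and discount rate $\beta > 0$, the classical (expected-value) HJB equation for policy evaluation reads
$ \beta V(x) = r(x) + \mu(x)^\top \nabla V(x) + \tfrac{1}{2}\,\mathrm{tr}\!\big(\sigma(x)\sigma(x)^\top \nabla^2 V(x)\big), $
and a JKO scheme advances a probability measure $\rho_k$ by the Wasserstein proximal step
$ \rho_{k+1} \in \arg\min_{\rho}\; \Big\{ F(\rho) + \tfrac{1}{2\tau}\, W_2^2(\rho, \rho_k) \Big\}, $
where $F$ is the driving functional, $\tau$ the step size, and $W_2$ the 2-Wasserstein distance. The paper's contribution is the distributional generalisation of the first equation and a particle-based instantiation of the second; neither is reproduced here.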
Related papers
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
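For reference, the consensus-ADMM splitting that such distributed schemes typically build on can be written as follows, in its deterministic optimisation form (the sampling variant adds stochasticity, which is not reproduced here); the local objectives $f_i$, consensus variable $z$, scaled duals $u_i$, and penalty $\rho$ are assumed notation, not the paper's:
$ x_i^{k+1} = \arg\min_{x}\; f_i(x) + \tfrac{\rho}{2}\,\| x - z^k + u_i^k \|^2, $
$ z^{k+1} = \tfrac{1}{m}\sum_{i=1}^{m} \big( x_i^{k+1} + u_i^k \big), $
$ u_i^{k+1} = u_i^k + x_i^{k+1} - z^{k+1}. $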
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints [0.0]
We propose a distributed upper confidence bound (UCB) algorithm, related-UCB.
Our algorithm constructs a pruned action set during each round to ensure the constraints are met.
We empirically validate the performance of our algorithm on synthetic data and the real-world MovieLens-100K dataset.
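As a point of reference, the single-agent UCB index underlying such algorithms takes the standard form below; the stage-wise constraint pruning and distributed aggregation are specific to the paper and are only gestured at through the feasible set $\mathcal{A}_t$ (all notation assumed):
$ a_t = \arg\max_{a \in \mathcal{A}_t} \Big( \hat{\mu}_a(t) + \sqrt{ \tfrac{2 \ln t}{N_a(t)} } \Big), $
where $\hat{\mu}_a(t)$ is the empirical mean reward of arm $a$, $N_a(t)$ its pull count, and $\mathcal{A}_t$ the pruned action set at round $t$.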
arXiv Detail & Related papers (2024-01-21T18:43:55Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
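For context, standard annealed importance sampling moves samples along a sequence of intermediate distributions and accumulates incremental importance weights; the constant-rate schedule proposed in the paper changes how the $\beta_k$ are spaced and is not reproduced here. With a tractable base $p_0$, target $p_K$, and a geometric path (notation assumed)
$ \pi_k(x) \propto p_0(x)^{1-\beta_k}\, p_K(x)^{\beta_k}, \qquad 0 = \beta_0 < \beta_1 < \dots < \beta_K = 1, $
a sample $x_0 \sim p_0$ is moved by Markov kernels leaving each $\pi_k$ invariant, and carries the weight
$ w = \prod_{k=1}^{K} \frac{\pi_k(x_{k-1})}{\pi_{k-1}(x_{k-1})}. $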
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Optimal scheduling of entropy regulariser for continuous-time
linear-quadratic reinforcement learning [9.779769486156631]
Here, the agent interacts with the environment by generating noisy controls distributed according to the optimal relaxed policy.
This exploration-exploitation trade-off is determined by the strength of entropy regularisation.
We prove that the regret, for both learning algorithms, is of the order $\mathcal{O}(\sqrt{N})$ (up to a logarithmic factor) over $N$ episodes, matching the best known result from the literature.
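For orientation, one common form of the entropy-regularised (exploratory) relaxed-control objective in continuous-time RL is shown below; the specific linear-quadratic structure and the scheduling of the regularisation strength studied in the paper are not reproduced, and the notation is assumed:
$ \sup_{\pi}\; \mathbb{E}\Big[ \int_0^T \!\! \int_{\mathcal{A}} \big( r(t, X_t, a) - \lambda \ln \pi_t(a) \big)\, \pi_t(a)\, da\, dt \;+\; g(X_T) \Big], $
where $\pi_t$ is the relaxed (distribution-valued) control at time $t$ and $\lambda > 0$ is the entropy-regularisation strength governing the exploration-exploitation trade-off.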
arXiv Detail & Related papers (2022-08-08T23:36:40Z) - Distributed Stochastic Bandit Learning with Context Distributions [0.0]
We study the problem of distributed multi-arm contextual bandit with unknown contexts.
In our model, an adversary chooses a distribution over the set of possible contexts; the agents observe only this context distribution, while the exact context remains unknown to them.
Our goal is to develop a distributed algorithm that selects a sequence of optimal actions to maximize the cumulative reward.
arXiv Detail & Related papers (2022-07-28T22:00:11Z) - Decentralized Local Stochastic Extra-Gradient for Variational
Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains whose problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers fully decentralized computation settings.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
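For reference, the classical (centralised, deterministic) extra-gradient step for a VI with operator $F$ and feasible set $\mathcal{Z}$ is given below; the decentralised, local-update, stochastic variant analysed in the paper builds on this template (notation assumed). The VI asks for $z^\star \in \mathcal{Z}$ with $\langle F(z^\star), z - z^\star \rangle \ge 0$ for all $z \in \mathcal{Z}$, and the extra-gradient method iterates
$ \bar{z}^{k} = \mathrm{proj}_{\mathcal{Z}}\big( z^k - \gamma F(z^k) \big), \qquad z^{k+1} = \mathrm{proj}_{\mathcal{Z}}\big( z^k - \gamma F(\bar{z}^{k}) \big). $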
arXiv Detail & Related papers (2021-06-15T17:45:51Z) - Implicit Distributional Reinforcement Learning [61.166030238490634]
We propose the implicit distributional actor-critic (IDAC), built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z) - A Distributional Analysis of Sampling-Based Reinforcement Learning
Algorithms [67.67377846416106]
We present a distributional approach to the theoretical analysis of reinforcement learning algorithms with constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
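For concreteness, the tabular update rules whose random iterates such distributional analyses study take the standard form below, shown with a constant step-size $\alpha$ (TD is shown for the $\lambda = 0$ case; all notation is assumed rather than taken from the paper):
$ V(s_t) \leftarrow V(s_t) + \alpha \big( r_t + \gamma V(s_{t+1}) - V(s_t) \big), $
$ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big( r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big). $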
arXiv Detail & Related papers (2020-03-27T05:13:29Z)