Tractable Representations for Convergent Approximation of Distributional HJB Equations
- URL: http://arxiv.org/abs/2503.05563v1
- Date: Fri, 07 Mar 2025 16:43:25 GMT
- Title: Tractable Representations for Convergent Approximation of Distributional HJB Equations
- Authors: Julie Alhosh, Harley Wiltzer, David Meger
- Abstract summary: In reinforcement learning (RL), the long-term behavior of decision-making policies is evaluated based on their average returns. Recent work has established a distributional HJB (DHJB) equation, providing the first characterization of return distributions in continuous-time RL. We show that under a certain topological property of the mapping between statistics learned by a distributional RL algorithm and the corresponding distributions, approximation of these statistics leads to close approximations of the solution of the DHJB equation.
- Score: 14.04742317470728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning (RL), the long-term behavior of decision-making policies is evaluated based on their average returns. Distributional RL has emerged, presenting techniques for learning return distributions, which provide additional statistics for evaluating policies and incorporating risk-sensitive considerations. When the passage of time cannot naturally be divided into discrete time increments, researchers have studied the continuous-time RL (CTRL) problem, where agent states and decisions evolve continuously. In this setting, the Hamilton-Jacobi-Bellman (HJB) equation is well established as the characterization of the expected return, and many solution methods exist. However, the study of distributional RL in the continuous-time setting is in its infancy. Recent work has established a distributional HJB (DHJB) equation, providing the first characterization of return distributions in CTRL. This equation and its solutions are intractable to solve and represent exactly, requiring novel approximation techniques. This work takes strides towards this end, establishing conditions on the method of parameterizing return distributions under which the DHJB equation can be approximately solved. Particularly, we show that under a certain topological property of the mapping between statistics learned by a distributional RL algorithm and the corresponding distributions, approximation of these statistics leads to close approximations of the solution of the DHJB equation. Concretely, we demonstrate that the quantile representation common in distributional RL satisfies this topological property, certifying an efficient approximation algorithm for continuous-time distributional RL.
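For intuition about the topological property referenced above, here is a minimal sketch (not the authors' implementation) of the quantile representation: a vector of m learned statistics is identified with the uniform mixture of Dirac masses at those values, and perturbing every statistic by at most eps moves the represented distribution by at most eps in the 1-Wasserstein metric. Function names and the choice of m = 32 atoms are illustrative assumptions.

```python
import numpy as np

def quantile_distribution(theta):
    """Map a vector of m quantile statistics to the distribution it
    represents: a uniform mixture of Dirac masses located at the atoms."""
    atoms = np.sort(np.asarray(theta, dtype=float))
    weights = np.full(atoms.size, 1.0 / atoms.size)
    return atoms, weights

def w1_between_quantile_reps(theta_a, theta_b):
    """1-Wasserstein distance between two quantile representations with the
    same number of atoms: the mean absolute difference of the sorted atoms."""
    a, _ = quantile_distribution(theta_a)
    b, _ = quantile_distribution(theta_b)
    return float(np.mean(np.abs(a - b)))

# Perturbing the learned statistics by at most eps moves the represented
# distribution by at most eps in W1 -- the kind of statistics-to-distribution
# continuity under which approximate statistics yield an approximate solution.
rng = np.random.default_rng(0)
theta = rng.normal(size=32)                       # hypothetical learned statistics
theta_hat = theta + rng.uniform(-1e-2, 1e-2, 32)  # their approximation
print(w1_between_quantile_reps(theta, theta_hat) <= 1e-2)  # True
```

This continuity, which the abstract states is satisfied by the quantile representation, is what lets approximation error in the learned statistics translate into approximation error in the DHJB solution.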
Related papers
- Discrete Probabilistic Inference as Control in Multi-path Environments [84.67055173040107]
We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem.
We show that GFlowNets learn a policy that samples objects proportionally to their reward by enforcing a conservation of flows.
We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward.
arXiv Detail & Related papers (2024-02-15T20:20:35Z)
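As a toy illustration of the flow-conservation idea in the entry above (a minimal sketch with an assumed three-object reward table, not the paper's GFlowNet setup): on a tree-structured DAG, setting each state's flow to the total reward of the terminal objects reachable from it, and the forward policy to the ratio of child flow to parent flow, makes terminal objects appear with frequency proportional to their reward.

```python
import numpy as np

# Toy tree-structured DAG: root -> {A, B}, A -> {x1, x2}, B -> {x3}.
# Terminal objects and their (hypothetical) rewards.
rewards = {"x1": 1.0, "x2": 2.0, "x3": 3.0}
children = {"root": ["A", "B"], "A": ["x1", "x2"], "B": ["x3"]}

def flow(node):
    """Flow of a state = total reward of terminal objects reachable from it.
    This choice satisfies flow conservation: the flow into every interior
    state equals the total flow leaving it."""
    if node in rewards:
        return rewards[node]
    return sum(flow(child) for child in children[node])

def sample_terminal(rng):
    """Forward policy P_F(child | node) = flow(child) / flow(node)."""
    node = "root"
    while node not in rewards:
        kids = children[node]
        probs = np.array([flow(k) for k in kids]) / flow(node)
        node = kids[rng.choice(len(kids), p=probs)]
    return node

rng = np.random.default_rng(0)
draws = [sample_terminal(rng) for _ in range(20000)]
Z = sum(rewards.values())
for x, r in rewards.items():
    print(x, round(draws.count(x) / len(draws), 3), "vs", r / Z)
```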
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- One-Step Distributional Reinforcement Learning [10.64435582017292]
We present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework.
We show that our approach comes with a unified theory for both policy evaluation and control.
We propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis.
arXiv Detail & Related papers (2023-04-27T06:57:00Z)
- Policy Evaluation in Distributional LQR [70.63903506291383]
We provide a closed-form expression of the distribution of the random return.
We show that this distribution can be approximated by a finite number of random variables.
Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR.
arXiv Detail & Related papers (2023-03-23T20:27:40Z)
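For context, a standard statement of the random return whose distribution the entry above characterizes in closed form, for a fixed linear policy in stochastic LQR; the notation below is generic and only illustrative, not necessarily the cited paper's.

```latex
% Stochastic LQR under a fixed linear policy u_t = -K x_t (illustrative notation).
\[
  x_{t+1} = A x_t + B u_t + w_t, \qquad u_t = -K x_t, \qquad
  w_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \Sigma),
\]
\[
  G^{K}(x_0) \;=\; \sum_{t=0}^{\infty} \gamma^{t}
  \bigl( x_t^{\top} Q x_t + u_t^{\top} R u_t \bigr), \qquad \gamma \in (0, 1).
\]
% The random return G^K(x_0) is the object the entry above expresses in closed
% form and approximates with finitely many random variables.
```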
- Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning [39.07307690074323]
We consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time environment.
Accurate return predictions have proven useful for determining optimal policies for risk-sensitive control, state representations, multiagent coordination, and more.
We propose a tractable algorithm for approximately solving the distributional HJB based on a JKO scheme, which can be implemented in an online control algorithm.
arXiv Detail & Related papers (2022-05-24T16:33:54Z)
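For reference, the generic Jordan-Kinderlehrer-Otto (JKO) minimizing-movement step underlying the scheme mentioned above, stated in its standard form; the cited paper adapts it to return distributions, and the notation here is generic.

```latex
% Generic JKO step for an energy functional F over probability measures.
\[
  \mu_{k+1} \;\in\; \operatorname*{arg\,min}_{\mu \in \mathcal{P}_2(\mathbb{R}^d)}
  \left\{ \frac{1}{2\tau} W_2^{2}(\mu, \mu_k) + F(\mu) \right\},
\]
% where W_2 is the 2-Wasserstein distance and tau > 0 is the step size;
% iterating this proximal step discretizes the Wasserstein gradient flow of F.
```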
- Distributional Reinforcement Learning for Multi-Dimensional Reward Functions [91.88969237680669]
We introduce Multi-Dimensional Distributional DQN (MD3QN) to model the joint return distribution from multiple reward sources.
As a by-product of joint distribution modeling, MD3QN can capture the randomness in returns for each source of reward.
In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions.
arXiv Detail & Related papers (2021-10-26T11:24:23Z)
- The Benefits of Being Categorical Distributional: Uncertainty-aware Regularized Exploration in Reinforcement Learning [17.64056793687686]
We find that the potential superiority of distributional RL can be attributed to a derived distribution-matching entropy regularization.
Our study offers a new, exploration-based perspective to explain the intrinsic benefits of adopting distributional learning in RL.
arXiv Detail & Related papers (2021-10-07T03:14:46Z)
- Distributional Reinforcement Learning with Unconstrained Monotonic Neural Networks [7.907645828535088]
The paper introduces a methodology for learning different representations of the random return distribution.
A novel distributional RL algorithm named unconstrained monotonic deep Q-network (UMDQN) is presented.
arXiv Detail & Related papers (2021-06-06T20:03:50Z)
- Bayesian Distributional Policy Gradients [2.28438857884398]
Distributional Reinforcement Learning maintains the entire probability distribution of the reward-to-go, i.e. the return.
Bayesian Distributional Policy Gradients (BDPG) uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns.
arXiv Detail & Related papers (2021-03-20T23:42:50Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
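As a rough sketch of the implicit moment matching described above: a squared maximum mean discrepancy (MMD) between particles representing a predicted return distribution and particles representing its one-step Bellman target, under a Gaussian kernel. The particle count, bandwidth, and variable names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Biased estimator of the squared MMD between 1-D particle sets x and y
    with a Gaussian kernel; driving it to zero matches the two distributions
    (and hence, implicitly, all of their moments)."""
    def gram(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * bandwidth ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

# Hypothetical particles: predicted return samples Z(s, a) vs. the one-step
# Bellman target r + gamma * Z(s', a'), each summarized by 64 particles.
rng = np.random.default_rng(0)
gamma, reward = 0.99, 1.0
predicted = rng.normal(loc=100.0, scale=5.0, size=64)
target = reward + gamma * rng.normal(loc=100.0, scale=5.0, size=64)
print(mmd2(predicted, target))  # the quantity such a method would minimize
```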
- A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.