Policy Evaluation in Distributional LQR
- URL: http://arxiv.org/abs/2303.13657v1
- Date: Thu, 23 Mar 2023 20:27:40 GMT
- Title: Policy Evaluation in Distributional LQR
- Authors: Zifan Wang, Yulong Gao, Siyi Wang, Michael M. Zavlanos, Alessandro
Abate and Karl H. Johansson
- Abstract summary: We provide a closed-form expression of the distribution of the random return.
We show that this distribution can be approximated by a finite number of random variables.
Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR.
- Score: 70.63903506291383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributional reinforcement learning (DRL) enhances the understanding of the
effects of the randomness in the environment by letting agents learn the
distribution of a random return, rather than its expected value as in standard
RL. At the same time, a main challenge in DRL is that policy evaluation
typically relies on the representation of the return distribution, which needs
to be carefully designed. In this paper, we address this challenge for a
special class of DRL problems that rely on linear quadratic regulator (LQR) for
control, advocating for a new distributional approach to LQR, which we call
\emph{distributional LQR}. Specifically, we provide a closed-form expression of
the distribution of the random return which, remarkably, is applicable to all
exogenous disturbances on the dynamics, as long as they are independent and
identically distributed (i.i.d.). While the proposed exact return distribution
consists of infinitely many random variables, we show that this distribution
can be approximated by a finite number of random variables, and the associated
approximation error can be analytically bounded under mild assumptions. Using
the approximate return distribution, we propose a zeroth-order policy gradient
algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) as a
measure of risk. Numerical experiments are provided to illustrate our
theoretical results.
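The closed-form return distribution and its analytic approximation bound are derived in the paper itself and are not reproduced here. Purely as a rough illustration of the surrounding pipeline, the sketch below builds a finite, sample-based stand-in for the truncated return distribution under a fixed linear feedback gain, reduces it to an empirical CVaR, and takes a two-point zeroth-order step on the gain. Every function name, the dynamics matrices, the Gaussian disturbance, and all numeric parameters are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def truncated_return(A, B, Q, R, K, x0, T, gamma, rng):
    """Sample one realization of the truncated random LQR cost under u_t = -K x_t."""
    x, cost = x0.copy(), 0.0
    for t in range(T):
        u = -K @ x
        cost += gamma ** t * (x @ Q @ x + u @ R @ u)
        w = rng.normal(scale=0.1, size=x.shape)  # i.i.d. Gaussian disturbance (illustrative)
        x = A @ x + B @ u + w
    return cost

def empirical_cvar(costs, alpha):
    """CVaR_alpha of a cost: mean of the worst (1 - alpha) fraction of samples."""
    var = np.quantile(costs, alpha)
    return costs[costs >= var].mean()

def cvar_objective(K, A, B, Q, R, x0, T, gamma, alpha, n_samples, rng):
    """Finite (sampled) approximation of the return distribution, reduced to CVaR."""
    costs = np.array([truncated_return(A, B, Q, R, K, x0, T, gamma, rng)
                      for _ in range(n_samples)])
    return empirical_cvar(costs, alpha)

def zeroth_order_step(K, obj_args, mu=0.05, lr=1e-3, rng=None):
    """Two-point zeroth-order estimate of the CVaR gradient with respect to K."""
    U = rng.normal(size=K.shape)
    U /= np.linalg.norm(U)
    f_plus = cvar_objective(K + mu * U, *obj_args, rng=rng)
    f_minus = cvar_objective(K - mu * U, *obj_args, rng=rng)
    grad_est = K.size * (f_plus - f_minus) / (2.0 * mu) * U
    return K - lr * grad_est

# Illustrative usage on a small double-integrator-like system.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
K = np.array([[0.5, 0.5]])                       # initial feedback gain (illustrative)
x0 = np.array([1.0, 0.0])
obj_args = (A, B, Q, R, x0, 50, 0.95, 0.9, 200)  # T, gamma, alpha, n_samples
for _ in range(10):
    K = zeroth_order_step(K, obj_args, rng=rng)
```

A two-point estimator of this kind only needs objective evaluations, which is why a zeroth-order scheme is a natural fit for a CVaR objective whose gradient is awkward to compute in closed form.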
Related papers
- EX-DRL: Hedging Against Heavy Losses with EXtreme Distributional Reinforcement Learning [16.972097006411147]
A common approach in Distributional Reinforcement Learning (DRL) involves learning the quantiles of loss distributions at specified levels using Quantile Regression (QR).
This method is particularly effective in option hedging because it directly estimates quantile-based risk measures such as Value at Risk (VaR) and Conditional Value at Risk (CVaR).
arXiv Detail & Related papers (2024-08-22T14:41:49Z)
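The entry above centres on quantile regression and quantile-based risk measures. The following is a minimal, generic sketch of that machinery (the pinball loss and VaR/CVaR read off an estimated quantile function), not the EX-DRL architecture itself; the quantile levels, the log-normal toy data, and the function names are illustrative assumptions.

```python
import numpy as np

def pinball_loss(pred_quantiles, sample, taus):
    """Quantile-regression (pinball) loss of predicted quantiles against one loss sample."""
    diff = sample - pred_quantiles                      # positive where the sample exceeds the prediction
    return np.mean(np.maximum(taus * diff, (taus - 1.0) * diff))

def var_cvar_from_quantiles(pred_quantiles, taus, alpha):
    """Read VaR_alpha and CVaR_alpha off an estimated quantile function of a loss."""
    var = np.interp(alpha, taus, pred_quantiles)        # alpha-quantile = VaR
    tail = pred_quantiles[taus >= alpha]
    cvar = tail.mean() if tail.size else var            # average of the worst tail quantiles
    return var, cvar

# Toy check against i.i.d. samples of a heavy-ish loss distribution.
rng = np.random.default_rng(1)
taus = np.linspace(0.05, 0.95, 19)
quantile_estimates = np.quantile(rng.lognormal(size=10_000), taus)  # stand-in for learned quantiles
loss_sample = rng.lognormal()
print(pinball_loss(quantile_estimates, loss_sample, taus))
print(var_cvar_from_quantiles(quantile_estimates, taus, alpha=0.9))
```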
- Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence [15.720824593964027]
Risk-sensitive reinforcement learning (RL) is crucial for maintaining reliable performance in high-stakes applications.
This paper introduces a policy gradient method for risk-sensitive DRL with general coherent risk measures.
We also design a categorical distributional policy gradient algorithm (CDPG) based on categorical distributional policy evaluation and trajectory gradient estimation.
arXiv Detail & Related papers (2024-05-23T16:16:58Z)
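The categorical distributional policy evaluation mentioned in the entry above is commonly implemented with a C51-style projection of the Bellman-updated atoms onto a fixed support. The sketch below shows only that standard projection step, under assumed support bounds and atom count, and makes no claim about the exact form of the CDPG algorithm.

```python
import numpy as np

def categorical_projection(atoms, probs, reward, gamma, v_min, v_max):
    """Project the Bellman-updated atoms r + gamma * z back onto the fixed categorical support."""
    n = atoms.size
    delta = (v_max - v_min) / (n - 1)
    tz = np.clip(reward + gamma * atoms, v_min, v_max)  # shifted/scaled atom locations
    b = (tz - v_min) / delta                            # fractional position on the support grid
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    # when b lands exactly on a grid point, lo == hi and both weights below vanish; nudge one index
    hi[(lo == hi) & (hi < n - 1)] += 1
    lo[(lo == hi) & (lo > 0)] -= 1
    target = np.zeros(n)
    np.add.at(target, lo, probs * (hi - b))             # mass assigned to the lower neighbour
    np.add.at(target, hi, probs * (b - lo))             # mass assigned to the upper neighbour
    return target

# Illustrative usage: a uniform 51-atom estimate of Z(s', a') updated with one reward.
atoms = np.linspace(-10.0, 10.0, 51)
probs = np.full(51, 1.0 / 51)
target = categorical_projection(atoms, probs, reward=1.0, gamma=0.99, v_min=-10.0, v_max=10.0)
assert abs(target.sum() - 1.0) < 1e-9
```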
- Discrete Probabilistic Inference as Control in Multi-path Environments [84.67055173040107]
We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem.
We show that GFlowNets learn a policy that samples objects proportionally to their reward by enforcing a conservation of flows.
We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward.
arXiv Detail & Related papers (2024-02-15T20:20:35Z)
- More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning [58.626683114119906]
We show that Distributional Reinforcement Learning (DistRL) can obtain second-order bounds in both online and offline RL.
Our results are the first second-order bounds for low-rank MDPs and for offline RL.
arXiv Detail & Related papers (2024-02-11T13:25:53Z)
- Risk-Sensitive Policy with Distributional Reinforcement Learning [4.523089386111081]
This research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to risk.
The resulting risk-based utility function $U$ can be extracted from the random return distribution $Z$ that is naturally learnt by any distributional RL algorithm.
This makes it possible to span the complete trade-off between risk minimisation and expected return maximisation.
arXiv Detail & Related papers (2022-12-30T14:37:28Z)
- Distributional Reinforcement Learning for Multi-Dimensional Reward Functions [91.88969237680669]
We introduce Multi-Dimensional Distributional DQN (MD3QN) to model the joint return distribution from multiple reward sources.
As a by-product of joint distribution modeling, MD3QN can capture the randomness in returns for each source of reward.
In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions.
arXiv Detail & Related papers (2021-10-26T11:24:23Z)
- Conservative Offline Distributional Reinforcement Learning [34.95001490294207]
We propose Conservative Offline Distributional Actor Critic (CODAC) for both risk-neutral and risk-averse domains.
CODAC adapts distributional RL to the offline setting by penalizing the predicted quantiles of the return for out-of-distribution actions.
In experiments, CODAC successfully learns risk-averse policies using offline data collected purely from risk-neutral agents.
arXiv Detail & Related papers (2021-07-12T15:38:06Z)
- Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
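As a hedged illustration of the moment-matching idea in the entry above, the snippet below computes a plain sample-based estimate of squared MMD with a Gaussian kernel between predicted return particles and a one-step Bellman target. The kernel bandwidth, particle counts, and target construction are illustrative assumptions rather than the paper's actual construction.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D particle sets."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * bandwidth ** 2))

def squared_mmd(pred, target, bandwidth=1.0):
    """Biased sample estimate of squared MMD between predicted and target return particles."""
    return (gaussian_kernel(pred, pred, bandwidth).mean()
            + gaussian_kernel(target, target, bandwidth).mean()
            - 2.0 * gaussian_kernel(pred, target, bandwidth).mean())

# Particles representing Z(s, a) and a one-step Bellman target r + gamma * Z(s', a').
rng = np.random.default_rng(2)
z_pred = rng.normal(0.0, 1.0, size=32)
z_target = 0.1 + 0.99 * rng.normal(0.5, 1.0, size=32)
print(squared_mmd(z_pred, z_target))  # loss to be minimised with respect to the predicted particles
```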
- Implicit Distributional Reinforcement Learning [61.166030238490634]
The implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)