Risk Perspective Exploration in Distributional Reinforcement Learning
- URL: http://arxiv.org/abs/2206.14170v1
- Date: Tue, 28 Jun 2022 17:37:34 GMT
- Title: Risk Perspective Exploration in Distributional Reinforcement Learning
- Authors: Jihwan Oh, Joonkee Kim, Se-Young Yun
- Abstract summary: We present risk scheduling approaches that explore risk levels and optimistic behaviors from a risk perspective.
We demonstrate the performance enhancement of the DMIX algorithm using risk scheduling in a multi-agent setting.
- Score: 10.441880303257468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributional reinforcement learning demonstrates state-of-the-art
performance in continuous and discrete control settings with the features of
variance and risk, which can be used for exploration. However, exploration
methods that employ the risk property are hard to find, although numerous
exploration methods in distributional RL employ the variance of the return
distribution per action. In this paper, we present risk scheduling approaches
that explore risk levels and optimistic behaviors from a risk perspective. We
demonstrate the performance enhancement of the DMIX algorithm using risk
scheduling in a multi-agent setting with comprehensive experiments.
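The abstract gives no implementation details, but the idea of scheduling a risk level over training can be sketched in a few lines. The sketch below is a minimal illustration assuming a quantile-based distributional agent; the function names, the linear schedule, and the upper-tail distortion are hypothetical choices, not the paper's exact recipe.

```python
import numpy as np

def scheduled_risk_level(step, total_steps, start=0.25, end=1.0):
    """Linearly move a risk level from `start` to `end` over training.

    One plausible schedule: act optimistically early (a small alpha keeps
    only the upper tail of the return distribution) and become risk-neutral
    (alpha = 1 uses all quantiles) as training progresses.  The direction
    and shape of the schedule are assumptions, not the paper's recipe.
    """
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

def risk_sensitive_action(quantiles, alpha):
    """Greedy action under an upper-tail distortion of the return distribution.

    `quantiles` has shape (num_actions, num_quantiles).  Averaging only the
    top `alpha` fraction of each action's quantile estimates gives an
    optimistic value; using the bottom fraction would give a risk-averse,
    CVaR-like value instead.
    """
    n = quantiles.shape[1]
    k = max(1, int(np.ceil(alpha * n)))
    distorted_value = np.sort(quantiles, axis=1)[:, -k:].mean(axis=1)
    return int(np.argmax(distorted_value))

# Usage with stand-in quantile estimates (4 actions, 32 quantiles each).
rng = np.random.default_rng(0)
quantile_estimates = rng.normal(size=(4, 32))
alpha = scheduled_risk_level(step=10_000, total_steps=100_000)
action = risk_sensitive_action(quantile_estimates, alpha)
```

In a multi-agent setup such as DMIX, the same scheduled level would presumably be applied to each agent's utility distribution before mixing; the abstract speaks of risk scheduling approaches in the plural, so the shape of the schedule should be treated as a tunable choice rather than a fixed formula.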
Related papers
- Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction [55.77015419028725]
We develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively.
Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions.
arXiv Detail & Related papers (2024-03-28T17:28:06Z)
- Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: RS-DisRL-M, a model-based strategy for model-based function approximation, and RS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z)
- Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion [9.35556128467037]
We present a novel distributional reinforcement learning algorithm that selects actions by randomizing risk criterion to avoid one-sided tendency on risk.
Our theoretical results support that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return.
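A minimal sketch of the randomized risk criterion described above, assuming a quantile-based agent: the risk level is drawn afresh for every action selection so that exploration is not biased toward a single tail. The uniform sampling distribution and the lower-tail CVaR used here are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def randomized_risk_action(quantiles, rng):
    """Greedy action under a risk criterion drawn afresh for each decision.

    `quantiles` has shape (num_actions, num_quantiles).  Sampling the CVaR
    level alpha at random avoids committing to one optimistic or pessimistic
    tail; the uniform draw over (0, 1] and the lower-tail mean are
    illustrative assumptions.
    """
    alpha = rng.uniform(1e-3, 1.0)                 # fresh risk level per decision
    n = quantiles.shape[1]
    k = max(1, int(np.ceil(alpha * n)))
    cvar = np.sort(quantiles, axis=1)[:, :k].mean(axis=1)  # lower-tail mean per action
    return int(np.argmax(cvar))

rng = np.random.default_rng(0)
action = randomized_risk_action(rng.normal(size=(4, 32)), rng)
```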
arXiv Detail & Related papers (2023-10-25T10:53:04Z)
- Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory [10.288413564829579]
A critical aspect of risk awareness involves modeling highly rare risk events (rewards) that could potentially lead to catastrophic outcomes.
While risk-aware RL techniques do exist, their level of risk aversion heavily relies on the precision of the state-action value function estimation.
Our work proposes to enhance the resilience of RL agents when faced with very rare and risky events by refining the extreme-value predictions of the state-action value function distribution.
arXiv Detail & Related papers (2023-08-24T18:23:59Z)
- Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z)
- Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning [9.290757451344673]
We present a risk-based exploration that leads to collaboratively optimistic behavior by shifting the sampling region of the return distribution.
Based on quantile regression, our method shows remarkable performance in multi-agent settings requiring cooperative exploration.
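One way to read "shifting the sampling region of the return distribution" in a quantile-regression agent is to restrict the sampled quantile fractions to an upper sub-interval, which biases value estimates upward and hence behavior toward optimism. The sketch below assumes an IQN-style agent that scores actions at sampled fractions; the linear remapping and the `shift` parameter are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def shifted_quantile_fractions(num_samples, shift, rng):
    """Sample quantile fractions tau from the upper region (shift, 1).

    IQN-style agents evaluate actions at sampled fractions tau in (0, 1);
    mapping uniform samples into (shift, 1) keeps only the optimistic part
    of the return distribution.  The linear remapping is an assumption
    made for illustration.
    """
    return shift + (1.0 - shift) * rng.uniform(size=num_samples)

rng = np.random.default_rng(0)
taus = shifted_quantile_fractions(num_samples=8, shift=0.75, rng=rng)
# `taus` would then be fed to each agent's quantile network to score actions.
```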
arXiv Detail & Related papers (2023-03-03T08:17:57Z)
- A Survey of Risk-Aware Multi-Armed Bandits [84.67376599822569]
We review various risk measures of interest, and comment on their properties.
We consider algorithms for the regret minimization setting, where the exploration-exploitation trade-off manifests.
We conclude by commenting on persisting challenges and fertile areas for future research.
arXiv Detail & Related papers (2022-05-12T02:20:34Z)
- Efficient Risk-Averse Reinforcement Learning [79.61412643761034]
In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns.
We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypass it.
We demonstrate improved risk aversion in maze navigation, autonomous driving, and resource allocation benchmarks.
arXiv Detail & Related papers (2022-05-10T19:40:52Z)
- Surveillance Evasion Through Bayesian Reinforcement Learning [78.79938727251594]
We consider a 2D continuous path planning problem with a completely unknown intensity of random termination.
The observers' surveillance intensity is a priori unknown and has to be learned through repetitive path planning.
arXiv Detail & Related papers (2021-09-30T02:29:21Z)
- SENTINEL: Taming Uncertainty with Ensemble-based Distributional Reinforcement Learning [6.587644069410234]
We consider risk-sensitive sequential decision-making in model-based reinforcement learning (RL).
We introduce a novel quantification of risk, namely composite risk.
We experimentally verify that SENTINEL-K estimates the return distribution better and, when used with the composite risk estimate, demonstrates better risk-sensitive performance than competing RL algorithms.
arXiv Detail & Related papers (2021-02-22T14:45:39Z)
- Risk-Averse Learning by Temporal Difference Methods [5.33024001730262]
We consider reinforcement learning with performance evaluated by a dynamic risk measure.
We propose risk-averse counterparts of temporal-difference methods and prove their convergence with probability one.
arXiv Detail & Related papers (2020-03-02T11:48:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.