DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement
Learning
- URL: http://arxiv.org/abs/2004.14547v2
- Date: Thu, 11 Jun 2020 02:08:35 GMT
- Title: DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement
Learning
- Authors: Xiaoteng Ma, Li Xia, Zhengyuan Zhou, Jun Yang, Qianchuan Zhao
- Abstract summary: We present a new reinforcement learning algorithm called Distributional Soft Actor Critic (DSAC)
DSAC exploits the distributional information of accumulated rewards to achieve better performance.
Our experiments demonstrate that with distribution modeling in RL, the agent performs better for both risk-averse and risk-seeking control tasks.
- Score: 21.75934236018373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a new reinforcement learning (RL) algorithm called
Distributional Soft Actor Critic (DSAC), which exploits the distributional
information of accumulated rewards to achieve better performance. Seamlessly
integrating SAC (which uses entropy to encourage exploration) with a principled
distributional view of the underlying objective, DSAC takes into consideration
the randomness in both actions and rewards, and beats the state-of-the-art
baselines in several continuous control benchmarks. Moreover, with the
distributional information of rewards, we propose a unified framework for
risk-sensitive learning, one that goes beyond maximizing only expected
accumulated rewards. Under this framework we discuss three specific
risk-related metrics: percentile, mean-variance and distorted expectation. Our
extensive experiments demonstrate that with distribution modeling in RL, the
agent performs better for both risk-averse and risk-seeking control tasks.
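The three risk-related metrics above (percentile, mean-variance and distorted expectation) can all be evaluated on a quantile approximation of the return distribution, which is the kind of output a distributional critic typically produces. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation; the function risk_objective and its parameters are illustrative names, and CVaR is used here as one common example of a distorted expectation.
```python
# Illustrative sketch (not the DSAC code): reading risk-sensitive objectives
# off N quantile estimates of the accumulated return, as a distributional
# critic (e.g., quantile regression) might provide. All names are hypothetical.
import numpy as np

def risk_objective(quantiles: np.ndarray, metric: str = "mean",
                   alpha: float = 0.25, beta: float = 0.1) -> float:
    """quantiles: estimated returns at evenly spaced quantile levels."""
    q = np.sort(quantiles)
    n = q.size
    if metric == "mean":            # risk-neutral: plain expected return
        return float(q.mean())
    if metric == "percentile":      # alpha-percentile (VaR-style) criterion
        return float(q[int(alpha * n)])
    if metric == "mean-variance":   # trade expected return against variance
        return float(q.mean() - beta * q.var())
    if metric == "cvar":            # CVaR, a distorted expectation over the
        k = max(1, int(alpha * n))  # worst alpha-fraction of outcomes
        return float(q[:k].mean())
    raise ValueError(f"unknown metric: {metric}")

# Example: a risk-averse agent prefers actions with a higher CVaR of return,
# while a risk-seeking one could instead use an upper percentile.
sampled_returns = np.random.normal(loc=1.0, scale=0.5, size=32)
print(risk_objective(sampled_returns, metric="cvar", alpha=0.25))
```
Under this view, switching the agent's risk attitude amounts to swapping the scalarization applied to the same learned return distribution.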
Related papers
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
- Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL [48.1726560631463]
We study Risk-Sensitive Reinforcement Learning with the Optimized Certainty Equivalent (OCE) risk.
We propose two general meta-algorithms via reductions to standard RL.
We show that it learns the optimal risk-sensitive policy while prior algorithms provably fail.
arXiv Detail & Related papers (2024-03-10T21:45:12Z)
- Distributional Soft Actor-Critic with Three Refinements [47.46661939652862]
Reinforcement learning (RL) has shown remarkable success in solving complex decision-making and control tasks.
Many model-free RL algorithms experience performance degradation due to inaccurate value estimation.
This paper introduces three key refinements to DSACv1 to overcome these limitations and further improve Q-value estimation accuracy.
arXiv Detail & Related papers (2023-10-09T16:52:48Z)
- Risk-Aware Reinforcement Learning through Optimal Transport Theory [4.8951183832371]
This paper pioneers the integration of Optimal Transport theory with reinforcement learning (RL) to create a risk-aware framework.
Our approach modifies the objective function, ensuring that the resulting policy not only maximizes expected rewards but also respects risk constraints dictated by OT distances.
Our contributions are substantiated with a series of theorems, mapping the relationships between risk distributions, optimal value functions, and policy behaviors.
arXiv Detail & Related papers (2023-09-12T13:55:01Z)
- Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning [19.788336796981685]
We propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL).
Our main idea is to design the multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training.
The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios, where it outperforms SOTA baselines in terms of both effectiveness and robustness.
arXiv Detail & Related papers (2022-10-14T08:31:45Z)
- A Risk-Sensitive Approach to Policy Optimization [21.684251937825234]
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy.
We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized.
We demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies.
arXiv Detail & Related papers (2022-08-19T00:55:05Z)
- Risk Perspective Exploration in Distributional Reinforcement Learning [10.441880303257468]
We present risk scheduling approaches that explore risk levels and optimistic behaviors from a risk perspective.
We demonstrate the performance enhancement of the DMIX algorithm using risk scheduling in a multi-agent setting.
arXiv Detail & Related papers (2022-06-28T17:37:34Z)
- SENTINEL: Taming Uncertainty with Ensemble-based Distributional Reinforcement Learning [6.587644069410234]
We consider risk-sensitive sequential decision-making in model-based reinforcement learning (RL).
We introduce a novel quantification of risk, namely composite risk.
We experimentally verify that SENTINEL-K estimates the return distribution better and, when used with the composite risk estimate, demonstrates better risk-sensitive performance than competing RL algorithms.
arXiv Detail & Related papers (2021-02-22T14:45:39Z)
- Risk-Constrained Thompson Sampling for CVaR Bandits [82.47796318548306]
We consider a popular risk measure in quantitative finance known as the Conditional Value at Risk (CVaR).
We explore the performance of a Thompson Sampling-based algorithm CVaR-TS under this risk measure.
arXiv Detail & Related papers (2020-11-16T15:53:22Z)
- Learning Bounds for Risk-sensitive Learning [86.50262971918276]
In risk-sensitive learning, one aims to find a hypothesis that minimizes a risk-averse (or risk-seeking) measure of loss.
We study the generalization properties of risk-sensitive learning schemes whose optimand is described via optimized certainty equivalents.
arXiv Detail & Related papers (2020-06-15T05:25:02Z)
- Reinforcement Learning through Active Inference [62.997667081978825]
We show how ideas from active inference can augment traditional reinforcement learning approaches.
We develop and implement a novel objective for decision making, which we term the free energy of the expected future.
We demonstrate that the resulting algorithm successfully balances exploration and exploitation, simultaneously achieving robust performance on several challenging RL benchmarks with sparse, well-shaped, and no rewards.
arXiv Detail & Related papers (2020-02-28T10:28:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.