DSAC-T: Distributional Soft Actor-Critic with Three Refinements
- URL: http://arxiv.org/abs/2310.05858v4
- Date: Thu, 28 Dec 2023 14:08:23 GMT
- Title: DSAC-T: Distributional Soft Actor-Critic with Three Refinements
- Authors: Jingliang Duan, Wenxuan Wang, Liming Xiao, Jiaxin Gao, and Shengbo
Eben Li
- Abstract summary: We introduce an off-policy RL algorithm called distributional soft actor-critic (DSAC).
Standard DSAC has its own shortcomings, including occasionally unstable learning processes and the necessity for task-specific reward scaling.
This paper introduces three important refinements to standard DSAC in order to address these shortcomings.
- Score: 31.590177154247485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has proven to be highly effective in tackling
complex decision-making and control tasks. However, prevalent model-free RL
methods often face severe performance degradation due to the well-known
overestimation issue. In response to this problem, we recently introduced an
off-policy RL algorithm, called distributional soft actor-critic (DSAC or
DSAC-v1), which can effectively improve the value estimation accuracy by
learning a continuous Gaussian value distribution. Nonetheless, standard DSAC
has its own shortcomings, including occasionally unstable learning processes
and the necessity for task-specific reward scaling, which may hinder its
overall performance and adaptability in some special tasks. This paper further
introduces three important refinements to standard DSAC in order to address
these shortcomings. These refinements consist of expected value substituting,
twin value distribution learning, and variance-based critic gradient adjusting.
The modified RL algorithm is named DSAC with three refinements (DSAC-T or
DSAC-v2), and its performance is systematically evaluated on a diverse set of
benchmark tasks. Without any task-specific hyperparameter tuning, DSAC-T
surpasses or matches many mainstream model-free RL algorithms, including
SAC, TD3, DDPG, TRPO, and PPO, in all tested environments. Additionally,
DSAC-T, unlike its standard version, ensures a highly stable learning process
and delivers similar performance across varying reward scales.
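The abstract names the three refinements without spelling them out. As a rough illustration only, the following PyTorch-style sketch shows how a Gaussian distributional critic update could combine twin value distributions, an expected-value (mean-based) target, and variance-based clipping of that target. The network layout, the clipping bound b, and the entropy handling are assumptions made for this example, not the authors' exact formulation; see the paper for the precise loss.
```python
# A minimal, illustrative sketch (PyTorch assumed) of a Gaussian distributional
# critic update. All names, the clipping bound `b`, and the loss form are
# assumptions for illustration, not the authors' exact formulation.
import torch
import torch.nn as nn

class GaussianCritic(nn.Module):
    """Maps (state, action) to the mean and std of a Gaussian return distribution."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),                      # -> (mean, log_std)
        )

    def forward(self, obs, act):
        mean, log_std = self.net(torch.cat([obs, act], dim=-1)).chunk(2, dim=-1)
        return mean, log_std.clamp(-5.0, 2.0).exp()    # keep std in a sane range

def critic_loss(critics, target_critics, batch, alpha, gamma=0.99, b=3.0):
    """One critic step: twin value distributions, an expected-value (mean-based)
    soft target, and variance-based clipping of that target."""
    obs, act, rew, next_obs, next_act, next_logp, done = batch
    with torch.no_grad():
        m1, _ = target_critics[0](next_obs, next_act)
        m2, _ = target_critics[1](next_obs, next_act)
        next_mean = torch.min(m1, m2)                  # twin distributions: smaller mean
        # Expected-value substitution: build the target from the mean of the
        # next return distribution rather than a single random return sample.
        target_q = rew + gamma * (1.0 - done) * (next_mean - alpha * next_logp)

    loss = 0.0
    for critic in critics:
        mean, std = critic(obs, act)
        # Variance-based adjustment: keep the target within b standard
        # deviations of the current prediction, so the gradient magnitude
        # does not grow with the reward scale.
        diff = torch.clamp(target_q - mean.detach(),
                           min=-b * std.detach(), max=b * std.detach())
        clipped_target = mean.detach() + diff
        # Gaussian negative log-likelihood of the clipped target.
        loss = loss + (((clipped_target - mean) ** 2) / (2 * std ** 2)
                       + std.log()).mean()
    return loss
```
Because the clipping range is expressed in units of the predicted standard deviation, the same hyperparameters behave similarly across reward scales, which is consistent with the abstract's claim that DSAC-T no longer needs task-specific reward scaling.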
Related papers
- Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning [9.94248417157713]
We propose WSAC, a novel algorithm for Safe Offline Reinforcement Learning (RL) under functional approximation.
WSAC is designed as a two-player Stackelberg game to optimize a refined objective function.
arXiv Detail & Related papers (2024-01-01T01:44:58Z) - RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End
Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network to utilize both data and memory features to guide exploring directions for sampling the next minimum set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z) - On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z) - Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and
Stability [67.8426046908398]
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world.
This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions.
arXiv Detail & Related papers (2022-04-08T20:46:16Z) - URLB: Unsupervised Reinforcement Learning Benchmark [82.36060735454647]
We introduce the Unsupervised Reinforcement Learning Benchmark (URLB).
URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards.
We provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods.
arXiv Detail & Related papers (2021-10-28T15:07:01Z) - Deep Reinforcement Learning-based UAV Navigation and Control: A Soft
Actor-Critic with Hindsight Experience Replay Approach [0.9137554315375919]
We propose SACHER (soft actor-critic (SAC) with hindsight experience replay (HER)) as a class of deep reinforcement learning (DRL) algorithms.
We show that SACHER achieves the desired optimal outcomes faster and more accurately than SAC, since HER improves the sample efficiency of SAC.
We apply SACHER to the navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER generates the optimal navigation path.
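As a side note on the HER ingredient mentioned above, the sketch below shows the standard "future" goal-relabeling step in generic form; the function names, tuple layout, and reward signature are illustrative assumptions, not the paper's exact setup.
```python
# Illustrative HER "future" relabeling, assuming goal-conditioned transitions
# stored as (obs, act, rew, next_obs, goal, achieved_goal) tuples.
import random

def her_relabel(episode, reward_fn, k=4):
    """Augment an episode with k relabeled copies per step, using goals that
    were actually achieved later in the same episode."""
    out = []
    for t, (obs, act, rew, next_obs, goal, achieved) in enumerate(episode):
        out.append((obs, act, rew, next_obs, goal))
        future = episode[t:]                      # transitions from t onward
        for _ in range(min(k, len(future))):
            new_goal = random.choice(future)[5]   # an achieved goal seen later
            out.append((obs, act, reward_fn(achieved, new_goal), next_obs, new_goal))
    return out
```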
arXiv Detail & Related papers (2021-06-02T08:30:14Z) - Low-Precision Reinforcement Learning [63.930246183244705]
Low-precision training has become a popular approach to reduce computation time, memory footprint, and energy consumption in supervised learning.
In this paper we consider continuous control with the state-of-the-art SAC agent and demonstrate that a naïve adaptation of low-precision methods from supervised learning fails.
arXiv Detail & Related papers (2021-02-26T16:16:28Z) - OPAC: Opportunistic Actor-Critic [0.0]
Opportunistic Actor-Critic (OPAC) is a novel model-free deep RL algorithm that employs a better exploration policy and exhibits lower variance.
OPAC combines some of the most powerful features of TD3 and SAC and aims to optimize a policy in an off-policy way.
arXiv Detail & Related papers (2020-12-11T18:33:35Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
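The two ingredients above translate fairly directly into code; the sketch below is a generic rendering under assumed names (a list of Q-networks `ensemble`, temperature `T`, UCB coefficient `lam`), not the authors' implementation.
```python
# Generic sketch of uncertainty-weighted Bellman targets and UCB-style action
# selection over a Q-ensemble. Hyperparameters and shapes are assumptions.
import torch

def weighted_bellman_weight(target_ensemble, next_obs, next_act, T=10.0):
    """Down-weight targets on which the Q-ensemble disagrees (high uncertainty)."""
    qs = torch.stack([q(next_obs, next_act) for q in target_ensemble], dim=0)
    std = qs.std(dim=0)
    return torch.sigmoid(-std * T) + 0.5          # weight in (0.5, 1.0]

def ucb_action(ensemble, obs, candidate_actions, lam=1.0):
    """Pick the candidate action with the highest mean + lam * std Q estimate."""
    obs_batch = obs.unsqueeze(0).expand(candidate_actions.shape[0], -1)
    qs = torch.stack([q(obs_batch, candidate_actions) for q in ensemble], dim=0)
    score = qs.mean(dim=0) + lam * qs.std(dim=0)
    return candidate_actions[score.squeeze(-1).argmax()]
```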
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement
Learning [21.75934236018373]
We present a new reinforcement learning algorithm called Distributional Soft Actor Critic (DSAC).
DSAC exploits the distributional information of accumulated rewards to achieve better performance.
Our experiments demonstrate that with distribution modeling in RL, the agent performs better for both risk-averse and risk-seeking control tasks.
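For intuition on how a learned return distribution supports both risk-averse and risk-seeking behavior, one simple criterion is a mean-plus-scaled-standard-deviation score over the Gaussian return; the snippet below illustrates that idea only and is not necessarily the risk measure used in the paper.
```python
# Illustrative risk-sensitive scoring over a Gaussian return distribution.
import torch

def risk_sensitive_value(mean, std, beta):
    """beta < 0 penalizes variance (risk-averse), beta > 0 rewards it
    (risk-seeking), beta = 0 is risk-neutral."""
    return mean + beta * std

def select_action(critic, obs, candidate_actions, beta=-1.0):
    """Score candidate actions with an assumed critic returning (mean, std)
    of the return distribution, and pick the best one."""
    obs_batch = obs.unsqueeze(0).expand(candidate_actions.shape[0], -1)
    mean, std = critic(obs_batch, candidate_actions)
    scores = risk_sensitive_value(mean.squeeze(-1), std.squeeze(-1), beta)
    return candidate_actions[scores.argmax()]
```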
arXiv Detail & Related papers (2020-04-30T02:23:15Z) - Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for
Addressing Value Estimation Errors [13.534873779043478]
We present a distributional soft actor-critic (DSAC) algorithm to improve the policy performance by mitigating Q-value overestimations.
We evaluate DSAC on the suite of MuJoCo continuous control tasks, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-01-09T02:27:18Z)