Revisiting Discrete Soft Actor-Critic
- URL: http://arxiv.org/abs/2209.10081v4
- Date: Wed, 20 Nov 2024 13:52:42 GMT
- Title: Revisiting Discrete Soft Actor-Critic
- Authors: Haibin Zhou, Tong Wei, Zichuan Lin, Junyou Li, Junliang Xing, Yuanchun Shi, Li Shen, Chao Yu, Deheng Ye
- Abstract summary: We study the adaptation of Soft Actor-Critic (SAC), which is considered a state-of-the-art reinforcement learning (RL) algorithm, from continuous action space to discrete action space.
We propose Stable Discrete SAC (SDSAC), an algorithm that leverages an entropy penalty and double average Q-learning with Q-clip to address these issues.
- Score: 42.88653969438699
- Abstract: We study the adaptation of Soft Actor-Critic (SAC), which is considered a state-of-the-art reinforcement learning (RL) algorithm, from continuous action space to discrete action space. We revisit vanilla discrete SAC and provide an in-depth understanding of its Q-value underestimation and performance instability issues when applied to discrete settings. We thereby propose Stable Discrete SAC (SDSAC), an algorithm that leverages an entropy penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action spaces, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at: https://github.com/coldsummerday/SD-SAC.git.
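The abstract names two stabilizers: an entropy penalty and double average Q-learning with Q-clip. Below is a minimal sketch of how such a critic target might be formed in a discrete-action SAC implementation; the network interfaces, the clip threshold `clip_eps`, the entropy weight, and all names are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def sdsac_critic_target(q1_tgt, q2_tgt, q_current, reward, done,
                        next_obs, policy, alpha=0.2, gamma=0.99, clip_eps=0.5):
    """Hedged sketch: critic target with averaged double target critics and a
    Q-clip band around the current estimate (all hyperparameters assumed).
    Q-networks map observations to per-action values of shape [B, num_actions];
    q_current, reward, and done are [B] tensors."""
    with torch.no_grad():
        log_pi = F.log_softmax(policy(next_obs), dim=-1)      # [B, A]
        pi = log_pi.exp()
        # Average the two target critics rather than taking their minimum,
        # which counters the underestimation the paper analyzes.
        q_avg = 0.5 * (q1_tgt(next_obs) + q2_tgt(next_obs))   # [B, A]
        # Soft state value: expectation over the discrete policy plus entropy bonus.
        v_next = (pi * (q_avg - alpha * log_pi)).sum(dim=-1)  # [B]
        target = reward + gamma * (1.0 - done) * v_next
        # Q-clip: keep the target within a band around the current estimate
        # to damp abrupt target jumps (clip_eps is an assumed hyperparameter).
        target = torch.clamp(target, q_current - clip_eps, q_current + clip_eps)
    return target
```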
Related papers
- RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network to utilize both data and memory features to guide exploring directions for sampling the next minimum set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z)
- Target Entropy Annealing for Discrete Soft Actor-Critic [64.71285903492183]
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm for continuous action settings.
Counter-intuitively, empirical evidence shows that SAC does not perform well in discrete domains.
We propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter applied to SAC.
We compare our method on Atari 2600 games against SAC with different constant target entropies, and analyze how our scheduling affects SAC.
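As a rough illustration of what scheduling the target entropy could look like in a discrete SAC implementation, the sketch below linearly anneals the target from a high to a low fraction of the maximum entropy log|A| and pairs it with the usual temperature objective; the schedule shape, the fractions, and the function names are assumptions, not the TES-SAC recipe.

```python
import math
import torch

def annealed_target_entropy(step, total_steps, num_actions,
                            start_frac=0.98, end_frac=0.4):
    """Hedged sketch: linearly anneal the entropy target from start_frac to
    end_frac of the maximum entropy log(|A|) (fractions are assumptions)."""
    frac = start_frac + (end_frac - start_frac) * min(step / total_steps, 1.0)
    return frac * math.log(num_actions)

def temperature_loss(log_alpha, policy_entropy, target_entropy):
    """Standard SAC temperature objective: gradient descent raises alpha when
    the policy entropy falls below the (scheduled) target and lowers it otherwise."""
    return (log_alpha * (policy_entropy.detach() - target_entropy)).mean()
```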
arXiv Detail & Related papers (2021-12-06T08:21:27Z)
- Deep Reinforcement Learning-based UAV Navigation and Control: A Soft Actor-Critic with Hindsight Experience Replay Approach [0.9137554315375919]
We propose SACHER (soft actor-critic (SAC) with hindsight experience replay (HER)) as a class of deep reinforcement learning (DRL) algorithms.
We show that SACHER achieves the desired optimal outcomes faster and more accurately than SAC, since HER improves the sample efficiency of SAC.
We apply SACHER to the navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER generates the optimal navigation path.
arXiv Detail & Related papers (2021-06-02T08:30:14Z)
- TASAC: Temporally Abstract Soft Actor-Critic for Continuous Control [28.534585378574143]
TASAC is an off-policy RL algorithm that incorporates closed-loop temporal abstraction into the soft actor-critic framework.
It has two benefits compared to traditional off-policy RL algorithms: persistent exploration and an unbiased multi-step Q operator for TD learning.
arXiv Detail & Related papers (2021-04-13T21:24:44Z)
- Predictive Information Accelerates Learning in RL [50.52439807008805]
We train Soft Actor-Critic (SAC) agents from pixels with an auxiliary task that learns a compressed representation of the predictive information of the RL environment dynamics.
We show that the resulting PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments.
arXiv Detail & Related papers (2020-07-24T08:14:41Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
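A minimal sketch of those two ingredients follows; the weighting function, tensor shapes, and hyperparameters are assumptions for illustration rather than the SUNRISE implementation.

```python
import torch

def weighted_bellman_targets(target_q_ensemble, reward, done,
                             next_obs, next_action, gamma=0.99, temperature=10.0):
    """Hedged sketch: down-weight Bellman targets whose ensemble disagreement
    (std of the target Q-ensemble) is high. Each critic maps (obs, action)
    to a [B] value tensor; reward and done are [B]."""
    with torch.no_grad():
        qs = torch.stack([q(next_obs, next_action) for q in target_q_ensemble])  # [K, B]
        target = reward + gamma * (1.0 - done) * qs.mean(dim=0)
        # Confidence weight in (0.5, 1.0): lower when the ensemble disagrees.
        weight = torch.sigmoid(-qs.std(dim=0) * temperature) + 0.5
    return target, weight  # weight multiplies each sample's critic loss

def ucb_action(q_ensemble, obs, candidate_actions, beta=1.0):
    """Hedged sketch: pick the candidate with the highest mean + beta * std Q-value."""
    qs = torch.stack([torch.stack([q(obs, a) for a in candidate_actions], dim=-1)
                      for q in q_ensemble])                    # [K, B, A]
    ucb = qs.mean(dim=0) + beta * qs.std(dim=0)                # [B, A]
    return ucb.argmax(dim=-1)                                  # [B]
```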
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- RSAC: Regularized Subspace Approximation Classifier for Lightweight Continuous Learning [0.9137554315375922]
Continuous learning seeks to learn from data that arrives over time.
In this work, a novel training algorithm, the regularized subspace approximation classifier (RSAC), is proposed to achieve lightweight continuous learning.
Extensive experiments show that RSAC is more efficient than prior continuous learning works and outperforms them in various experimental settings.
arXiv Detail & Related papers (2020-07-03T03:38:06Z)
- Band-limited Soft Actor Critic Model [15.11069042369131]
Soft Actor Critic (SAC) algorithms show remarkable performance in complex simulated environments.
We take this idea one step further by artificially band-limiting the spatial resolution of the target critic.
We derive the closed form solution in the linear case and show that bandlimiting reduces the interdependency between the low frequency components of the state-action value approximation.
arXiv Detail & Related papers (2020-06-19T22:52:43Z)
- SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection [51.376723069962]
We present a method for accelerating and structuring self-attention: Sparse Adaptive Connection (SAC).
In SAC, we regard the input sequence as a graph and attention operations are performed between linked nodes.
We show that SAC is competitive with state-of-the-art models while significantly reducing memory cost.
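To make the linked-nodes idea concrete, here is a minimal sketch of attention computed only over an explicit edge list rather than all token pairs; the edge representation, scaling, and function names are assumptions for illustration, not the paper's implementation.

```python
import torch

def sparse_edge_attention(x, edges, w_q, w_k, w_v):
    """Hedged sketch: attention restricted to linked node pairs.
    x: [N, D] node features; edges: [E, 2] (src, dst) index pairs;
    w_q, w_k, w_v: [D, D] projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                  # [N, D]
    src, dst = edges[:, 0], edges[:, 1]
    # Score only the E linked pairs instead of all N x N pairs.
    scores = (q[dst] * k[src]).sum(dim=-1) / k.size(-1) ** 0.5   # [E]
    weights = (scores - scores.max()).exp()              # stabilized, unnormalized
    # Normalize per destination node (a segment softmax over incoming edges).
    denom = torch.zeros(x.size(0), dtype=x.dtype, device=x.device)
    denom = denom.index_add_(0, dst, weights) + 1e-9      # [N]
    out = torch.zeros_like(x).index_add_(0, dst, weights.unsqueeze(-1) * v[src])
    return out / denom.unsqueeze(-1)                      # [N, D]
```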
arXiv Detail & Related papers (2020-03-22T07:58:44Z)