Target Entropy Annealing for Discrete Soft Actor-Critic
- URL: http://arxiv.org/abs/2112.02852v1
- Date: Mon, 6 Dec 2021 08:21:27 GMT
- Title: Target Entropy Annealing for Discrete Soft Actor-Critic
- Authors: Yaosheng Xu and Dailin Hu and Litian Liang and Stephen McAleer and
Pieter Abbeel and Roy Fox
- Abstract summary: Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm for continuous action settings.
Counter-intuitively, empirical evidence shows that SAC does not perform well in discrete domains.
We propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter applied on SAC.
We compare our method on Atari 2600 games against SAC with different constant target entropies, and analyze how our scheduling affects SAC.
- Score: 64.71285903492183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in
continuous action space settings. It uses the maximum entropy framework for
efficiency and stability, and applies a heuristic temperature Lagrange term to
tune the temperature $\alpha$, which determines how "soft" the policy should
be. Counter-intuitively, empirical evidence shows that SAC does not perform
well in discrete domains. In this paper we investigate possible explanations
for this phenomenon and propose Target Entropy Scheduled SAC (TES-SAC), an
annealing method for the target entropy parameter applied to SAC. The target
entropy is a constant in the temperature Lagrange term and represents the
target policy entropy in discrete SAC. We compare our method on Atari 2600
games against SAC with different constant target entropies, and analyze how
our scheduling affects SAC.
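The abstract above describes tuning the temperature $\alpha$ against a target policy entropy through a Lagrange term, with TES-SAC annealing that target over training. The following is a minimal sketch of how such an update could look in discrete SAC, assuming a PyTorch setup; the linear schedule in schedule_target_entropy is a purely illustrative assumption, since the abstract does not specify the actual TES-SAC schedule or hyperparameters.

```python
# Minimal sketch of the discrete-SAC temperature update with an annealed
# target entropy. The linear schedule is an illustrative assumption, not
# the schedule used in TES-SAC.
import numpy as np
import torch

log_alpha = torch.zeros(1, requires_grad=True)          # alpha = exp(log_alpha) > 0
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def schedule_target_entropy(step, total_steps, n_actions,
                            start_frac=0.98, end_frac=0.3):
    """Hypothetical linear anneal between two fractions of the maximum
    discrete entropy log(n_actions)."""
    mix = min(step / total_steps, 1.0)
    frac = start_frac + (end_frac - start_frac) * mix
    return frac * np.log(n_actions)

def update_alpha(action_probs, step, total_steps):
    """One gradient step on the temperature Lagrange term.

    action_probs: (batch, n_actions) policy probabilities pi(a|s),
    detached from the policy graph.
    """
    n_actions = action_probs.shape[-1]
    target_entropy = schedule_target_entropy(step, total_steps, n_actions)
    # Exact entropy of the discrete policy for each state in the batch.
    entropy = -(action_probs * torch.log(action_probs + 1e-8)).sum(dim=-1)
    # J(alpha) = E[alpha * (H(pi(.|s)) - target)]: minimizing this raises
    # alpha when the policy entropy is below the annealed target and
    # lowers alpha when it is above.
    alpha_loss = (log_alpha.exp() * (entropy - target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return log_alpha.exp().item()
```

The resulting $\alpha$ would then be used in the actor and critic losses exactly as in constant-target-entropy discrete SAC; only the target itself changes over training in this sketch.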
Related papers
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Do You Need the Entropy Reward (in Practice)? [29.811723497181486]
It is believed that the regularization imposed by entropy on both policy improvement and policy evaluation contributes to good exploration, training convergence, and robustness of learned policies.
This paper takes a closer look at entropy as an intrinsic reward, by conducting various ablation studies on soft actor-critic (SAC)
Our findings reveal that in general, entropy rewards should be applied with caution to policy evaluation.
arXiv Detail & Related papers (2022-01-28T21:43:21Z)
- Soft Actor-Critic with Cross-Entropy Policy Optimization [0.45687771576879593]
We propose Soft Actor-Critic with Cross-Entropy Policy Optimization (SAC-CEPO)
SAC-CEPO uses the Cross-Entropy Method (CEM) to optimize the policy network of SAC.
We show that SAC-CEPO achieves competitive performance against the original SAC.
arXiv Detail & Related papers (2021-12-21T11:38:12Z)
- Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI)
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
- Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience [9.06635747612495]
Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm.
SAC trains a policy by maximizing the trade-off between expected return and entropy (written out as an equation in the note after this list).
It has achieved state-of-the-art performance on a range of continuous-control benchmark tasks.
arXiv Detail & Related papers (2021-09-24T06:46:28Z)
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We analyze the stochastic ExtraGradient (SEG) method with constant step size and present variations of the method that yield favorable convergence.
We prove that, when augmented with averaging, SEG converges to the Nash equilibrium, and that this convergence is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
- Maximum Entropy Reinforcement Learning with Mixture Policies [54.291331971813364]
We construct a tractable approximation of the mixture entropy for use in MaxEnt algorithms.
We show that it is closely related to the sum of marginal entropies (generic bounds to this effect are sketched in the note after this list).
We derive an algorithmic variant of Soft Actor-Critic (SAC) to the mixture policy case and evaluate it on a series of continuous control tasks.
arXiv Detail & Related papers (2021-03-18T11:23:39Z)
- Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient [5.100592488212484]
Our method is built upon the Soft Actor-Critic (SAC) algorithm, which uses an "entropy temperature" that balances the original task reward and the policy entropy.
We show that Meta-SAC achieves promising performance on several of the MuJoCo benchmark tasks, and outperforms SAC-v2 by over 10% in one of the most challenging tasks.
arXiv Detail & Related papers (2020-07-03T20:26:50Z)
- Band-limited Soft Actor Critic Model [15.11069042369131]
Soft Actor Critic (SAC) algorithms show remarkable performance in complex simulated environments.
We take this idea one step further by artificially bandlimiting the target critic spatial resolution.
We derive the closed form solution in the linear case and show that bandlimiting reduces the interdependency between the low frequency components of the state-action value approximation.
arXiv Detail & Related papers (2020-06-19T22:52:43Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDPs with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
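Note (referenced from the Improved Soft Actor-Critic entry above): the return-entropy trade-off that SAC maximizes is the standard maximum-entropy objective
$J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[r(s_t, a_t) + \alpha\,\mathcal{H}(\pi(\cdot \mid s_t))\big]$,
with $\mathcal{H}(\pi(\cdot \mid s)) = -\sum_a \pi(a \mid s)\log\pi(a \mid s)$ in the discrete case. This is the common SAC formulation rather than a contribution of any single paper listed here; the temperature $\alpha$ is the quantity that the Lagrange term in the main abstract adjusts toward the target entropy.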
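Note (referenced from the Maximum Entropy Reinforcement Learning with Mixture Policies entry above): for a mixture policy $\pi(\cdot \mid s) = \sum_k w_k \pi_k(\cdot \mid s)$, the mixture entropy is sandwiched by the weighted sum of component entropies,
$\sum_k w_k\,\mathcal{H}(\pi_k(\cdot \mid s)) \le \mathcal{H}(\pi(\cdot \mid s)) \le \sum_k w_k\,\mathcal{H}(\pi_k(\cdot \mid s)) + \mathcal{H}(w)$,
where $\mathcal{H}(w)$ is the entropy of the mixture weights. These are generic information-theoretic bounds that make "closely related to the sum of marginal entropies" concrete; they are not the specific approximation derived in that paper.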