TASAC: Temporally Abstract Soft Actor-Critic for Continuous Control
- URL: http://arxiv.org/abs/2104.06521v1
- Date: Tue, 13 Apr 2021 21:24:44 GMT
- Title: TASAC: Temporally Abstract Soft Actor-Critic for Continuous Control
- Authors: Haonan Yu, Wei Xu, Haichao Zhang
- Abstract summary: TASAC is an off-policy RL algorithm that incorporates closed-loop temporal abstraction into the soft actor-critic framework.
It has two benefits compared to traditional off-policy RL algorithms: persistent exploration and an unbiased multi-step Q operator for TD learning.
- Score: 28.534585378574143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose temporally abstract soft actor-critic (TASAC), an off-policy RL
algorithm that incorporates closed-loop temporal abstraction into the soft
actor-critic (SAC) framework in a simple manner. TASAC adds a second-stage
binary policy to choose between the previous action and the action output by an
SAC actor. It has two benefits compared to traditional off-policy RL
algorithms: persistent exploration and an unbiased multi-step Q operator for TD
learning. We demonstrate its advantages over several strong baselines on 14
continuous control tasks spanning 5 different categories, in terms of both sample
efficiency and final performance. Because of its simplicity and generality,
TASAC can serve as a drop-in replacement for SAC when temporal abstraction is
needed.
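As a concrete illustration of the two-stage action selection the abstract describes, here is a minimal sketch: a SAC actor proposes a candidate action and a binary policy decides whether to keep the previous action or switch to the candidate. The toy actor and the constant switch probability are placeholders for illustration, not the paper's networks or training procedure.

```python
import numpy as np

def tasac_act(obs, prev_action, actor, switch_prob_fn, rng):
    """Two-stage action selection (illustrative sketch of the idea above).

    Stage 1: the SAC actor proposes a candidate action for the observation.
    Stage 2: a binary policy chooses between repeating the previous action and
             switching to the candidate; repeated "keep" decisions yield
             temporally extended, persistent behaviour.
    """
    candidate = actor(obs)                                   # stage 1
    p_switch = switch_prob_fn(obs, prev_action, candidate)   # stage 2
    switched = rng.random() < p_switch
    return (candidate if switched else prev_action), switched

# Toy usage with stand-in components (placeholders, not the paper's networks).
rng = np.random.default_rng(0)
toy_actor = lambda obs: np.tanh(rng.normal(size=2))
toy_switch = lambda obs, a_prev, a_new: 0.3
action, switched = tasac_act(np.zeros(4), np.zeros(2), toy_actor, toy_switch, rng)
```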
Related papers
- Unified Active Retrieval for Retrieval Augmented Generation [69.63003043712696]
In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal.
Existing active retrieval methods face two challenges:
1. They usually rely on a single criterion, which struggles to handle various types of instructions.
2. They depend on specialized and highly differentiated procedures, so combining them makes the RAG system more complicated.
arXiv Detail & Related papers (2024-06-18T12:09:02Z)
- PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control [55.81022882408587]
Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making.
We propose a novel view that treats inducing temporal action abstractions as a sequence compression problem.
We introduce an approach that combines continuous action quantization with byte pair encoding to learn powerful action abstractions.
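As a toy illustration of the sequence-compression view, the sketch below runs byte-pair-encoding style merges over sequences of already-quantized action tokens. The quantizer is assumed to exist upstream, and the data and token vocabulary here are invented for the example; this is not PRISE's actual pipeline.

```python
from collections import Counter

def bpe_merges(token_seqs, num_merges):
    """BPE-style merging over sequences of discrete action tokens (toy sketch).

    Repeatedly replaces the most frequent adjacent token pair with a new token,
    so frequently co-occurring primitive actions become single multi-step
    tokens (a crude stand-in for learned temporal action abstractions).
    """
    seqs = [list(s) for s in token_seqs]
    merges = []
    next_token = max(t for s in seqs for t in s) + 1
    for _ in range(num_merges):
        pairs = Counter(p for s in seqs for p in zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append(((a, b), next_token))
        merged = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == (a, b):
                    out.append(next_token)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            merged.append(out)
        seqs, next_token = merged, next_token + 1
    return merges, seqs

# Toy action-token sequences standing in for quantized continuous actions.
merges, compressed = bpe_merges([[1, 2, 1, 2, 3], [1, 2, 3, 1, 2]], num_merges=2)
```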
arXiv Detail & Related papers (2024-02-16T04:55:09Z)
- DSAC-T: Distributional Soft Actor-Critic with Three Refinements [31.590177154247485]
We introduce an off-policy RL algorithm called distributional soft actor-critic (DSAC).
Standard DSAC has its own shortcomings, including occasionally unstable learning processes and the necessity for task-specific reward scaling.
This paper introduces three important refinements to standard DSAC in order to address these shortcomings.
arXiv Detail & Related papers (2023-10-09T16:52:48Z)
- Soft Decomposed Policy-Critic: Bridging the Gap for Effective Continuous Control with Discrete RL [47.80205106726076]
We present the Soft Decomposed Policy-Critic (SDPC) architecture, which combines soft RL and actor-critic techniques with discrete RL methods to overcome the difficulty of applying discrete RL to continuous action spaces.
SDPC discretizes each action dimension independently and employs a shared critic network to maximize the soft $Q$-function.
Our proposed approach outperforms state-of-the-art continuous RL algorithms in a variety of continuous control tasks, including Mujoco's Humanoid and Box2d's BiWalker.
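A minimal sketch of the per-dimension discretization that the summary makes explicit is shown below; the bin layout and decoding scheme are illustrative assumptions, and the shared soft-Q critic and policy networks are omitted.

```python
import numpy as np

def make_action_bins(low, high, bins_per_dim):
    """Discretize each action dimension independently into evenly spaced bins."""
    return [np.linspace(lo, hi, bins_per_dim) for lo, hi in zip(low, high)]

def decode_action(indices, bins):
    """Map one discrete index per dimension back to a continuous action vector."""
    return np.array([b[i] for i, b in zip(indices, bins)])

# Example: a 3-D action space in [-1, 1]^3 with 11 bins per dimension gives the
# policy 3 * 11 discrete outputs instead of 11**3 joint actions.
bins = make_action_bins(low=[-1.0, -1.0, -1.0], high=[1.0, 1.0, 1.0], bins_per_dim=11)
action = decode_action([5, 0, 10], bins)   # -> array([ 0., -1.,  1.])
```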
arXiv Detail & Related papers (2023-08-20T08:32:11Z)
- RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network that uses both data and memory features to guide the exploration directions for sampling the next minimum set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement, via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Deep Reinforcement Learning-based UAV Navigation and Control: A Soft Actor-Critic with Hindsight Experience Replay Approach [0.9137554315375919]
We propose SACHER (soft actor-critic (SAC) with hindsight experience replay (HER)) as a class of deep reinforcement learning (DRL) algorithms.
We show that SACHER achieves the desired optimal outcomes faster and more accurately than SAC, since HER improves the sample efficiency of SAC.
We apply SACHER to the navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER generates the optimal navigation path.
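Since HER is the standard goal-relabeling technique, the sketch below shows only that piece: stored transitions are relabeled with goals achieved later in the same episode and their rewards recomputed. The transition layout and reward_fn are assumptions for illustration; the SAC side of SACHER is not shown.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=None):
    """Hindsight experience replay relabeling ('future' strategy, sketch only).

    episode: list of dicts with keys 'obs', 'action', 'achieved_goal', 'goal'.
    reward_fn(achieved_goal, goal) -> float, e.g. 0.0 if close enough else -1.0.
    Each transition is copied k times with a goal achieved at or after the
    current step, and the reward is recomputed, so even failed rollouts
    provide useful learning signal.
    """
    rng = rng or np.random.default_rng()
    relabeled = []
    for t, tr in enumerate(episode):
        for _ in range(k):
            future = int(rng.integers(t, len(episode)))
            new_goal = episode[future]['achieved_goal']
            relabeled.append({**tr,
                              'goal': new_goal,
                              'reward': reward_fn(tr['achieved_goal'], new_goal)})
    return relabeled
```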
arXiv Detail & Related papers (2021-06-02T08:30:14Z)
- Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
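For context, the sketch below shows the classic doubly robust off-policy return estimator that gives the method its name; it is meant as an illustration of the doubly robust idea, not of DR-Off-PAC's specific actor and critic updates.

```python
def doubly_robust_return(rewards, rhos, q_hat, v_hat, gamma=0.99):
    """Classic doubly robust off-policy return estimate for one trajectory.

    rewards[t]: reward r_t collected by the behaviour policy.
    rhos[t]:    importance ratio pi(a_t|s_t) / mu(a_t|s_t).
    q_hat[t], v_hat[t]: critic estimates Q(s_t, a_t) and V(s_t).
    Backward recursion: DR_t = V_t + rho_t * (r_t + gamma * DR_{t+1} - Q_t);
    the importance-weighted correction compensates for errors in the critic.
    """
    dr = 0.0
    for r, rho, q, v in zip(reversed(rewards), reversed(rhos),
                            reversed(q_hat), reversed(v_hat)):
        dr = v + rho * (r + gamma * dr - q)
    return dr

# Toy numbers only; in DR-Off-PAC these quantities come from learned critics.
print(doubly_robust_return([1.0, 0.0], [0.9, 1.1], [0.8, 0.2], [0.7, 0.3]))
```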
arXiv Detail & Related papers (2021-02-23T18:56:13Z)
- OPAC: Opportunistic Actor-Critic [0.0]
Opportunistic Actor-Critic (OPAC) is a novel model-free deep RL algorithm that employs a better exploration policy and achieves lower variance.
OPAC combines some of the most powerful features of TD3 and SAC and aims to optimize a policy in an off-policy way.
arXiv Detail & Related papers (2020-12-11T18:33:35Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
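The two ingredients lend themselves to a short sketch: Bellman targets are down-weighted where the Q-ensemble disagrees, and actions are chosen by an upper confidence bound over the ensemble. The weighting function and array shapes below are simple placeholder choices, not necessarily the paper's exact formulas.

```python
import numpy as np

def weighted_bellman_targets(q_next_ensemble, rewards, dones, gamma=0.99, temp=1.0):
    """Ingredient (a): ensemble-based weighted Bellman backup (sketch).

    q_next_ensemble: array [K, B] of next-state Q-values from K ensemble members.
    Targets are down-weighted where the ensemble disagrees (large std); the
    weighting function here is a simple placeholder, not the paper's formula.
    """
    std_q = q_next_ensemble.std(axis=0)
    weights = 1.0 / (1.0 + temp * std_q)             # smaller weight when uncertain
    targets = rewards + gamma * (1.0 - dones) * q_next_ensemble.mean(axis=0)
    return targets, weights                          # weights multiply the TD loss

def ucb_action(q_candidates_ensemble, lam=1.0):
    """Ingredient (b): choose the candidate action maximizing mean + lam * std."""
    mean_q = q_candidates_ensemble.mean(axis=0)
    std_q = q_candidates_ensemble.std(axis=0)
    return int(np.argmax(mean_q + lam * std_q))

# Toy usage: 3 critics scoring 4 candidate actions for one state.
q_cand = np.array([[1.0, 2.0, 0.5, 1.5],
                   [1.2, 1.0, 0.4, 1.6],
                   [0.9, 3.0, 0.6, 1.4]])
best = ucb_action(q_cand)   # index of the most optimistic candidate
```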
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Band-limited Soft Actor Critic Model [15.11069042369131]
Soft Actor Critic (SAC) algorithms show remarkable performance in complex simulated environments.
We take this idea one step further by artificially band-limiting the spatial resolution of the target critic.
We derive the closed form solution in the linear case and show that bandlimiting reduces the interdependency between the low frequency components of the state-action value approximation.
arXiv Detail & Related papers (2020-06-19T22:52:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.