TASAC: Temporally Abstract Soft Actor-Critic for Continuous Control
- URL: http://arxiv.org/abs/2104.06521v1
- Date: Tue, 13 Apr 2021 21:24:44 GMT
- Title: TASAC: Temporally Abstract Soft Actor-Critic for Continuous Control
- Authors: Haonan Yu, Wei Xu, Haichao Zhang
- Abstract summary: TASAC is an off-policy RL algorithm that incorporates closed-loop temporal abstraction into the soft actor-critic framework.
It has two benefits compared to traditional off-policy RL algorithms: persistent exploration and an unbiased multi-step Q operator for TD learning.
- Score: 28.534585378574143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose temporally abstract soft actor-critic (TASAC), an off-policy RL
algorithm that incorporates closed-loop temporal abstraction into the soft
actor-critic (SAC) framework in a simple manner. TASAC adds a second-stage
binary policy to choose between the previous action and the action output by an
SAC actor. It has two benefits compared to traditional off-policy RL
algorithms: persistent exploration and an unbiased multi-step Q operator for TD
learning. We demonstrate its advantages over several strong baselines on 14
continuous control tasks spanning 5 different categories, in terms of both sample
efficiency and final performance. Because of its simplicity and generality,
TASAC can serve as a drop-in replacement for SAC when temporal abstraction is
needed.
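As a concrete illustration of the two-stage action selection the abstract describes, here is a minimal sketch: a SAC actor proposes a candidate action and a binary policy decides whether to keep the previous action or switch to the candidate. The toy actor and the constant switch probability are placeholders for illustration, not the paper's networks or training procedure.

```python
import numpy as np

def tasac_act(obs, prev_action, actor, switch_prob_fn, rng):
    """Two-stage action selection (illustrative sketch of the idea above).

    Stage 1: the SAC actor proposes a candidate action for the observation.
    Stage 2: a binary policy chooses between repeating the previous action and
             switching to the candidate; repeated "keep" decisions yield
             temporally extended, persistent behaviour.
    """
    candidate = actor(obs)                                   # stage 1
    p_switch = switch_prob_fn(obs, prev_action, candidate)   # stage 2
    switched = rng.random() < p_switch
    return (candidate if switched else prev_action), switched

# Toy usage with stand-in components (placeholders, not the paper's networks).
rng = np.random.default_rng(0)
toy_actor = lambda obs: np.tanh(rng.normal(size=2))
toy_switch = lambda obs, a_prev, a_new: 0.3
action, switched = tasac_act(np.zeros(4), np.zeros(2), toy_actor, toy_switch, rng)
```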
Related papers
- Unified Active Retrieval for Retrieval Augmented Generation [69.63003043712696]
In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal.
Existing active retrieval methods face two challenges:
1. They usually rely on a single criterion, which struggles to handle various types of instructions.
2. They depend on specialized and highly differentiated procedures, so combining them makes the RAG system more complicated.
arXiv Detail & Related papers (2024-06-18T12:09:02Z)
- PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control [55.81022882408587]
Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making.
We propose a novel view that treats inducing temporal action abstractions as a sequence compression problem.
We introduce an approach that combines continuous action quantization with byte pair encoding to learn powerful action abstractions.
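As a toy illustration of the sequence-compression view, the sketch below runs byte-pair-encoding style merges over sequences of already-quantized action tokens. The quantizer is assumed to exist upstream, and the data and token vocabulary here are invented for the example; this is not PRISE's actual pipeline.

```python
from collections import Counter

def bpe_merges(token_seqs, num_merges):
    """BPE-style merging over sequences of discrete action tokens (toy sketch).

    Repeatedly replaces the most frequent adjacent token pair with a new token,
    so frequently co-occurring primitive actions become single multi-step
    tokens (a crude stand-in for learned temporal action abstractions).
    """
    seqs = [list(s) for s in token_seqs]
    merges = []
    next_token = max(t for s in seqs for t in s) + 1
    for _ in range(num_merges):
        pairs = Counter(p for s in seqs for p in zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append(((a, b), next_token))
        merged = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == (a, b):
                    out.append(next_token)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            merged.append(out)
        seqs, next_token = merged, next_token + 1
    return merges, seqs

# Toy action-token sequences standing in for quantized continuous actions.
merges, compressed = bpe_merges([[1, 2, 1, 2, 3], [1, 2, 3, 1, 2]], num_merges=2)
```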
arXiv Detail & Related papers (2024-02-16T04:55:09Z)
- DSAC-T: Distributional Soft Actor-Critic with Three Refinements [31.590177154247485]
We introduce an off-policy RL algorithm called distributional soft actor-critic (DSAC).
Standard DSAC has its own shortcomings, including occasionally unstable learning processes and the necessity for task-specific reward scaling.
This paper introduces three important refinements to standard DSAC in order to address these shortcomings.
arXiv Detail & Related papers (2023-10-09T16:52:48Z)
- Soft Decomposed Policy-Critic: Bridging the Gap for Effective Continuous Control with Discrete RL [47.80205106726076]
We present the Soft Decomposed Policy-Critic (SDPC) architecture, which combines soft RL and actor-critic techniques with discrete RL methods to overcome the difficulty of applying discrete RL to continuous action spaces.
SDPC discretizes each action dimension independently and employs a shared critic network to maximize the soft $Q$-function.
Our proposed approach outperforms state-of-the-art continuous RL algorithms in a variety of continuous control tasks, including Mujoco's Humanoid and Box2d's BiWalker.
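A minimal sketch of the per-dimension discretization that the summary makes explicit is shown below; the bin layout and decoding scheme are illustrative assumptions, and the shared soft-Q critic and policy networks are omitted.

```python
import numpy as np

def make_action_bins(low, high, bins_per_dim):
    """Discretize each action dimension independently into evenly spaced bins."""
    return [np.linspace(lo, hi, bins_per_dim) for lo, hi in zip(low, high)]

def decode_action(indices, bins):
    """Map one discrete index per dimension back to a continuous action vector."""
    return np.array([b[i] for i, b in zip(indices, bins)])

# Example: a 3-D action space in [-1, 1]^3 with 11 bins per dimension gives the
# policy 3 * 11 discrete outputs instead of 11**3 joint actions.
bins = make_action_bins(low=[-1.0, -1.0, -1.0], high=[1.0, 1.0, 1.0], bins_per_dim=11)
action = decode_action([5, 0, 10], bins)   # -> array([ 0., -1.,  1.])
```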
arXiv Detail & Related papers (2023-08-20T08:32:11Z)
- RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network that uses both data and memory features to guide the exploration directions for sampling the next minimum set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement, via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Deep Reinforcement Learning-based UAV Navigation and Control: A Soft Actor-Critic with Hindsight Experience Replay Approach [0.9137554315375919]
We propose SACHER (soft actor-critic (SAC) with hindsight experience replay (HER)) as a class of deep reinforcement learning (DRL) algorithms.
We show that SACHER achieves the desired optimal outcomes faster and more accurately than SAC, since HER improves the sample efficiency of SAC.
We apply SACHER to the navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER generates the optimal navigation path.
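Since HER is the standard goal-relabeling technique, the sketch below shows only that piece: stored transitions are relabeled with goals achieved later in the same episode and their rewards recomputed. The transition layout and reward_fn are assumptions for illustration; the SAC side of SACHER is not shown.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=None):
    """Hindsight experience replay relabeling ('future' strategy, sketch only).

    episode: list of dicts with keys 'obs', 'action', 'achieved_goal', 'goal'.
    reward_fn(achieved_goal, goal) -> float, e.g. 0.0 if close enough else -1.0.
    Each transition is copied k times with a goal achieved at or after the
    current step, and the reward is recomputed, so even failed rollouts
    provide useful learning signal.
    """
    rng = rng or np.random.default_rng()
    relabeled = []
    for t, tr in enumerate(episode):
        for _ in range(k):
            future = int(rng.integers(t, len(episode)))
            new_goal = episode[future]['achieved_goal']
            relabeled.append({**tr,
                              'goal': new_goal,
                              'reward': reward_fn(tr['achieved_goal'], new_goal)})
    return relabeled
```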
arXiv Detail & Related papers (2021-06-02T08:30:14Z)
- Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
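For context, the sketch below shows the classic doubly robust off-policy return estimator that gives the method its name; it is meant as an illustration of the doubly robust idea, not of DR-Off-PAC's specific actor and critic updates.

```python
def doubly_robust_return(rewards, rhos, q_hat, v_hat, gamma=0.99):
    """Classic doubly robust off-policy return estimate for one trajectory.

    rewards[t]: reward r_t collected by the behaviour policy.
    rhos[t]:    importance ratio pi(a_t|s_t) / mu(a_t|s_t).
    q_hat[t], v_hat[t]: critic estimates Q(s_t, a_t) and V(s_t).
    Backward recursion: DR_t = V_t + rho_t * (r_t + gamma * DR_{t+1} - Q_t);
    the importance-weighted correction compensates for errors in the critic.
    """
    dr = 0.0
    for r, rho, q, v in zip(reversed(rewards), reversed(rhos),
                            reversed(q_hat), reversed(v_hat)):
        dr = v + rho * (r + gamma * dr - q)
    return dr

# Toy numbers only; in DR-Off-PAC these quantities come from learned critics.
print(doubly_robust_return([1.0, 0.0], [0.9, 1.1], [0.8, 0.2], [0.7, 0.3]))
```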
arXiv Detail & Related papers (2021-02-23T18:56:13Z)
- OPAC: Opportunistic Actor-Critic [0.0]
Opportunistic Actor-Critic (OPAC) is a novel model-free deep RL algorithm that employs a better exploration policy and achieves lower variance.
OPAC combines some of the most powerful features of TD3 and SAC and aims to optimize a policy in an off-policy way.
arXiv Detail & Related papers (2020-12-11T18:33:35Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
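The two ingredients lend themselves to a short sketch: Bellman targets are down-weighted where the Q-ensemble disagrees, and actions are chosen by an upper confidence bound over the ensemble. The weighting function and array shapes below are simple placeholder choices, not necessarily the paper's exact formulas.

```python
import numpy as np

def weighted_bellman_targets(q_next_ensemble, rewards, dones, gamma=0.99, temp=1.0):
    """Ingredient (a): ensemble-based weighted Bellman backup (sketch).

    q_next_ensemble: array [K, B] of next-state Q-values from K ensemble members.
    Targets are down-weighted where the ensemble disagrees (large std); the
    weighting function here is a simple placeholder, not the paper's formula.
    """
    std_q = q_next_ensemble.std(axis=0)
    weights = 1.0 / (1.0 + temp * std_q)             # smaller weight when uncertain
    targets = rewards + gamma * (1.0 - dones) * q_next_ensemble.mean(axis=0)
    return targets, weights                          # weights multiply the TD loss

def ucb_action(q_candidates_ensemble, lam=1.0):
    """Ingredient (b): choose the candidate action maximizing mean + lam * std."""
    mean_q = q_candidates_ensemble.mean(axis=0)
    std_q = q_candidates_ensemble.std(axis=0)
    return int(np.argmax(mean_q + lam * std_q))

# Toy usage: 3 critics scoring 4 candidate actions for one state.
q_cand = np.array([[1.0, 2.0, 0.5, 1.5],
                   [1.2, 1.0, 0.4, 1.6],
                   [0.9, 3.0, 0.6, 1.4]])
best = ucb_action(q_cand)   # index of the most optimistic candidate
```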
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Band-limited Soft Actor Critic Model [15.11069042369131]
Soft Actor Critic (SAC) algorithms show remarkable performance in complex simulated environments.
We take this idea one step further by artificially band-limiting the spatial resolution of the target critic.
We derive the closed form solution in the linear case and show that bandlimiting reduces the interdependency between the low frequency components of the state-action value approximation.
arXiv Detail & Related papers (2020-06-19T22:52:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.