Related papers: Band-limited Soft Actor Critic Model

Band-limited Soft Actor Critic Model

URL: http://arxiv.org/abs/2006.11431v1
Date: Fri, 19 Jun 2020 22:52:43 GMT
Title: Band-limited Soft Actor Critic Model
Authors: Miguel Campo, Zhengxing Chen, Luke Kung, Kittipat Virochsiri and Jianyu Wang
Abstract summary: Soft Actor Critic (SAC) algorithms show remarkable performance in complex simulated environments. We take this idea one step further by artificially bandlimiting the target critic spatial resolution. We derive the closed form solution in the linear case and show that bandlimiting reduces the interdependency between the low frequency components of the state-action value approximation.
Score: 15.11069042369131
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Soft Actor Critic (SAC) algorithms show remarkable performance in complex simulated environments. A key element of SAC networks is entropy regularization, which prevents the SAC actor from optimizing against fine grained features, oftentimes transient, of the state-action value function. This results in better sample efficiency during early training. We take this idea one step further by artificially bandlimiting the target critic spatial resolution through the addition of a convolutional filter. We derive the closed form solution in the linear case and show that bandlimiting reduces the interdependency between the low and high frequency components of the state-action value approximation, allowing the critic to learn faster. In experiments, the bandlimited SAC outperformed the classic twin-critic SAC in a number of Gym environments, and displayed more stability in returns. We derive novel insights about SAC by adding a stochastic noise disturbance, a technique that is increasingly being used to learn robust policies that transfer well to the real world counterparts.

Related papers

DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty [21.542065840791683]
Deep reinforcement learning (RL) has achieved significant success, yet its application in real-world scenarios is often hindered by a lack of robustness to environmental uncertainties.<n>We propose Distributionally Robust Soft Actor-Critic (DR-SAC), a novel algorithm designed to enhance the robustness of the state-of-the-art Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2025-06-14T20:36:44Z)
Soft Actor-Critic with Beta Policy via Implicit Reparameterization Gradients [0.0]
Soft actor-critic (SAC) mitigates poor sample efficiency by combining policy optimization and off-policy learning. It is limited to distributions whose gradients can be computed through the re parameterization trick. We extend this technique to train SAC with the beta policy on simulated robot locomotion environments. Experimental results show that the beta policy is a viable alternative, as it outperforms the normal policy and is on par with the normal policy.
arXiv Detail & Related papers (2024-09-08T04:30:51Z)
RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation. RLSAC employs a graph neural network to utilize both data and memory features to guide exploring directions for sampling the next minimum set. Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z)
PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators. We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods. By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods.
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
Revisiting Discrete Soft Actor-Critic [42.88653969438699]
We study the adaption of Soft Actor-Critic (SAC), which is considered as a state-of-the-art reinforcement learning (RL) algorithm. We propose Stable Discrete SAC (SDSAC), an algorithm that leverages entropy-penalty and double average Q-learning with Q-clip to address these issues.
arXiv Detail & Related papers (2022-09-21T03:01:36Z)
Target Entropy Annealing for Discrete Soft Actor-Critic [64.71285903492183]
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm for continuous action settings. It is counter-intuitive that empirical evidence shows SAC does not perform well in discrete domains. We propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter applied on SAC. We compare our method on Atari 2600 games with different constant target entropy SAC, and analyze on how our scheduling affects SAC.
arXiv Detail & Related papers (2021-12-06T08:21:27Z)
Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability. To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance. To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z)
Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience [9.06635747612495]
Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm. SAC trains a policy by maximizing the trade-off between expected return and entropy. It has achieved state-of-the-art performance on a range of continuous-control benchmark tasks.
arXiv Detail & Related papers (2021-09-24T06:46:28Z)
TASAC: Temporally Abstract Soft Actor-Critic for Continuous Control [28.534585378574143]
TASAC is an off-policy RL algorithm that incorporates closed-loop temporal abstraction into the soft actor-critic framework. It has two benefits compared to traditional off-policy RL algorithms: persistent exploration and an unbiased multi-step Q operator for TD learning.
arXiv Detail & Related papers (2021-04-13T21:24:44Z)
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP. DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize. We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z)
Non-Cooperative Game Theory Based Rate Adaptation for Dynamic Video Streaming over HTTP [89.30855958779425]
Dynamic Adaptive Streaming over HTTP (DASH) has demonstrated to be an emerging and promising multimedia streaming technique. We propose a novel algorithm to optimally allocate the limited export bandwidth of the server to multi-users to maximize their Quality of Experience (QoE) with fairness guaranteed.
arXiv Detail & Related papers (2019-12-27T01:19:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.