Band-limited Soft Actor Critic Model
- URL: http://arxiv.org/abs/2006.11431v1
- Date: Fri, 19 Jun 2020 22:52:43 GMT
- Title: Band-limited Soft Actor Critic Model
- Authors: Miguel Campo, Zhengxing Chen, Luke Kung, Kittipat Virochsiri and Jianyu Wang
- Abstract summary: Soft Actor Critic (SAC) algorithms show remarkable performance in complex simulated environments.
We take this idea one step further by artificially bandlimiting the target critic spatial resolution.
We derive the closed-form solution in the linear case and show that bandlimiting reduces the interdependency between the low and high frequency components of the state-action value approximation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Soft Actor Critic (SAC) algorithms show remarkable performance in complex
simulated environments. A key element of SAC networks is entropy
regularization, which prevents the SAC actor from optimizing against fine-grained
features, oftentimes transient, of the state-action value function.
This results in better sample efficiency during early training. We take this
idea one step further by artificially bandlimiting the target critic spatial
resolution through the addition of a convolutional filter. We derive the closed
form solution in the linear case and show that bandlimiting reduces the
interdependency between the low and high frequency components of the
state-action value approximation, allowing the critic to learn faster. In
experiments, the bandlimited SAC outperformed the classic twin-critic SAC in a
number of Gym environments, and displayed more stability in returns. We derive
novel insights about SAC by adding a stochastic noise disturbance, a technique
that is increasingly being used to learn robust policies that transfer well to
their real-world counterparts.
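The abstract describes bandlimiting the target critic by adding a convolutional filter, but this listing does not say which filter the authors use or over which inputs it is applied. The Python below is therefore only a toy illustration of the low-pass idea under stated assumptions: target-critic values are sampled on a hypothetical 1-D action grid and smoothed with a normalized Gaussian kernel. `gaussian_kernel` and `bandlimit` are hypothetical helpers, and `radius=3`, `sigma=1.0` are arbitrary choices. The sketch shows a slow trend passing almost unchanged while an alternating, grid-scale ripple (a stand-in for fine-grained features) is almost entirely removed.

```python
import math


def gaussian_kernel(radius: int, sigma: float) -> list[float]:
    """Return a normalized 1-D Gaussian kernel of length 2*radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # weights sum to 1, so constants pass unchanged


def bandlimit(values: list[float], kernel: list[float]) -> list[float]:
    """Low-pass filter `values` by convolving them with a symmetric kernel.

    Indices that fall outside the grid are clamped to the nearest edge
    (replicate padding), so the output has the same length as the input.
    """
    radius = len(kernel) // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp to the grid edges
            acc += w * values[j]
        smoothed.append(acc)
    return smoothed


if __name__ == "__main__":
    n = 64
    # Hypothetical target-critic values on a 1-D action grid: a smooth trend
    # plus an alternating grid-scale ripple standing in for fine-grained features.
    trend = [math.sin(2 * math.pi * i / n) for i in range(n)]
    ripple = [0.5 * (-1) ** i for i in range(n)]
    q_target = [t + r for t, r in zip(trend, ripple)]

    kernel = gaussian_kernel(radius=3, sigma=1.0)
    q_bandlimited = bandlimit(q_target, kernel)

    # Compare against the trend away from the edges, where clamping distorts the result.
    interior = range(len(kernel) // 2, n - len(kernel) // 2)
    before = max(abs(q_target[i] - trend[i]) for i in interior)
    after = max(abs(q_bandlimited[i] - trend[i]) for i in interior)
    print(f"max deviation from trend: before={before:.3f}, after={after:.3f}")
```

In this toy setting, shrinking `sigma` toward zero keeps more high-frequency detail. Widening it removes more of the ripple but also starts to flatten the trend, which is the trade-off that any bandlimited target implies.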
Related papers
- Soft Actor-Critic with Beta Policy via Implicit Reparameterization Gradients
Soft actor-critic (SAC) mitigates poor sample efficiency by combining policy optimization and off-policy learning.
However, SAC is limited to distributions whose gradients can be computed through the reparameterization trick.
We extend implicit reparameterization gradients to train SAC with the beta policy on simulated robot locomotion environments.
Experimental results show that the beta policy is a viable alternative, as it outperforms the normal policy and is on par with the squashed normal policy.
arXiv (2024-09-08T04:30:51Z)
- RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation.
RLSAC employs a graph neural network that uses both data and memory features to guide the exploration direction when sampling the next minimal set.
Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv (2023-08-10T03:14:19Z)
- PAC-Bayesian Soft Actor-Critic Learning
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and policy improvement, via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv (2023-01-30T10:44:15Z)
- Solving Continuous Control via Q-learning
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods.
arXiv (2022-10-22T22:55:50Z)
- Target Entropy Annealing for Discrete Soft Actor-Critic
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm for continuous action settings.
Counter-intuitively, empirical evidence shows that SAC does not perform well in discrete domains.
We propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter applied on SAC.
We compare our method on Atari 2600 games with SAC using different constant target entropies, and analyze how our scheduling affects SAC.
arXiv (2021-12-06T08:21:27Z)
- Investigating Tradeoffs in Real-World Video Super-Resolution
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv (2021-11-24T18:58:21Z)
- Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience
Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm.
SAC trains a policy by maximizing the trade-off between expected return and entropy.
It has achieved state-of-the-art performance on a range of continuous-control benchmark tasks.
arXiv (2021-09-24T06:46:28Z)
- TASAC: Temporally Abstract Soft Actor-Critic for Continuous Control
TASAC is an off-policy RL algorithm that incorporates closed-loop temporal abstraction into the soft actor-critic framework.
It has two benefits compared to traditional off-policy RL algorithms: persistent exploration and an unbiased multi-step Q operator for TD learning.
arXiv (2021-04-13T21:24:44Z)
- Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP.
DR-Off-PAC adopts a single-timescale structure, in which the actor and critics are updated simultaneously with a constant step size.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
arXiv (2021-02-23T18:56:13Z)
- OPAC: Opportunistic Actor-Critic
Opportunistic Actor-Critic (OPAC) is a novel model-free deep RL algorithm that employs a better exploration policy and has lower variance.
OPAC combines some of the most powerful features of TD3 and SAC and aims to optimize a policy in an off-policy way.
arXiv (2020-12-11T18:33:35Z)
- Non-Cooperative Game Theory Based Rate Adaptation for Dynamic Video Streaming over HTTP
Dynamic Adaptive Streaming over HTTP (DASH) has emerged as a promising multimedia streaming technique.
We propose a novel algorithm to optimally allocate the limited export bandwidth of the server to multiple users to maximize their Quality of Experience (QoE) while guaranteeing fairness.
arXiv (2019-12-27T01:19:14Z)