Recursive Least Squares Advantage Actor-Critic Algorithms
- URL: http://arxiv.org/abs/2201.05918v1
- Date: Sat, 15 Jan 2022 20:00:26 GMT
- Title: Recursive Least Squares Advantage Actor-Critic Algorithms
- Authors: Yuan Wang, Chunyuan Zhang, Tianzong Yu, Meng Ma
- Abstract summary: We propose two novel RLS-based advantage actor-critic (A2C) algorithms.
Both algorithms, RLSSA2C and RLSNA2C, use the RLS method to train the critic network and the hidden layers of the actor network.
The experimental results show that both algorithms have better sample efficiency than the vanilla A2C on most games or tasks.
- Score: 20.792917267835247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an important algorithm in deep reinforcement learning, advantage actor
critic (A2C) has achieved wide success in both discrete and continuous control
tasks with raw pixel inputs, but its sample efficiency still needs improvement.
In traditional reinforcement learning, actor-critic algorithms generally
use the recursive least squares (RLS) technique to update the parameters of
linear function approximators in order to accelerate their convergence speed.
However, A2C algorithms seldom use this technique to train deep neural
networks (DNNs) for improving their sample efficiency. In this paper, we
propose two novel RLS-based A2C algorithms and investigate their performance.
Both proposed algorithms, called RLSSA2C and RLSNA2C, use the RLS method to
train the critic network and the hidden layers of the actor network. The main
difference between them lies in the policy learning step. RLSSA2C uses an
ordinary first-order gradient descent algorithm and the standard policy
gradient to learn the policy parameters. RLSNA2C uses the Kronecker-factored
approximation, the RLS method and the natural policy gradient to learn the
compatible parameters and the policy parameters. In addition, we analyze the
complexity and convergence of both algorithms, and present three tricks for
further improving their convergence speed. Finally, we demonstrate the
effectiveness of both algorithms on 40 games in the Atari 2600 environment and
11 tasks in the MuJoCo environment. The experimental results show that
both of our algorithms have better sample efficiency than the vanilla A2C on
most games or tasks, and higher computational efficiency than two other
state-of-the-art algorithms.
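To make the core idea concrete, the following is a minimal NumPy sketch of the RLS recursion that such algorithms apply to a linear critic output layer, V(s) ≈ wᵀφ(s), where φ(s) denotes the features produced by the hidden layers. It is an illustrative sketch under standard RLS assumptions, not the authors' implementation; the class, variable names, and hyperparameters are hypothetical. In a full A2C loop, the policy parameters would still be updated with an ordinary (or natural) policy-gradient step, as described in the abstract.

```python
import numpy as np

# Minimal sketch of a recursive least squares (RLS) update for a linear
# critic output layer, V(s) ~= w^T phi(s). Illustrative only; names and
# hyperparameters are hypothetical, not the authors' implementation.

class RLSLinearCritic:
    def __init__(self, feature_dim, forgetting=0.99, delta=1.0):
        self.w = np.zeros(feature_dim)            # output-layer weights
        self.P = np.eye(feature_dim) / delta      # inverse correlation matrix
        self.forgetting = forgetting              # RLS forgetting factor

    def update(self, phi, td_target):
        """One RLS step toward the bootstrapped TD target r + gamma * V(s')."""
        P_phi = self.P @ phi
        gain = P_phi / (self.forgetting + phi @ P_phi)    # RLS gain vector
        td_error = td_target - self.w @ phi
        self.w += gain * td_error                          # weight update
        self.P = (self.P - np.outer(gain, P_phi)) / self.forgetting
        return td_error

# Toy usage: recover V(s) = 2*phi_1 + 1*phi_2 from noisy targets.
rng = np.random.default_rng(0)
critic = RLSLinearCritic(feature_dim=2)
for _ in range(200):
    phi = rng.standard_normal(2)
    critic.update(phi, phi @ np.array([2.0, 1.0]) + 0.01 * rng.standard_normal())
print(critic.w)   # close to [2.0, 1.0]
```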
Related papers
- CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with
Trajectory Optimization [12.115023915042617]
Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful tools to solve optimal control problems.
In this work, we present an extension of CACTO exploiting the idea of Sobolev Learning (SL).
arXiv Detail & Related papers (2023-12-17T09:44:41Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve a complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based zeroth-order (ZO) algorithm, ZO-RL, which learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that the ZO-RL algorithm can effectively reduce the variance of the ZO gradient by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
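For context, here is a minimal sketch of the standard random-sampling ZO gradient estimator that serves as the baseline such a learned sampling policy would replace; the function and parameter names are illustrative, not taken from the paper.

```python
import numpy as np

# Minimal sketch of a two-point zeroth-order (ZO) gradient estimate with
# random Gaussian perturbations -- the random-sampling baseline that a
# learned sampling policy would replace. Illustrative only; not the paper's code.

def zo_gradient(f, x, mu=1e-3, num_samples=500, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(num_samples):
        u = rng.standard_normal(x.shape)                        # perturbation direction
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / num_samples

# Toy usage: a noisy estimate of the true gradient 2*x = [2, 2, 2] of f(x) = ||x||^2.
print(zo_gradient(lambda x: float(np.sum(x ** 2)), np.ones(3)))
```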
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z) - Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, along with off-policy learning and function approximation (FA), results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z) - Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint [8.087699764574788]
We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning context.
Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme.
arXiv Detail & Related papers (2021-01-06T17:06:42Z)