Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework
- URL: http://arxiv.org/abs/2301.03887v1
- Date: Tue, 10 Jan 2023 10:21:32 GMT
- Title: Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework
- Authors: Zongwei Liu, Yonghong Song, Yuanlin Zhang
- Abstract summary: We propose actor-director-critic, a new framework for deep reinforcement learning.
For the two critic networks used, we design two target critic networks for each critic network instead of one.
In order to verify the performance of the actor-director-critic framework and the improved double estimator method, we applied them to the TD3 algorithm.
- Score: 2.6477113498726244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose actor-director-critic, a new framework for deep
reinforcement learning. Compared with the actor-critic framework, the director
role is added, and action classification and action evaluation are applied
simultaneously to improve the decision-making performance of the agent.
Firstly, the actions of the agent are divided into high-quality and low-quality
actions according to the rewards returned from the environment. Then, the
director network is trained to discriminate between high- and low-quality
actions and to guide the actor network to reduce repetitive exploration of
low-quality actions in the early stage of training. In addition,
we propose an improved double estimator method to better solve the problem of
overestimation in the field of reinforcement learning. For the two critic
networks used, we design two target critic networks for each critic network
instead of one. In this way, the target value of each critic network can be
calculated by taking the average of the outputs of the two target critic
networks, which is more stable and accurate than using only one target critic
network to obtain the target value. To verify the performance of the
actor-director-critic framework and the improved double estimator method, we
applied them to the TD3 algorithm. We then
carried out experiments in multiple environments in MuJoCo and compared the
experimental data before and after the algorithm improvement. The results show
that the improved algorithm converges faster and achieves a higher total return.
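To make the director idea described above more concrete, here is a minimal PyTorch sketch. It is an assumption-laden illustration, not the paper's implementation: the labeling rule (reward above the batch median counts as high quality), the network sizes, the guidance weight, and the environment dimensions are all hypothetical.

```python
# Hypothetical sketch of the "director": label stored actions as high- or
# low-quality from the environment reward, train a classifier on those labels,
# and add its score to the actor loss to steer the actor away from low-quality
# actions early in training. Labeling rule, sizes, and weights are assumptions.
import torch
import torch.nn as nn

class Director(nn.Module):
    """Binary classifier: logit that a (state, action) pair is high quality."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # raw logits

state_dim, action_dim = 17, 6  # assumed environment dimensions
director = Director(state_dim, action_dim)
opt = torch.optim.Adam(director.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def train_director(states, actions, rewards):
    """Label actions by comparing reward to the batch median (assumed rule)."""
    labels = (rewards > rewards.median()).float()
    loss = bce(director(states, actions), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def actor_guidance(states, actor_actions, weight=0.1):
    """Extra actor-loss term rewarding actions the director rates highly.
    The weighting and how it combines with the critic loss are assumptions."""
    return -weight * torch.sigmoid(director(states, actor_actions)).mean()

# Minimal usage with random tensors standing in for a replay-buffer batch.
batch = torch.randn(8, state_dim), torch.randn(8, action_dim), torch.randn(8, 1)
print(train_director(*batch))
print(actor_guidance(batch[0], batch[1]))
```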
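The improved double estimator can be sketched in the same spirit: each of the two critics gets two target critics, and each critic's target value averages its own pair of target outputs. How the two (averaged) critic estimates are then combined is not stated in the abstract; the minimum below mirrors TD3's clipped double Q-learning and is an assumption, as are the network sizes and dimensions.

```python
# Hypothetical sketch of the improved double estimator target: two target
# critics per critic (four in total), each critic's target averages its pair.
import copy
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

state_dim, action_dim, gamma = 17, 6, 0.99  # assumed dimensions and discount

critics = [Critic(state_dim, action_dim) for _ in range(2)]
# Two target networks per critic instead of one.
target_critics = [[copy.deepcopy(c), copy.deepcopy(c)] for c in critics]

def td_target(reward, next_state, next_action, done):
    """Average each critic's pair of target outputs, then combine across critics."""
    per_critic = []
    with torch.no_grad():
        for pair in target_critics:
            q = torch.stack([t(next_state, next_action) for t in pair]).mean(dim=0)
            per_critic.append(q)
        # Minimum across the two averaged estimates mirrors TD3 (assumption);
        # the paper may combine the critics differently.
        q_next = torch.min(per_critic[0], per_critic[1])
    return reward + gamma * (1.0 - done) * q_next

# Minimal usage example with random tensors standing in for a replay-buffer batch.
r = torch.zeros(4, 1)
s2, a2, d = torch.randn(4, state_dim), torch.randn(4, action_dim), torch.zeros(4, 1)
print(td_target(r, s2, a2, d).shape)  # torch.Size([4, 1])
```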
Related papers
- Decision-Aware Actor-Critic with Function Approximation and Theoretical
Guarantees [12.259191000019033]
Actor-critic (AC) methods are widely used in reinforcement learning (RL).
We design a joint objective for training the actor and critic in a decision-aware fashion.
We empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.
arXiv Detail & Related papers (2023-05-24T15:34:21Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Actor Prioritized Experience Replay [0.0]
Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error.
We introduce a novel experience replay sampling framework for actor-critic methods that also addresses stability issues and recent findings behind the poor empirical performance of PER.
An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms the competing approaches.
arXiv Detail & Related papers (2022-09-01T15:27:46Z)
- Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL)
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Group-aware Contrastive Regression for Action Quality Assessment [85.43203180953076]
We show that the relations among videos can provide important clues for more accurate action quality assessment.
Our approach outperforms previous methods by a large margin and establishes new state-of-the-art on all three benchmarks.
arXiv Detail & Related papers (2021-08-17T17:59:39Z)
- Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation [2.1592777170316366]
Actor-critic methods integrating target networks have exhibited remarkable empirical success in deep reinforcement learning.
We bridge this gap by proposing the first theoretical analysis of an online target-based actor-critic with linear function approximation in the discounted reward setting.
arXiv Detail & Related papers (2021-06-14T14:59:05Z)
- Efficient Continuous Control with Double Actors and Regularized Critics [7.072664211491016]
We explore the potential of double actors, which has long been neglected, for better value function estimation in the continuous setting.
We build double actors upon single critic and double critics to handle overestimation bias in DDPG and underestimation bias in TD3, respectively.
To mitigate the uncertainty of value estimate from double critics, we propose to regularize the critic networks under double actors architecture.
arXiv Detail & Related papers (2021-06-06T07:04:48Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
- Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy [122.01837436087516]
We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms.
We establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
arXiv Detail & Related papers (2020-08-02T14:01:49Z)
- Online Meta-Critic Learning for Off-Policy Actor-Critic Methods [107.98781730288897]
Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks.
We introduce a novel and flexible meta-critic that observes the learning process and meta-learns an additional loss for the actor.
arXiv Detail & Related papers (2020-03-11T14:39:49Z)