Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework
- URL: http://arxiv.org/abs/2301.03887v1
- Date: Tue, 10 Jan 2023 10:21:32 GMT
- Title: Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework
- Authors: Zongwei Liu, Yonghong Song, Yuanlin Zhang
- Abstract summary: We propose actor-director-critic, a new framework for deep reinforcement learning.
For the two critic networks used, we design two target critic networks for each critic network instead of one.
In order to verify the performance of the actor-director-critic framework and the improved double estimator method, we applied them to the TD3 algorithm.
- Score: 2.6477113498726244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose actor-director-critic, a new framework for deep
reinforcement learning. Compared with the actor-critic framework, the director
role is added, and action classification and action evaluation are applied
simultaneously to improve the decision-making performance of the agent.
Firstly, the actions of the agent are divided into high-quality and low-quality
actions according to the rewards returned from the environment. Then, the
director network is trained to discriminate between high- and low-quality
actions and to guide the actor network to reduce repetitive exploration of
low-quality actions in the early stage of training. In addition,
we propose an improved double estimator method to better solve the problem of
overestimation in the field of reinforcement learning. For the two critic
networks used, we design two target critic networks for each critic network
instead of one. In this way, the target value of each critic network can be
calculated by taking the average of the outputs of the two target critic
networks, which is more stable and accurate than using only one target critic
network to obtain the target value. To verify the performance of the
actor-director-critic framework and the improved double estimator method, we
applied them to the TD3 algorithm. We then
carried out experiments in multiple environments in MuJoCo and compared the
experimental data before and after the algorithm improvement. The results show
that the improved algorithm converges faster and achieves a higher total return.
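To make the director idea described above more concrete, here is a minimal PyTorch sketch. It is an assumption-laden illustration, not the paper's implementation: the labeling rule (reward above the batch median counts as high quality), the network sizes, the guidance weight, and the environment dimensions are all hypothetical.

```python
# Hypothetical sketch of the "director": label stored actions as high- or
# low-quality from the environment reward, train a classifier on those labels,
# and add its score to the actor loss to steer the actor away from low-quality
# actions early in training. Labeling rule, sizes, and weights are assumptions.
import torch
import torch.nn as nn

class Director(nn.Module):
    """Binary classifier: logit that a (state, action) pair is high quality."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # raw logits

state_dim, action_dim = 17, 6  # assumed environment dimensions
director = Director(state_dim, action_dim)
opt = torch.optim.Adam(director.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def train_director(states, actions, rewards):
    """Label actions by comparing reward to the batch median (assumed rule)."""
    labels = (rewards > rewards.median()).float()
    loss = bce(director(states, actions), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def actor_guidance(states, actor_actions, weight=0.1):
    """Extra actor-loss term rewarding actions the director rates highly.
    The weighting and how it combines with the critic loss are assumptions."""
    return -weight * torch.sigmoid(director(states, actor_actions)).mean()

# Minimal usage with random tensors standing in for a replay-buffer batch.
batch = torch.randn(8, state_dim), torch.randn(8, action_dim), torch.randn(8, 1)
print(train_director(*batch))
print(actor_guidance(batch[0], batch[1]))
```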
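The improved double estimator can be sketched in the same spirit: each of the two critics gets two target critics, and each critic's target value averages its own pair of target outputs. How the two (averaged) critic estimates are then combined is not stated in the abstract; the minimum below mirrors TD3's clipped double Q-learning and is an assumption, as are the network sizes and dimensions.

```python
# Hypothetical sketch of the improved double estimator target: two target
# critics per critic (four in total), each critic's target averages its pair.
import copy
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

state_dim, action_dim, gamma = 17, 6, 0.99  # assumed dimensions and discount

critics = [Critic(state_dim, action_dim) for _ in range(2)]
# Two target networks per critic instead of one.
target_critics = [[copy.deepcopy(c), copy.deepcopy(c)] for c in critics]

def td_target(reward, next_state, next_action, done):
    """Average each critic's pair of target outputs, then combine across critics."""
    per_critic = []
    with torch.no_grad():
        for pair in target_critics:
            q = torch.stack([t(next_state, next_action) for t in pair]).mean(dim=0)
            per_critic.append(q)
        # Minimum across the two averaged estimates mirrors TD3 (assumption);
        # the paper may combine the critics differently.
        q_next = torch.min(per_critic[0], per_critic[1])
    return reward + gamma * (1.0 - done) * q_next

# Minimal usage example with random tensors standing in for a replay-buffer batch.
r = torch.zeros(4, 1)
s2, a2, d = torch.randn(4, state_dim), torch.randn(4, action_dim), torch.zeros(4, 1)
print(td_target(r, s2, a2, d).shape)  # torch.Size([4, 1])
```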
Related papers
- Decision-Aware Actor-Critic with Function Approximation and Theoretical
Guarantees [12.259191000019033]
Actor-critic (AC) methods are widely used in reinforcement learning (RL).
We design a joint objective for training the actor and critic in a decision-aware fashion.
We empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.
arXiv Detail & Related papers (2023-05-24T15:34:21Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Actor Prioritized Experience Replay [0.0]
Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error.
We introduce a novel experience replay sampling framework for actor-critic methods that also addresses stability issues and recent findings behind the poor empirical performance of PER.
An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms the competing approaches.
arXiv Detail & Related papers (2022-09-01T15:27:46Z)
- Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL)
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Group-aware Contrastive Regression for Action Quality Assessment [85.43203180953076]
We show that the relations among videos can provide important clues for more accurate action quality assessment.
Our approach outperforms previous methods by a large margin and establishes new state-of-the-art on all three benchmarks.
arXiv Detail & Related papers (2021-08-17T17:59:39Z)
- Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation [2.1592777170316366]
Actor-critic methods integrating target networks have exhibited remarkable empirical success in deep reinforcement learning.
We bridge this gap by proposing the first theoretical analysis of an online target-based actor-critic with linear function approximation in the discounted reward setting.
arXiv Detail & Related papers (2021-06-14T14:59:05Z)
- Efficient Continuous Control with Double Actors and Regularized Critics [7.072664211491016]
We explore the potential of double actors, which has long been neglected, for better value function estimation in the continuous setting.
We build double actors upon single critic and double critics to handle overestimation bias in DDPG and underestimation bias in TD3, respectively.
To mitigate the uncertainty of value estimate from double critics, we propose to regularize the critic networks under double actors architecture.
arXiv Detail & Related papers (2021-06-06T07:04:48Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
- Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy [122.01837436087516]
We study the global convergence and global optimality of actor-critic, one of the most popular families of reinforcement learning algorithms.
We establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
arXiv Detail & Related papers (2020-08-02T14:01:49Z)
- Online Meta-Critic Learning for Off-Policy Actor-Critic Methods [107.98781730288897]
Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks.
We introduce a novel and flexible meta-critic that observes the learning process and meta-learns an additional loss for the actor.
arXiv Detail & Related papers (2020-03-11T14:39:49Z)