Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods
- URL: http://arxiv.org/abs/2205.03819v1
- Date: Sun, 8 May 2022 09:17:16 GMT
- Title: Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods
- Authors: Qing Li, Wengang Zhou, Zhenbo Lu, Houqiang Li
- Abstract summary: We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL).
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
- Score: 133.85604983925282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Actor-critic Reinforcement Learning (RL) algorithms have achieved impressive performance in continuous control tasks. However, they still suffer from two nontrivial obstacles: low sample efficiency and overestimation bias. To this end, we propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL). Our SDQ-CAL boosts Double Q-learning for off-policy actor-critic RL based on a modification of the Bellman optimality operator with Advantage Learning. Specifically, SDQ-CAL improves sample efficiency by modifying the reward so that optimal actions become easier to distinguish from the others in experience. In addition, it mitigates the overestimation issue by updating a pair of critics simultaneously upon double estimators. Extensive experiments reveal that our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks. We release the source code of our method at: \url{https://github.com/LQNew/SDQ-CAL}.
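Below is a minimal sketch, in PyTorch-style code, of the two ideas the abstract describes: an advantage-learning correction of the TD target (which effectively reshapes the reward to widen the gap between the approximately optimal action and the sampled one) and a simultaneous update of two critics with cross (double) estimators. The function names, the `alpha` coefficient, and the network interfaces are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch only; see the official repository for the actual SDQ-CAL code.
import torch
import torch.nn.functional as F

def critic_targets(q1, q2, actor, batch, gamma=0.99, alpha=0.9):
    """Advantage-learning-modified TD targets for both critics, computed at once."""
    s, a, r, s_next, done = batch                 # tensors sampled from the replay buffer
    with torch.no_grad():
        a_next = actor(s_next)                    # policy action approximates argmax_a Q
        a_cur = actor(s)
        # Double estimators: each critic bootstraps from the *other* critic.
        next_q1 = q2(s_next, a_next)
        next_q2 = q1(s_next, a_next)
        # Advantage-learning correction: lower the target for actions whose value
        # falls below the (approximate) greedy value, widening the action gap.
        gap1 = q1(s, a_cur) - q1(s, a)
        gap2 = q2(s, a_cur) - q2(s, a)
        y1 = r + gamma * (1.0 - done) * next_q1 - alpha * gap1
        y2 = r + gamma * (1.0 - done) * next_q2 - alpha * gap2
    return y1, y2

def critic_loss(q1, q2, actor, batch):
    """Both critics are regressed toward their targets in the same update step."""
    s, a, *_ = batch
    y1, y2 = critic_targets(q1, q2, actor, batch)
    return F.mse_loss(q1(s, a), y1) + F.mse_loss(q2(s, a), y2)
```

With `alpha = 0` this reduces to a plain double-estimator TD target; the correction term is the part the abstract describes as "modifying the reward" so that optimal actions stand out from the others in experience.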
Related papers
- Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation [37.36913210031282]
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering.
We propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques.
arXiv Detail & Related papers (2024-05-29T01:49:20Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate, a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
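Purely to illustrate the advantage described above, and assuming the "advantage" is the Q-value of the observed (positive) item minus the average Q-value over the sampled negative items, a sketch could look as follows; all names are hypothetical and the paper's exact formulation may differ.

```python
# Hypothetical illustration of an advantage computed against sampled negatives.
import torch

def positive_advantage(q_values, pos_item, neg_items):
    """q_values: [batch, n_items] Q-values from the RL head;
    pos_item: [batch] observed item ids; neg_items: [batch, n_neg] sampled negatives."""
    q_pos = q_values.gather(1, pos_item.unsqueeze(1)).squeeze(1)  # Q(s, positive item)
    q_neg = q_values.gather(1, neg_items).mean(dim=1)             # average over negatives
    return q_pos - q_neg  # e.g. used to weight the supervised sequential-learning loss
```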
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently Finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
- On the Estimation Bias in Double Q-Learning [20.856485777692594]
Double Q-learning is not fully unbiased and suffers from underestimation bias.
We show that such underestimation bias may lead to multiple non-optimal fixed points under an approximated Bellman operator.
We propose a simple but effective approach as a partial fix for the underestimation bias in double Q-learning.
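For context, the update being analyzed is standard tabular double Q-learning (van Hasselt, 2010): one table selects the greedy action and the other evaluates it, which removes overestimation but, because the two tables are trained on different samples, can undershoot the true value. The sketch below shows that baseline update, not the paper's proposed fix.

```python
# Standard tabular double Q-learning update (van Hasselt, 2010), shown for reference.
import random
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, lr=0.1, gamma=0.99):
    """QA, QB: [n_states, n_actions] value tables updated on alternating coin flips."""
    if random.random() < 0.5:
        a_star = np.argmax(QA[s_next])              # select the action with table A ...
        target = r + gamma * QB[s_next, a_star]     # ... but evaluate it with table B
        QA[s, a] += lr * (target - QA[s, a])
    else:
        b_star = np.argmax(QB[s_next])              # select with B ...
        target = r + gamma * QA[s_next, b_star]     # ... evaluate with A
        QB[s, a] += lr * (target - QB[s, a])
```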
arXiv Detail & Related papers (2021-09-29T13:41:24Z)
- Ensemble Bootstrapping for Q-Learning [15.07549655582389]
We introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL).
EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables.
We show that there exist domains where both over and under-estimation result in sub-optimal performance.
arXiv Detail & Related papers (2021-02-28T10:19:47Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
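As a rough sketch of the target construction described above, assuming a set of parallel Q-networks and leaving the selection and update details to the paper:

```python
# Illustrative only: bootstrap the TD target from a randomly selected member
# of a set of parallel Q-networks; interfaces are assumed, not taken from the paper.
import random
import torch

def cross_q_target(q_nets, r, s_next, done, gamma=0.99):
    """q_nets: list of K Q-networks mapping a state batch to per-action values."""
    with torch.no_grad():
        q_sel = random.choice(q_nets)                    # randomly selected network
        next_v = q_sel(s_next).max(dim=1).values
        return r + gamma * (1.0 - done) * next_v
```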
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
- Decorrelated Double Q-learning [4.982806898121435]
We introduce the decorrelated double Q-learning (D2Q) to reduce the correlation between value function approximators.
The experimental results on a suite of MuJoCo continuous control tasks demonstrate that our decorrelated double Q-learning can effectively improve the performance.
arXiv Detail & Related papers (2020-06-12T05:59:05Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to this optimal distribution and uses it to re-weight the transitions used for training.
arXiv Detail & Related papers (2020-03-16T16:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.