Dropout Q-Functions for Doubly Efficient Reinforcement Learning
- URL: http://arxiv.org/abs/2110.02034v1
- Date: Tue, 5 Oct 2021 13:28:11 GMT
- Title: Dropout Q-Functions for Doubly Efficient Reinforcement Learning
- Authors: Takuya Hiraoka, Takahisa Imagawa, Taisei Hashimoto, Takashi Onishi,
Yoshimasa Tsuruoka
- Abstract summary: We propose Dr.Q, a method for improving computational efficiency.
Dr.Q is a variant of REDQ that uses a small ensemble of dropout Q-functions.
It achieves sample efficiency comparable to REDQ, computational efficiency much better than REDQ's, and computational efficiency comparable to SAC's.
- Score: 12.267045729018653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomized ensemble double Q-learning (REDQ) has recently achieved
state-of-the-art sample efficiency on continuous-action reinforcement learning
benchmarks. This superior sample efficiency is made possible by a large
Q-function ensemble. However, REDQ is much less computationally efficient than
non-ensemble counterparts such as Soft Actor-Critic (SAC). To make REDQ more
computationally efficient, we propose Dr.Q, a variant of REDQ that uses a
small ensemble of dropout Q-functions. Our dropout Q-functions are simple
Q-functions equipped with dropout connections and layer normalization. Despite
its simple implementation, our experimental results indicate that Dr.Q is
doubly (sample and computationally) efficient: it achieves sample efficiency
comparable to REDQ, computational efficiency much better than REDQ's, and
computational efficiency comparable to SAC's.
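As a concrete illustration of the dropout Q-functions described above, the following is a minimal PyTorch sketch of a critic with a dropout connection and layer normalization after each hidden layer. The hidden width, dropout rate, and all names are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch (illustrative, not the paper's exact settings) of a dropout
# Q-function: an MLP critic with a dropout connection and layer normalization
# after each hidden linear layer, as described in the abstract.
import torch
import torch.nn as nn


class DropoutQFunction(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=256, dropout_rate=0.01):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden_dim),
            nn.Dropout(p=dropout_rate),   # dropout connection
            nn.LayerNorm(hidden_dim),     # layer normalization
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Dropout(p=dropout_rate),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),     # scalar Q(s, a)
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


# Usage sketch: a small ensemble (e.g., two critics) of dropout Q-functions
# stands in for REDQ's large ensemble; keeping the modules in training mode
# keeps dropout active, which supplies ensemble-like randomization.
critics = [DropoutQFunction(obs_dim=17, act_dim=6) for _ in range(2)]
obs, act = torch.randn(32, 17), torch.randn(32, 6)
q_values = torch.stack([q(obs, act) for q in critics])  # shape: (2, 32, 1)
```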
Related papers
- Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning [0.6963971634605796]
We present a novel method aimed at enhancing the sample efficiency of ensemble Q-learning.
Our proposed approach integrates multi-head self-attention into the ensembled Q networks while bootstrapping the state-action pairs ingested by the ensemble.
arXiv Detail & Related papers (2024-05-14T00:57:02Z)
- Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach, and we also test the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z)
- Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL).
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z)
- Quantum circuit architecture search on a superconducting processor [56.04169357427682]
Variational quantum algorithms (VQAs) have shown strong evidence of provable computational advantages in diverse fields such as finance, machine learning, and chemistry.
However, the ansatz used in modern VQAs is incapable of balancing the tradeoff between expressivity and trainability.
We demonstrate the first proof-of-principle experiment of applying an efficient automatic ansatz design technique to enhance VQAs on an 8-qubit superconducting quantum processor.
arXiv Detail & Related papers (2022-01-04T01:53:42Z)
- Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance [12.871109549160389]
We propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves on the sample efficiency of REDQ and the performance of TQC.
AQE is very simple, requiring neither distributional representation of critics nor target randomization.
arXiv Detail & Related papers (2021-11-17T14:48:52Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
- Randomized Ensembled Double Q-Learning: Learning Fast Without a Model [8.04816643418952]
We introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ).
We show that REDQ's performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark.
arXiv Detail & Related papers (2021-01-15T06:25:58Z)
- Self-correcting Q-Learning [14.178899938667161]
We introduce a new way to address the overestimation bias in Q-learning, in the form of a "self-correcting algorithm".
Applying this strategy to Q-learning results in Self-correcting Q-learning.
We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate.
arXiv Detail & Related papers (2020-12-02T11:36:24Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity [34.36803740112609]
CrossQ matches or surpasses current state-of-the-art methods in terms of sample efficiency.
It substantially reduces the computational cost compared to REDQ and DroQ.
It is easy to implement, requiring just a few lines of code on top of SAC.
arXiv Detail & Related papers (2019-02-14T21:05:50Z)