Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
- URL: http://arxiv.org/abs/2101.05982v2
- Date: Thu, 18 Mar 2021 03:42:07 GMT
- Title: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
- Authors: Xinyue Chen, Che Wang, Zijian Zhou, Keith Ross
- Abstract summary: We introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ)
We show that REDQ's performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark.
- Score: 8.04816643418952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using a high Update-To-Data (UTD) ratio, model-based methods have recently
achieved much higher sample efficiency than previous model-free methods for
continuous-action DRL benchmarks. In this paper, we introduce a simple
model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ), and show
that its performance is just as good as, if not better than, a state-of-the-art
model-based algorithm for the MuJoCo benchmark. Moreover, REDQ can achieve this
performance using fewer parameters than the model-based method, and with less
wall-clock run time. REDQ has three carefully integrated ingredients which
allow it to achieve its high performance: (i) a UTD ratio >> 1; (ii) an
ensemble of Q functions; (iii) in-target minimization across a random subset of
Q functions from the ensemble. Through carefully designed experiments, we
provide a detailed analysis of REDQ and related model-free algorithms. To our
knowledge, REDQ is the first successful model-free DRL algorithm for
continuous-action spaces using a UTD ratio >> 1.
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM) which can be viewed as a gradient boosting algorithm combining score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks.
We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges.
Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z) - Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity [3.4376560669160394]
We introduce and analyze a novel model-free algorithm called Variance-Reduced Cascade Q-learning (VRCQ)
VRCQ provides superior guarantees in the $ell_infty$-norm compared with the existing model-free approximation-type algorithms.
arXiv Detail & Related papers (2024-08-13T00:34:33Z) - Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning [0.6963971634605796]
We present a novel method aimed at enhancing the sample efficiency of ensemble Q learning.
Our proposed approach integrates multi-head self-attention into the ensembled Q networks while bootstrapping the state-action pairs ingested by the ensemble.
arXiv Detail & Related papers (2024-05-14T00:57:02Z) - Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners [8.43854206194162]
We show that seq2seq models can be highly effective few-shot learners for a wide spectrum of applications.
We propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach.
arXiv Detail & Related papers (2023-07-27T13:37:06Z) - Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z) - Boosting Low-Data Instance Segmentation by Unsupervised Pre-training
with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes.
Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models.
Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z) - Simultaneous Double Q-learning with Conservative Advantage Learning for
Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL)
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z) - Aggressive Q-Learning with Ensembles: Achieving Both High Sample
Efficiency and High Asymptotic Performance [12.871109549160389]
We propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves the sample-efficiency performance of REDQ and the performance of TQC.
AQE is very simple, requiring neither distributional representation of critics nor target randomization.
arXiv Detail & Related papers (2021-11-17T14:48:52Z) - Dropout Q-Functions for Doubly Efficient Reinforcement Learning [12.267045729018653]
We propose a method of improving computational efficiency called Dr.Q.
Dr.Q is a variant of REDQ that uses a small ensemble of dropout Q-functions.
It achieved comparable sample efficiency with REDQ and much better computational efficiency than REDQ and comparable computational efficiency with that of SAC.
arXiv Detail & Related papers (2021-10-05T13:28:11Z) - When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.