Aggressive Q-Learning with Ensembles: Achieving Both High Sample
Efficiency and High Asymptotic Performance
- URL: http://arxiv.org/abs/2111.09159v1
- Date: Wed, 17 Nov 2021 14:48:52 GMT
- Title: Aggressive Q-Learning with Ensembles: Achieving Both High Sample
Efficiency and High Asymptotic Performance
- Authors: Yanqiu Wu, Xinyue Chen, Che Wang, Yiming Zhang, Zijian Zhou, Keith W.
Ross
- Abstract summary: We propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves the sample efficiency of REDQ and the asymptotic performance of TQC.
AQE is very simple, requiring neither distributional representation of critics nor target randomization.
- Score: 12.871109549160389
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Truncated Quantile Critics (TQC), using distributional
representation of critics, was shown to provide state-of-the-art asymptotic
training performance on all environments from the MuJoCo continuous control
benchmark suite. Also recently, Randomized Ensemble Double Q-Learning (REDQ),
using a high update-to-data ratio and target randomization, was shown to
achieve high sample efficiency that is competitive with state-of-the-art
model-based methods. In this paper, we propose a novel model-free algorithm,
Aggressive Q-Learning with Ensembles (AQE), which improves the
sample-efficiency performance of REDQ and the asymptotic performance of TQC,
thereby providing overall state-of-the-art performance during all stages of
training. Moreover, AQE is very simple, requiring neither distributional
representation of critics nor target randomization.
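The abstract leaves AQE's exact update rule unstated. As a hedged sketch of the conservative ensemble-target idea it builds on, the snippet below forms a bootstrap target from the average of the lowest K of N critic predictions; the function name, the `keep` parameter, and the "lowest K" choice are illustrative assumptions, not taken from the paper.

```python
import torch

def ensemble_conservative_target(q_next, rewards, dones, gamma=0.99, keep=2):
    """Sketch: average the `keep` smallest of the N ensemble critics'
    next-state estimates to damp overestimation, then bootstrap.
    q_next: (N, batch); rewards, dones: (batch,)."""
    lowest, _ = torch.topk(q_next, k=keep, dim=0, largest=False)
    conservative_q = lowest.mean(dim=0)
    return rewards + gamma * (1.0 - dones) * conservative_q
```

Larger `keep` makes the target less pessimistic; `keep = N` recovers a plain ensemble average.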
Related papers
- Boosting CLIP Adaptation for Image Quality Assessment via Meta-Prompt Learning and Gradient Regularization [55.09893295671917]
This paper introduces a novel Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA).
The GRMP-IQA comprises two key modules: a Meta-Prompt Pre-training Module and Quality-Aware Gradient Regularization.
Experiments on five standard BIQA datasets demonstrate superior performance over state-of-the-art BIQA methods under a limited-data setting.
arXiv Detail & Related papers (2024-09-09T07:26:21Z) - Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning [0.6963971634605796]
We present a novel method aimed at enhancing the sample efficiency of ensemble Q-learning.
Our proposed approach integrates multi-head self-attention into the ensembled Q networks while bootstrapping the state-action pairs ingested by the ensemble.
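A hedged sketch of the bootstrapping ingredient only (mask shapes and names are illustrative; the paper's self-attention module is not reproduced here):

```python
import torch

def bootstrap_masks(n_critics, batch_size, p=0.8):
    """Bernoulli masks so each ensemble member trains on a different
    random subset of the sampled state-action pairs."""
    return (torch.rand(n_critics, batch_size) < p).float()

def masked_td_loss(td_errors, masks):
    """td_errors: (n_critics, batch) squared TD errors; masked mean so
    unsampled pairs do not contribute to a critic's gradient."""
    return (masks * td_errors).sum() / masks.sum().clamp(min=1.0)
```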
arXiv Detail & Related papers (2024-05-14T00:57:02Z) - Push Quantization-Aware Training Toward Full Precision Performances via
Consistency Regularization [23.085230108628707]
Quantization-Aware Training (QAT) methods depend heavily on a complete labeled dataset or on knowledge distillation to push performance toward Full Precision (FP) accuracy.
We present a simple, novel, yet powerful method introducing a Consistency Regularization (CR) for QAT.
Our method generalizes well to different network architectures and various QAT methods.
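A hedged sketch of one common form such a consistency term can take, here a KL divergence pulling the quantized model toward the full-precision model on unlabeled inputs (names and the exact loss are assumptions, not the paper's definition):

```python
import torch.nn.functional as F

def consistency_loss(fp_logits, q_logits, temperature=1.0):
    """KL(full-precision || quantized) on temperature-softened predictions;
    no ground-truth labels are needed, so it can run on unlabeled data."""
    p_fp = F.softmax(fp_logits.detach() / temperature, dim=-1)
    log_p_q = F.log_softmax(q_logits / temperature, dim=-1)
    return F.kl_div(log_p_q, p_fp, reduction="batchmean")
```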
arXiv Detail & Related papers (2024-02-21T03:19:48Z) - Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
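A hedged sketch of how pointer-style attention scores can double as Q-values over the pointable inputs (dot-product scoring is a simplification of Ptr-Net's additive attention; all names are illustrative):

```python
import torch

def pointer_q_values(query, keys):
    """query: (batch, d) decoder state; keys: (batch, n, d) encoder outputs.
    Returns (batch, n): one score per candidate element, read as Q(s, a_i)."""
    return torch.einsum("bd,bnd->bn", query, keys) / keys.shape[-1] ** 0.5

# Greedy action = the input element with the highest score:
# action = pointer_q_values(query, keys).argmax(dim=-1)
```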
arXiv Detail & Related papers (2023-11-05T12:03:58Z) - Improving Offline-to-Online Reinforcement Learning with Q Conditioned State Entropy Exploration [29.891468119032]
We study how to fine-tune a policy pre-trained with offline reinforcement learning (RL).
We propose Q conditioned state entropy (QCSE) as an intrinsic reward.
We observe significant improvements with QCSE (about 13% for CQL and 8% for Cal-QL).
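The summary gives no formula; below is a hedged sketch of a particle-based state-entropy bonus, the generic ingredient behind such intrinsic rewards (the Q-conditioning that gives QCSE its name is not reproduced here):

```python
import torch

def knn_state_entropy_bonus(states, k=5):
    """states: (batch, d). Distance to the k-th nearest neighbour in the
    batch serves as a local-density proxy: sparsely visited regions give
    larger distances and hence a larger exploration bonus."""
    dists = torch.cdist(states, states)                # pairwise distances
    kth_dist, _ = torch.kthvalue(dists, k + 1, dim=1)  # skip self-distance 0
    return torch.log(kth_dist + 1.0)
```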
arXiv Detail & Related papers (2023-10-07T00:02:05Z) - Simultaneous Double Q-learning with Conservative Advantage Learning for
Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL)
Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z) - Task-Specific Normalization for Continual Learning of Blind Image
Quality Models [105.03239956378465]
We present a simple yet effective continual learning method for blind image quality assessment (BIQA).
The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability.
We assign each new IQA dataset (i.e., task) a prediction head, and load the corresponding normalization parameters to produce a quality score.
The final quality estimate is computed by a weighted summation of predictions from all heads with a lightweight $K$-means gating mechanism.
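A hedged sketch of that final aggregation step as described (the gating weights are taken as given; the $K$-means gate itself is not reproduced):

```python
import torch

def gated_quality_score(head_scores, gate_weights):
    """head_scores: (T,), one prediction per task head;
    gate_weights: (T,), non-negative gate outputs. Returns the weighted
    summation used as the final quality estimate."""
    w = gate_weights / gate_weights.sum()
    return (w * head_scores).sum()
```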
arXiv Detail & Related papers (2021-07-28T15:21:01Z) - Randomized Ensembled Double Q-Learning: Learning Fast Without a Model [8.04816643418952]
We introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ).
We show that REDQ's performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark.
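A sketch of the target-randomization idea named above, as commonly described for REDQ: the bootstrap target takes a minimum over a freshly sampled subset of M of the N target critics (function and argument names here are illustrative):

```python
import torch

def redq_style_target(q_next, rewards, dones, gamma=0.99, m=2):
    """q_next: (N, batch) next-state estimates from the target ensemble.
    In-target minimization over a random subset of m critics."""
    idx = torch.randperm(q_next.shape[0])[:m]
    min_q = q_next[idx].min(dim=0).values
    return rewards + gamma * (1.0 - dones) * min_q
```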
arXiv Detail & Related papers (2021-01-15T06:25:58Z) - Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
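A hedged sketch in the spirit of that description, extending double Q-learning's selection/evaluation split to a larger set of parallel models (the details are assumptions, not the paper's exact rule):

```python
import torch

def cross_q_target(q_nets, next_state, reward, done, gamma=0.99):
    """One randomly chosen network selects the greedy action; a different
    one evaluates it, decoupling selection from evaluation."""
    i, j = torch.randperm(len(q_nets))[:2].tolist()
    a_star = q_nets[i](next_state).argmax(dim=-1, keepdim=True)
    q_eval = q_nets[j](next_state).gather(-1, a_star).squeeze(-1)
    return reward + gamma * (1.0 - done) * q_eval
```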
arXiv Detail & Related papers (2020-09-29T04:58:17Z) - Feature Quantization Improves GAN Training [126.02828112121874]
Feature Quantization (FQ) for the discriminator embeds both true and fake data samples into a shared discrete space.
Our method can be easily plugged into existing GAN models, with little computational overhead in training.
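A hedged sketch of feature quantization in its standard vector-quantization form, snapping discriminator features to a shared codebook with a straight-through gradient (the paper's codebook update is not shown):

```python
import torch

def quantize_features(features, codebook):
    """features: (batch, d); codebook: (K, d). Each feature vector is
    replaced by its nearest codebook entry."""
    idx = torch.cdist(features, codebook).argmin(dim=1)
    quantized = codebook[idx]
    # Straight-through estimator: the forward pass uses the discrete code,
    # while gradients flow back to the original features.
    return features + (quantized - features).detach()
```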
arXiv Detail & Related papers (2020-04-05T04:06:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.