Related papers: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

URL: http://arxiv.org/abs/2101.05982v2
Date: Thu, 18 Mar 2021 03:42:07 GMT
Title: Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
Authors: Xinyue Chen, Che Wang, Zijian Zhou, Keith Ross
Abstract summary: We introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ) We show that REDQ's performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark.
Score: 8.04816643418952
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved much higher sample efficiency than previous model-free methods for continuous-action DRL benchmarks. In this paper, we introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ), and show that its performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark. Moreover, REDQ can achieve this performance using fewer parameters than the model-based method, and with less wall-clock run time. REDQ has three carefully integrated ingredients which allow it to achieve its high performance: (i) a UTD ratio >> 1; (ii) an ensemble of Q functions; (iii) in-target minimization across a random subset of Q functions from the ensemble. Through carefully designed experiments, we provide a detailed analysis of REDQ and related model-free algorithms. To our knowledge, REDQ is the first successful model-free DRL algorithm for continuous-action spaces using a UTD ratio >> 1.

Related papers

A2R: An Asymmetric Two-Stage Reasoning Framework for Parallel Reasoning [57.727084580884075]
Asymmetric Two-Stage Reasoning framework designed to bridge gap between a model's potential and its actual performance.<n>A2R-Efficient is a "small-to-big" variant that combines a Qwen3-4B explorer with a Qwen3-8B synthesizer.<n>Results show A2R is not only a performance-boosting framework but also an efficient and practical solution for real-world applications.
arXiv Detail & Related papers (2025-09-26T08:27:03Z)
Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach [65.6966065843227]
Iterative Reweight-then-IRO is a framework that performs RL-style alignment of a frozen base model without touching its parameters.<n>At test time, the value functions are used to guide the base model generation via a search-based optimization process.<n> Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI's reinforcement fine-tuning (RFT)
arXiv Detail & Related papers (2025-06-21T21:49:02Z)
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models [55.2480439325792]
We introduce QAlign, a new test-time alignment approach. As we scale test-time compute, QAlign converges to sampling from the optimal aligned distribution for each individual prompt. By adopting recent advances in Markov chain Monte Carlo for text generation, our method enables better-aligned outputs without modifying the underlying model or even requiring logit access.
arXiv Detail & Related papers (2025-04-04T00:41:40Z)
Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
RSQ: Learning from Important Tokens Leads to Better Quantized LLMs [65.5558181902098]
Layer-wise quantization is a key technique for efficiently compressing large models without expensive retraining. We propose RSQ (Rotate, Scale, then Quantize), which applies rotations to the model to mitigate outliers. We demonstrate that RSQ consistently outperforms baseline methods across multiple downstream tasks and three model families.
arXiv Detail & Related papers (2025-03-03T18:46:33Z)
Uncertainty Quantification for LLMs through Minimum Bayes Risk: Bridging Confidence and Consistency [66.96286531087549]
Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches.<n>We propose a novel approach to integrating model confidence with output consistency, resulting in a family of efficient and robust UQ methods.<n>We evaluate our approach across various tasks such as question answering, abstractive summarization, and machine translation.
arXiv Detail & Related papers (2025-02-07T14:30:12Z)
Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM) which can be viewed as a gradient boosting algorithm combining score matching. We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy. Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity [3.4376560669160394]
We introduce and analyze a novel model-free algorithm called Variance-Reduced Cascade Q-learning (VRCQ) VRCQ provides superior guarantees in the $ell_infty$-norm compared with the existing model-free approximation-type algorithms.
arXiv Detail & Related papers (2024-08-13T00:34:33Z)
Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning [0.6963971634605796]
We present a novel method aimed at enhancing the sample efficiency of ensemble Q learning. Our proposed approach integrates multi-head self-attention into the ensembled Q networks while bootstrapping the state-action pairs ingested by the ensemble.
arXiv Detail & Related papers (2024-05-14T00:57:02Z)
Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners [8.43854206194162]
We show that seq2seq models can be highly effective few-shot learners for a wide spectrum of applications. We propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach.
arXiv Detail & Related papers (2023-07-27T13:37:06Z)
Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution. We employ a DETR-based encoder-decoder conditional queries to significantly reduce the entity label space as well. Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
Boosting Low-Data Instance Segmentation by Unsupervised Pre-training with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes. Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models. Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z)
Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods [133.85604983925282]
We propose Simultaneous Double Q-learning with Conservative Advantage Learning (SDQ-CAL) Our algorithm realizes less biased value estimation and achieves state-of-the-art performance in a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2022-05-08T09:17:16Z)
Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance [12.871109549160389]
We propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves the sample-efficiency performance of REDQ and the performance of TQC. AQE is very simple, requiring neither distributional representation of critics nor target randomization.
arXiv Detail & Related papers (2021-11-17T14:48:52Z)
Dropout Q-Functions for Doubly Efficient Reinforcement Learning [12.267045729018653]
We propose a method of improving computational efficiency called Dr.Q. Dr.Q is a variant of REDQ that uses a small ensemble of dropout Q-functions. It achieved comparable sample efficiency with REDQ and much better computational efficiency than REDQ and comparable computational efficiency with that of SAC.
arXiv Detail & Related papers (2021-10-05T13:28:11Z)
When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable. In order to achieve a better accuracy, we propose two lightweight modules. DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers. QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.