Related papers: Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning

Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning

URL: http://arxiv.org/abs/2506.05716v1
Date: Fri, 06 Jun 2025 03:36:19 GMT
Title: Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning
Authors: Adrian Ly, Richard Dazeley, Peter Vamplew, Francisco Cruz, Sunil Aryal,
Abstract summary: We introduce a novel algorithm called Ensemble Elastic Step DQN (EEDQN), which unifies ensembles with elastic step updates to stabilise algorithmic performance.<n>EEDQN is designed to address two major challenges in deep reinforcement learning: overestimation bias and sample efficiency.<n>Our results show that EEDQN achieves consistently robust performance across all tested environments.
Score: 1.8008841825105586
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While many algorithmic extensions to Deep Q-Networks (DQN) have been proposed, there remains limited understanding of how different improvements interact. In particular, multi-step and ensemble style extensions have shown promise in reducing overestimation bias, thereby improving sample efficiency and algorithmic stability. In this paper, we introduce a novel algorithm called Ensemble Elastic Step DQN (EEDQN), which unifies ensembles with elastic step updates to stabilise algorithmic performance. EEDQN is designed to address two major challenges in deep reinforcement learning: overestimation bias and sample efficiency. We evaluated EEDQN against standard and ensemble DQN variants across the MinAtar benchmark, a set of environments that emphasise behavioral learning while reducing representational complexity. Our results show that EEDQN achieves consistently robust performance across all tested environments, outperforming baseline DQN methods and matching or exceeding state-of-the-art ensemble DQNs in final returns on most of the MinAtar environments. These findings highlight the potential of systematically combining algorithmic improvements and provide evidence that ensemble and multi-step methods, when carefully integrated, can yield substantial gains.

Related papers

Uncertainty Quantification for LLMs through Minimum Bayes Risk: Bridging Confidence and Consistency [66.96286531087549]
Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches.<n>We propose a novel approach to integrating model confidence with output consistency, resulting in a family of efficient and robust UQ methods.<n>We evaluate our approach across various tasks such as question answering, abstractive summarization, and machine translation.
arXiv Detail & Related papers (2025-02-07T14:30:12Z)
Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning [20.491176017183044]
This paper tackles the multi-objective reinforcement learning (MORL) problem. It introduces an innovative actor-critic algorithm named MOAC which finds a policy by iteratively making trade-offs among conflicting reward signals.
arXiv Detail & Related papers (2024-05-05T23:52:57Z)
Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets) Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z)
Weakly Coupled Deep Q-Networks [5.76924666595801]
We propose a novel deep reinforcement learning algorithm that enhances performance in weakly coupled Markov decision processes (WCMDP) WCDQN employs a single network to train multiple DQN "subagents", one for each subproblem, and then combine their solutions to establish an upper bound on the optimal action value.
arXiv Detail & Related papers (2023-10-28T20:07:57Z)
Reducing Computational Complexity of Neural Networks in Optical Channel Equalization: From Concepts to Implementation [1.6987798749419218]
We show that it is possible to design an NN-based equalizer that is simpler to implement and has better performance than the conventional digital back-propagation (DBP) equalizer with only one step per span. An equalizer based on NN can also achieve superior performance while still maintaining the same degree of complexity as the full electronic chromatic dispersion compensation block.
arXiv Detail & Related papers (2022-08-26T21:00:05Z)
Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance [12.871109549160389]
We propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves the sample-efficiency performance of REDQ and the performance of TQC. AQE is very simple, requiring neither distributional representation of critics nor target randomization.
arXiv Detail & Related papers (2021-11-17T14:48:52Z)
Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks. We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution. We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms. SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning [70.382101956278]
QTRAN is a reinforcement learning algorithm capable of learning the largest class of joint-action value functions. Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments. We propose a substantially improved version, coined QTRAN++.
arXiv Detail & Related papers (2020-06-22T05:08:36Z)
Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-concave min-max problems, which have received increasing attention due to deep learning (e.g., deep AUC)
arXiv Detail & Related papers (2020-06-12T00:32:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.