Related papers: Modified Double DQN: addressing stability

URL: http://arxiv.org/abs/2108.04115v2
Date: Tue, 29 Oct 2024 14:06:25 GMT
Title: Modified Double DQN: addressing stability
Authors: Shervin Halat, Mohammad Mehdi Ebadzadeh, Kiana Amani
Abstract summary: The Double-DQN (DDQN) algorithm was originally proposed to address the overestimation issue in the original DQN algorithm. Three modifications to the DDQN algorithm are proposed with the aim of maintaining performance in terms of both stability and overestimation.
Score: 0.2867517731896504
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Inspired by the Double Q-learning algorithm, the Double-DQN (DDQN) algorithm was originally proposed to address the overestimation issue in the original DQN algorithm. DDQN has shown, both theoretically and empirically, the importance of decoupling action selection from action evaluation when computing target values, and all of these benefits were obtained with only a simple adaptation of the DQN algorithm, the minimal possible change, as its authors noted. Nevertheless, DDQN appears to involve a roll-back: the parameters of the policy network reappear in the target value function, even though DQN had originally removed them from it to tackle the serious issue of moving targets and the instability that moving targets cause during learning. Therefore, this paper proposes three modifications to the DDQN algorithm, with the aim of maintaining its performance in terms of both stability and overestimation. These modifications focus on the logic of decoupling best-action selection from evaluation in the target value function and on the logic of tackling the moving-targets issue. Each modification has its own pros and cons relative to the others, mainly concerning the execution time the corresponding algorithm requires and the stability it provides. In terms of overestimation, none of the modifications appears to underperform the original DDQN; if anything, they may outperform it. To evaluate the efficacy of the proposed modifications, multiple empirical and theoretical experiments were conducted. The results are presented and discussed in this article.
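To make the decoupling and the "roll-back" concrete, here is a minimal NumPy sketch of the two standard target computations the abstract contrasts. It does not implement any of the paper's three modifications, which the abstract does not specify. The function names, the (batch, actions) array layout, the `done` mask, and the discount `gamma` are illustrative assumptions.

```python
import numpy as np


def dqn_target(r, q_next_target, done, gamma=0.99):
    """Standard DQN target: y = r + gamma * max_a Q(s', a; theta_minus).

    The frozen target network both selects and evaluates the next action,
    so only target-network values (theta_minus) enter the target.
    """
    return r + gamma * (1.0 - done) * q_next_target.max(axis=1)


def double_dqn_target(r, q_next_online, q_next_target, done, gamma=0.99):
    """Double DQN target: y = r + gamma * Q(s', argmax_a Q(s', a; theta); theta_minus).

    The online (policy) network selects the greedy action and the target
    network evaluates it. Because the online values (theta) re-enter the
    target here, this is the 'roll-back' toward moving targets that the
    abstract points out.
    """
    a_star = q_next_online.argmax(axis=1)                    # selection: online net
    q_eval = q_next_target[np.arange(len(a_star)), a_star]   # evaluation: target net
    return r + gamma * (1.0 - done) * q_eval


if __name__ == "__main__":
    # Hypothetical batch of 2 transitions with 3 actions each.
    r = np.array([1.0, 0.0])
    done = np.array([0.0, 1.0])  # second transition is terminal
    q_next_online = np.array([[1.0, 3.0, 2.0], [0.5, 0.2, 0.1]])
    q_next_target = np.array([[2.0, 1.0, 4.0], [0.3, 0.9, 0.0]])
    print(dqn_target(r, q_next_target, done, gamma=0.9))
    print(double_dqn_target(r, q_next_online, q_next_target, done, gamma=0.9))
```

With `gamma=0.9`, the DQN targets for this toy batch are 4.6 and 0.0, while the Double DQN targets are 1.9 and 0.0. For the same target-network values, the Double DQN bootstrap value can never exceed the DQN one, since a single entry of a row is at most that row's maximum. This is the mechanism behind DDQN's reduced overestimation.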

Related papers

On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations [53.0667196725616]
Deep Reinforcement Learning (DRL) is a paradigm of artificial intelligence where an agent uses a neural network to learn which actions to take in a given environment. DRL has recently gained traction from being able to solve complex environments like driving simulators, 3D robotic control, and multiplayer-online-battle-arena video games. Numerous implementations of the state-of-the-art algorithms responsible for training these agents, like the Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) algorithms, currently exist.
arXiv Detail & Related papers (2025-03-28T16:25:06Z)
Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning [0.6963971634605796]
We present a novel method aimed at enhancing the sample efficiency of ensemble Q-learning. Our proposed approach integrates multi-head self-attention into the ensembled Q networks while bootstrapping the state-action pairs ingested by the ensemble.
arXiv Detail & Related papers (2024-05-14T00:57:02Z)
Rethinking PGD Attack: Is Sign Function Necessary? [131.6894310945647]
We present a theoretical analysis of how such a sign-based update algorithm influences step-wise attack performance. We propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign. The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments.
arXiv Detail & Related papers (2023-12-03T02:26:58Z)
Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms [10.949415951813661]
Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. We show that conditionally invariant components (CICs) are relevant for prediction and remain conditionally invariant across the source and target data. We propose a new algorithm based on CICs, importance-weighted conditional invariant penalty (IW-CIP), which has target risk guarantees beyond simple settings.
arXiv Detail & Related papers (2023-09-19T04:04:59Z)
Benchmark tasks for Quality-Diversity applied to Uncertain domains [1.5469452301122175]
We introduce a set of 8 easy-to-implement and lightweight tasks, split into 3 main categories. We identify the key uncertainty properties to easily define UQD benchmark tasks. All our tasks build on the Redundant Arm: a common QD environment that is lightweight and easily replicable.
arXiv Detail & Related papers (2023-04-24T21:23:26Z)
A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience. We show that simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms for the non-parametric streaming setting. We introduce two regret metrics by minimizing the population loss that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
Towards QD-suite: developing a set of benchmarks for Quality-Diversity algorithms [0.0]
Existing benchmarks are not standardized, and there is currently no MNIST equivalent for Quality-Diversity (QD). We argue that the identification of challenges faced by QD methods and the development of targeted, challenging, scalable benchmarks is an important step.
arXiv Detail & Related papers (2022-05-06T13:33:50Z)
Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples. We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment. We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
A Penalized Shared-parameter Algorithm for Estimating Optimal Dynamic Treatment Regimens [3.9023554886892438]
We show that the existing Q-shared algorithm can suffer from non-convergence due to the use of linear models in the Q-learning setup. We give evidence for the proposed method in a real-world application and several synthetic simulations.
arXiv Detail & Related papers (2021-07-13T05:31:14Z)
AP-Loss for Accurate One-Stage Object Detection [49.13608882885456]
One-stage object detectors are trained by optimizing classification loss and localization loss simultaneously. The former suffers greatly from extreme foreground-background imbalance due to the large number of anchors. This paper proposes a novel framework to replace the classification task in one-stage detectors with a ranking task.
arXiv Detail & Related papers (2020-08-17T13:22:01Z)
Optimistic Exploration even with a Pessimistic Initialisation [57.41327865257504]
Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL). In particular, in scenarios with only positive rewards, Q-values are initialised at their lowest possible values. We propose a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network.
arXiv Detail & Related papers (2020-02-26T17:15:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.