GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning
- URL: http://arxiv.org/abs/2512.17034v1
- Date: Thu, 18 Dec 2025 19:53:50 GMT
- Title: GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning
- Authors: Chang-Hwan Lee, Chanseung Lee
- Abstract summary: We propose Gradient-Boosted Deep Q-Networks (GB-DQN), an adaptive ensemble method that addresses model drift through incremental residual learning. Instead of retraining a single Q-network, GB-DQN constructs an additive ensemble in which each new learner is trained to approximate the Bellman residual of the current ensemble after drift.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Non-stationary environments pose a fundamental challenge for deep reinforcement learning, as changes in dynamics or rewards invalidate learned value functions and cause catastrophic forgetting. We propose \emph{Gradient-Boosted Deep Q-Networks (GB-DQN)}, an adaptive ensemble method that addresses model drift through incremental residual learning. Instead of retraining a single Q-network, GB-DQN constructs an additive ensemble in which each new learner is trained to approximate the Bellman residual of the current ensemble after drift. We provide theoretical results showing that each boosting step reduces the empirical Bellman residual and that the ensemble converges to the post-drift optimal value function under standard assumptions. Experiments across a diverse set of control tasks with controlled dynamics changes demonstrate faster recovery, improved stability, and greater robustness compared to DQN and common non-stationary baselines.
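The boosting idea in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's implementation (the paper uses deep Q-networks); it only shows the additive-ensemble mechanism: after a drift, a new learner is fitted to the Bellman residual of the frozen existing ensemble rather than retraining from scratch. The environment tables, learning rate, and sweep count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9

def make_env():
    # Hypothetical deterministic MDP standing in for an environment (and its drifted version).
    P = rng.integers(0, n_states, size=(n_states, n_actions))  # next-state table
    R = rng.normal(size=(n_states, n_actions))                 # reward table
    return P, R

def bellman_residual(Q, P, R):
    # delta(s, a) = r(s, a) + gamma * max_a' Q(s', a') - Q(s, a)
    return R + gamma * Q[P].max(axis=2) - Q

def fit_residual_learner(Q_ensemble, P, R, lr=0.5, sweeps=200):
    # Train a new learner h so that Q_ensemble + h has a small Bellman residual;
    # the existing ensemble is held fixed throughout.
    h = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        h += lr * bellman_residual(Q_ensemble + h, P, R)
    return h

# Pre-drift: train a base learner from scratch (ensemble = {h0}).
P, R = make_env()
h0 = fit_residual_learner(np.zeros((n_states, n_actions)), P, R)

# Drift: the environment changes; add a residual learner instead of retraining h0.
P2, R2 = make_env()
h1 = fit_residual_learner(h0, P2, R2)
Q_post = h0 + h1

print(np.abs(bellman_residual(h0, P2, R2)).max())      # residual of old ensemble after drift
print(np.abs(bellman_residual(Q_post, P2, R2)).max())  # residual after one boosting step
```

In this sketch a single boosting step already drives the post-drift Bellman residual close to zero, mirroring the paper's claim that each step reduces the empirical residual.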
Related papers
- Confounding Robust Continuous Control via Automatic Reward Shaping [48.93769483870838]
We propose to automatically learn a reward shaping function for continuous control problems from offline datasets. Our method builds upon the recently proposed causal Bellman equation to learn a tight upper bound on the optimal state values. Our work marks a solid first step towards confounding robust continuous control from a causal perspective.
arXiv Detail & Related papers (2026-02-10T21:23:12Z) - Consistency Deep Equilibrium Models [8.278751626877431]
Deep Equilibrium Models (DEQs) have emerged as a powerful paradigm in deep learning. DEQs incur significant inference latency due to the iterative nature of fixed-point solvers. We introduce the Consistency Deep Equilibrium Model (C-DEQ) to accelerate DEQ inference.
arXiv Detail & Related papers (2026-02-03T02:42:48Z) - Sat-EnQ: Satisficing Ensembles of Weak Q-Learners for Reliable and Compute-Efficient Reinforcement Learning [0.0]
We introduce Sat-EnQ, a framework that learns to be "good enough" before optimizing aggressively. In Phase 1, we train an ensemble of lightweight Q-networks under a satisficing objective that limits early value growth. In Phase 2, the ensemble is distilled into a larger network and fine-tuned with standard Double DQN.
arXiv Detail & Related papers (2025-12-28T12:41:09Z) - Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training [76.12556589212666]
We show that curriculum post-training avoids the exponential complexity bottleneck. Under outcome-only reward signals, reinforcement learning finetuning achieves high accuracy with sample complexity. We establish guarantees for test-time scaling, where curriculum-aware querying reduces both reward oracle calls and sampling cost from exponential to order.
arXiv Detail & Related papers (2025-11-10T18:29:54Z) - Adaptive Variance-Penalized Continual Learning with Fisher Regularization [0.0]
This work presents a novel continual learning framework that integrates Fisher-weighted asymmetric regularization of parameter variances. Our method dynamically modulates regularization intensity according to parameter uncertainty, achieving enhanced stability and performance.
arXiv Detail & Related papers (2025-08-15T21:49:28Z) - Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning [1.8008841825105586]
We introduce a novel algorithm called Ensemble Elastic Step DQN (EEDQN), which unifies ensembles with elastic step updates to stabilise algorithmic performance. EEDQN is designed to address two major challenges in deep reinforcement learning: overestimation bias and sample efficiency. Our results show that EEDQN achieves consistently robust performance across all tested environments.
arXiv Detail & Related papers (2025-06-06T03:36:19Z) - Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization [15.212942734663514]
CrossQ has demonstrated state-of-the-art sample efficiency with a low update-to-data (UTD) ratio of 1. We identify challenges in the training dynamics, which are emphasized by higher UTD ratios. Our proposed approach reliably scales with increasing UTD ratios, achieving competitive performance across 25 challenging continuous control tasks.
arXiv Detail & Related papers (2025-02-11T12:55:32Z) - SALE-Based Offline Reinforcement Learning with Ensemble Q-Networks [0.0]
We propose a model-free actor-critic algorithm that integrates ensemble Q-networks and a gradient diversity penalty from EDAC. Our algorithm achieves higher convergence speed, stability, and performance compared to existing methods.
arXiv Detail & Related papers (2025-01-07T10:22:30Z) - Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct Large Models to follow human intent, data by data. Existing gradient updates heavily degrade performance on previous datasets during the CIT process. We propose a general continual instruction tuning framework to address this challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy, even with a zero exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Single-Trajectory Distributionally Robust Reinforcement Learning [21.955807398493334]
We propose Distributionally Robust RL (DRRL) to enhance performance across a range of environments.
Existing DRRL algorithms are either model-based or fail to learn from a single sample trajectory.
We design the first fully model-free DRRL algorithm, called distributionally robust Q-learning with single trajectory (DRQ).
arXiv Detail & Related papers (2023-01-27T14:08:09Z) - Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
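The two SUNRISE ingredients named above can be sketched in a few lines. This is a simplified tabular illustration, not the paper's deep-RL implementation: the ensemble is a plain array of Q-tables, and the temperature `T` and UCB coefficient `lam` are hypothetical hyperparameters.

```python
import numpy as np

# Ensemble of N Q-functions, represented as an array of shape (N, n_states, n_actions).

def weighted_backup_weights(Q_ens, next_states, T=10.0):
    # (a) Weighted Bellman backup: down-weight targets whose next-state values
    # disagree across the ensemble, w(s') = sigmoid(-std(s') * T) + 0.5.
    q_next = Q_ens[:, next_states, :].max(axis=2)  # each member's max_a' Q(s', a')
    std = q_next.std(axis=0)                       # ensemble disagreement per sample
    return 1.0 / (1.0 + np.exp(std * T)) + 0.5     # sigmoid(-x) = 1 / (1 + e^x)

def ucb_action(Q_ens, state, lam=1.0):
    # (b) Optimistic exploration: pick argmax_a [mean(Q) + lam * std(Q)].
    q = Q_ens[:, state, :]                         # (N, n_actions)
    return int(np.argmax(q.mean(axis=0) + lam * q.std(axis=0)))

rng = np.random.default_rng(1)
Q_ens = rng.normal(size=(4, 6, 3))                 # 4 members, 6 states, 3 actions
w = weighted_backup_weights(Q_ens, np.array([0, 2, 5]))
a = ucb_action(Q_ens, state=0)
print(w, a)
```

Note that the weights lie in (0.5, 1.0]: confident (low-disagreement) targets keep full weight while uncertain ones are damped toward 0.5 rather than discarded.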
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.