Scaling CrossQ with Weight Normalization
- URL: http://arxiv.org/abs/2506.03758v1
- Date: Wed, 04 Jun 2025 09:24:17 GMT
- Title: Scaling CrossQ with Weight Normalization
- Authors: Daniel Palenicek, Florian Vogt, Jan Peters
- Abstract summary: CrossQ has demonstrated state-of-the-art sample efficiency with a low update-to-data (UTD) ratio of 1. We identify challenges in the training dynamics that become more pronounced at higher UTD ratios. We propose a solution that stabilizes training, prevents potential loss of plasticity, and keeps the effective learning rate constant.
- Score: 15.605124749589946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning has achieved significant milestones, but sample efficiency remains a bottleneck for real-world applications. Recently, CrossQ has demonstrated state-of-the-art sample efficiency with a low update-to-data (UTD) ratio of 1. In this work, we explore CrossQ's scaling behavior with higher UTD ratios. We identify challenges in the training dynamics that become more pronounced at higher UTD ratios, particularly Q-bias explosion and the growing magnitude of critic network weights. To address this, we integrate weight normalization into the CrossQ framework, a solution that stabilizes training, prevents potential loss of plasticity, and keeps the effective learning rate constant. Our proposed approach reliably scales with increasing UTD ratios, achieving competitive or superior performance across a range of challenging tasks on the DeepMind Control benchmark, notably the complex dog and humanoid environments. This work eliminates the need for drastic interventions, such as network resets, and offers a robust pathway for improving sample efficiency and scalability in model-free reinforcement learning.
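The abstract does not give implementation details, so the following is only a minimal PyTorch-style sketch of the general idea of a weight-normalized critic: the direction of each weight matrix is learned while its magnitude is held fixed, which bounds weight growth and keeps the effective learning rate roughly constant. The layer sizes, the frozen-gain choice, and the omission of CrossQ's batch normalization layers are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn


class WeightNormCritic(nn.Module):
    """Q-network sketch with weight-normalized hidden layers: the direction of
    each weight matrix is learned while its magnitude stays fixed, so the
    critic weights cannot grow unboundedly with the number of updates.
    Layer sizes and the frozen-gain choice are illustrative assumptions."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()

        def wn_linear(n_in: int, n_out: int) -> nn.Module:
            layer = nn.utils.weight_norm(nn.Linear(n_in, n_out))
            # Weight norm factorizes w = g * v / ||v||; freezing the magnitude
            # g keeps the overall weight scale constant during training.
            layer.weight_g.requires_grad_(False)
            return layer

        self.net = nn.Sequential(
            wn_linear(obs_dim + act_dim, hidden), nn.ReLU(),
            wn_linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # output layer left unnormalized
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))
```

A fixed gain is only one way to realize the idea; periodically projecting the raw weights back to unit norm would have a similar effect of decoupling the weight direction from its magnitude.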
Related papers
- Q-STAC: Q-Guided Stein Variational Model Predictive Actor-Critic [12.837649598521102]
This paper introduces the Q-guided Stein variational model predictive Actor-Critic (Q-STAC) framework for continuous control tasks. Our method optimizes control sequences directly using learned Q-values as objectives, eliminating the need for explicit cost function design. Experiments on 2D navigation and robotic manipulation tasks demonstrate that Q-STAC achieves superior sample efficiency, robustness, and optimality compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2025-07-09T07:53:53Z) - Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning [57.3885832382455]
We show that introducing static network sparsity alone can unlock further scaling potential beyond dense counterparts with state-of-the-art architectures. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve higher parameter efficiency for network expressivity.
arXiv Detail & Related papers (2025-06-20T17:54:24Z) - TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions [3.9635480458924994]
- TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions [3.9635480458924994]
Adversarial robustness is a critical challenge in deploying deep neural networks for real-world applications. We propose a novel training framework, TAET, which integrates an initial stabilization phase followed by a stratified adversarial training phase. Our method surpasses existing advanced defenses, achieving significant improvements in both memory and computational efficiency.
arXiv Detail & Related papers (2025-03-02T12:07:00Z) - Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization [15.212942734663514]
CrossQ has demonstrated state-of-the-art sample efficiency with a low update-to-data (UTD) ratio of 1. We identify challenges in the training dynamics that become more pronounced at higher UTD ratios. Our proposed approach reliably scales with increasing UTD ratios, achieving competitive performance across 25 challenging continuous control tasks.
arXiv Detail & Related papers (2025-02-11T12:55:32Z) - SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning [51.10866035483686]
High update-to-data (UTD) ratio algorithms in reinforcement learning (RL) improve sample efficiency but incur high computational costs, limiting real-world scalability. We propose Offline Stabilization Phases for Efficient Q-Learning (SPEQ), an RL algorithm that combines low-UTD online training with periodic offline stabilization phases. During these phases, Q-functions are fine-tuned with high UTD ratios on a fixed replay buffer, reducing redundant updates on suboptimal data.
arXiv Detail & Related papers (2025-01-15T09:04:19Z) - PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [61.57833648734164]
- PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [61.57833648734164]
We propose a novel Parallel Yielding Re-Activation (PYRA) method for training-inference efficient task adaptation.
PYRA outperforms all competing methods under both low and high compression rates.
arXiv Detail & Related papers (2024-03-14T09:06:49Z) - Q-TART: Quickly Training for Adversarial Robustness and
in-Transferability [28.87208020322193]
We tackle performance, efficiency, and robustness with our proposed algorithm, Q-TART.
Q-TART follows the intuition that samples highly susceptible to noise strongly affect the decision boundaries learned by deep neural networks.
We demonstrate improved performance and adversarial robustness while using only a subset of the training data.
arXiv Detail & Related papers (2022-04-14T15:23:08Z) - Towards Balanced Learning for Instance Recognition [149.76724446376977]
We propose Libra R-CNN, a framework towards balanced learning for instance recognition.
It integrates IoU-balanced sampling, a balanced feature pyramid, and objective re-weighting to reduce imbalance at the sample, feature, and objective levels, respectively.
arXiv Detail & Related papers (2021-08-23T13:40:45Z) - Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z) - Dynamic R-CNN: Towards High Quality Object Detection via Dynamic
- Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training [70.2914594796002]
We propose Dynamic R-CNN to adjust the label assignment criteria and the shape of the regression loss function during training.
Our method improves upon the ResNet-50-FPN baseline by 1.9% AP and 5.5% AP$_{90}$ on the MS COCO dataset with no extra overhead.
arXiv Detail & Related papers (2020-04-13T15:20:25Z) - CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity [34.36803740112609]
CrossQ matches or surpasses current state-of-the-art methods in terms of sample efficiency.
It substantially reduces the computational cost compared to REDQ and DroQ.
It is easy to implement, requiring just a few lines of code on top of SAC.
arXiv Detail & Related papers (2019-02-14T21:05:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.