Hyperspherical Normalization for Scalable Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2502.15280v1
- Date: Fri, 21 Feb 2025 08:17:24 GMT
- Title: Hyperspherical Normalization for Scalable Deep Reinforcement Learning
- Authors: Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, Jaegul Choo,
- Abstract summary: SimbaV2 is a novel reinforcement learning architecture designed to stabilize optimization.<n>It scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks.
- Score: 57.016639036237315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norm by hyperspherical normalization; and (ii) using a distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains. The code is available at https://dojeon-ai.github.io/SimbaV2.
Related papers
- Improving Deep Knowledge Tracing via Gated Architectures and Adaptive Optimization [0.0]
Deep Knowledge Tracing (DKT) models student learning behavior by using Recurrent Networks (RNNs) to predict future performance based on historical interaction data.
In this work, we revisit the DKT model from two perspectives: architectural improvements and optimization.
First, we enhance the model using gated recurrent units, specifically Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU)
Second, we re-implement DKT using the PyTorch framework, enabling a modular and accessible infrastructure compatible with modern deep learning.
arXiv Detail & Related papers (2025-04-24T14:24:31Z) - Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo [22.7130140114906]
We study the scaling law behavior of DiLoCo when training LLMs under a fixed compute budget.
We find that DiLoCo scales both predictably and robustly with model size.
When well-tuned, DiLoCo scales better than data-parallel training with model size, and can outperform data-parallel training even at small model sizes.
arXiv Detail & Related papers (2025-03-12T20:04:38Z) - S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [51.84977135926156]
We introduce S$2$R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference.
Our results demonstrate that Qwen2.5-math-7B achieves an accuracy improvement from 51.0% to 81.6%, outperforming models trained on an equivalent amount of long-CoT distilled data.
arXiv Detail & Related papers (2025-02-18T13:40:22Z) - Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct Large Models to follow human intent data by data.
Existing update gradient would heavily destroy the performance on previous datasets during CIT process.
We propose a general continual instruction tuning framework to address the challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z) - A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z) - Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
arXiv Detail & Related papers (2022-04-18T17:53:44Z) - Scalable Rule-Based Representation Learning for Interpretable
Classification [12.736847587988853]
Rule-based Learner Representation (RRL) learns interpretable non-fuzzy rules for data representation and classification.
RRL can be easily adjusted to obtain a trade-off between classification accuracy and model complexity for different scenarios.
arXiv Detail & Related papers (2021-09-30T13:07:42Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.