Related papers: Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

URL: http://arxiv.org/abs/2506.17204v1
Date: Fri, 20 Jun 2025 17:54:24 GMT
Title: Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning
Authors: Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao,
Abstract summary: We show that introducing static network sparsity alone can unlock further scaling potential beyond dense counterparts with state-of-the-art architectures.<n>Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity.
Score: 57.3885832382455
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. This is achieved through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity.

Related papers

Online Training and Pruning of Deep Reinforcement Learning Networks [0.0]
Scaling deep neural networks (NN) of reinforcement learning (RL) algorithms has been shown to enhance performance when feature extraction networks are used.<n>We propose an approach to integrate simultaneous training and pruning within advanced RL methods.
arXiv Detail & Related papers (2025-07-16T07:17:41Z)
Auto-Compressing Networks [59.83547898874152]
We introduce Auto- Networks (ACNs), an architectural variant where additive long feedforward connections from each layer replace traditional short residual connections.<n>ACNs showcase unique property we coin as "auto-compression", the ability of a network to organically compress information during training.<n>We find that ACNs exhibit enhanced noise robustness compared to residual networks, superior performance in low-data settings, and mitigate catastrophic forgetting.
arXiv Detail & Related papers (2025-06-11T13:26:09Z)
Analyzing and Improving the Training Dynamics of Diffusion Models [36.37845647984578]
We identify and rectify several causes for uneven and ineffective training in the popular ADM diffusion model architecture. We find that systematic application of this philosophy eliminates the observed drifts and imbalances, resulting in considerably better networks at equal computational complexity.
arXiv Detail & Related papers (2023-12-05T11:55:47Z)
Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures. This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead. We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
RDRN: Recursively Defined Residual Network for Image Super-Resolution [58.64907136562178]
Deep convolutional neural networks (CNNs) have obtained remarkable performance in single image super-resolution. We propose a novel network architecture which utilizes attention blocks efficiently.
arXiv Detail & Related papers (2022-11-17T11:06:29Z)
Dynamics-aware Adversarial Attack of Adaptive Neural Networks [75.50214601278455]
We investigate the dynamics-aware adversarial attack problem of adaptive neural networks. We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient. Our LGM achieves impressive adversarial attack performance compared with the dynamic-unaware attack methods.
arXiv Detail & Related papers (2022-10-15T01:32:08Z)
SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, defined SIRe, to reduce the vanishing gradient in relation to the object classification task. The presented methodology directly improves a convolutional neural network (CNN) by enforcing the input image structure preservation through auto-encoders. To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks [78.47459801017959]
Sparsity can reduce the memory footprint of regular networks to fit mobile devices. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice.
arXiv Detail & Related papers (2021-01-31T22:48:50Z)
Improving Neural Network Robustness through Neighborhood Preserving Layers [0.751016548830037]
We demonstrate a novel neural network architecture which can incorporate such layers and also can be trained efficiently. We empirically show that our designed network architecture is more robust against state-of-art gradient descent based attacks.
arXiv Detail & Related papers (2021-01-28T01:26:35Z)
Learn2Perturb: an End-to-end Feature Perturbation Learning to Improve Adversarial Robustness [79.47619798416194]
Learn2Perturb is an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks. Inspired by the Expectation-Maximization, an alternating back-propagation training algorithm is introduced to train the network and noise parameters consecutively.
arXiv Detail & Related papers (2020-03-02T18:27:35Z)
Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization [1.3764085113103217]
Overfitting is one of the most common problems when training deep neural networks on comparatively small datasets. We propose novel targeted sparsity visualization and regularization strategies to counteract overfitting.
arXiv Detail & Related papers (2020-02-21T11:38:17Z)
Differentiable Sparsification for Deep Neural Networks [0.0]
We propose a fully differentiable sparsification method for deep neural networks. The proposed method can learn both the sparsified structure and weights of a network in an end-to-end manner. To the best of our knowledge, this is the first fully differentiable sparsification method.
arXiv Detail & Related papers (2019-10-08T03:57:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.