Dynamic Sparse Training for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2106.04217v1
- Date: Tue, 8 Jun 2021 09:57:20 GMT
- Title: Dynamic Sparse Training for Deep Reinforcement Learning
- Authors: Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola
Pechenizkiy, Peter Stone
- Abstract summary: We propose for the first time to dynamically train deep reinforcement learning agents with sparse neural networks from scratch.
Our approach is easy to integrate into existing deep reinforcement learning algorithms.
We evaluate our approach on OpenAI Gym continuous control tasks.
- Score: 36.66889208433228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning has achieved significant success in many
decision-making tasks in various fields. However, it requires long training
times for dense neural networks to reach good performance. This hinders its
applicability on low-resource devices where memory and computation are strictly
constrained. As a step towards enabling deep reinforcement learning agents to
run on low-resource devices, in this work we propose, for the first time, to
dynamically train deep reinforcement learning agents with sparse neural
networks from scratch. We adopt the evolution principles of dynamic sparse
training in the reinforcement learning paradigm and introduce a training
algorithm that jointly optimizes the sparse topology and the weight values to
dynamically fit the incoming data. Our approach is easy to integrate into
existing deep reinforcement learning algorithms and offers several advantages.
First, it allows significant compression of the network, which substantially
reduces memory and computation costs; this accelerates not only the agent's
inference but also its training. Second, it speeds up the agent's learning and
reduces the number of required training steps. Third, it can achieve higher
performance than training the dense counterpart network. We evaluate our
approach on OpenAI Gym continuous control tasks. The experimental results show
the effectiveness of our approach in achieving higher performance than one of
the state-of-the-art baselines with a 50% reduction in network size and
floating-point operations (FLOPs). Moreover, our proposed approach can reach
the same performance achieved by the dense network with a 40-50% reduction in
the number of training steps.
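For a concrete picture of the dynamic sparse training loop described above, the snippet below is a minimal NumPy sketch of the SET-style drop-and-grow cycle that such methods build on: the weakest active connections are periodically dropped and an equal number of new connections are grown at random inactive positions, so the overall sparsity stays fixed while the topology adapts to the incoming data. The helper names (`init_sparse_layer`, `topology_update`, `drop_fraction`) are hypothetical illustrations, not the authors' exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_sparse_layer(n_in, n_out, sparsity=0.5):
    """Create a randomly sparsified weight matrix and its binary mask."""
    weights = rng.standard_normal((n_in, n_out)) * 0.1
    mask = (rng.random((n_in, n_out)) > sparsity).astype(np.float64)
    return weights * mask, mask

def topology_update(weights, mask, drop_fraction=0.2):
    """Drop the smallest-magnitude active weights, then regrow the same
    number of connections at random inactive positions (zero-initialized)."""
    active = np.flatnonzero(mask)
    n_drop = int(drop_fraction * active.size)
    if n_drop == 0:
        return weights, mask
    # Drop: remove the weakest currently active connections.
    weakest = active[np.argsort(np.abs(weights.ravel()[active]))[:n_drop]]
    w_flat, m_flat = weights.ravel(), mask.ravel()
    m_flat[weakest] = 0.0
    w_flat[weakest] = 0.0
    # Grow: activate an equal number of currently inactive connections,
    # which keeps the overall sparsity level constant.
    inactive = np.flatnonzero(m_flat == 0.0)
    grown = rng.choice(inactive, size=n_drop, replace=False)
    m_flat[grown] = 1.0  # new connections start at zero weight
    return weights, mask

# Toy loop: a stand-in gradient update on the masked weights followed by a
# periodic topology update; sparsity is preserved across iterations.
W, M = init_sparse_layer(64, 256, sparsity=0.5)
for step in range(5):
    W += 0.01 * rng.standard_normal(W.shape) * M
    W, M = topology_update(W, M, drop_fraction=0.2)
print("sparsity:", 1.0 - M.mean())
```

In an actual agent, the masked weights would be updated by the RL algorithm's gradient steps (for example, inside the actor and critic networks), with the topology update invoked every few thousand environment steps.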
Related papers
- Efficient Stagewise Pretraining via Progressive Subnetworks [53.00045381931778]
The prevailing view suggests that stagewise dropping strategies, such as layer dropping, are ineffective when compared to stacking-based approaches.
This paper challenges this notion by demonstrating that, with proper design, dropping strategies can be competitive with, if not better than, stacking methods.
We propose an instantiation of this framework - Random Part Training (RAPTR) - that selects and trains only a random subnetwork at each step, progressively increasing the size in stages.
arXiv Detail & Related papers (2024-02-08T18:49:09Z)
- Deep Fusion: Efficient Network Training via Pre-trained Initializations [3.9146761527401424]
We present Deep Fusion, an efficient approach to network training that leverages pre-trained initializations of smaller networks.
Our experiments show how Deep Fusion is a practical and effective approach that not only accelerates the training process but also reduces computational requirements.
We validate our theoretical framework, which guides the optimal use of Deep Fusion, showing that it significantly reduces both training time and resource consumption.
arXiv Detail & Related papers (2023-06-20T21:30:54Z)
- Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training [36.85333789033387]
In this paper, we focus on low-rank optimization for efficient deep learning techniques.
In the space domain, deep neural networks are compressed by low-rank approximation of the network parameters.
In the time domain, the network parameters can be trained in a few subspaces, which enables efficient training for fast convergence.
arXiv Detail & Related papers (2023-03-22T03:55:16Z)
- Learning in Feedback-driven Recurrent Spiking Neural Networks using full-FORCE Training [4.124948554183487]
We propose a supervised training procedure for RSNNs, in which a second network is introduced only during training.
The proposed training procedure consists of generating targets for both recurrent and readout layers.
We demonstrate the improved performance and noise robustness of the proposed full-FORCE training procedure in modeling 8 dynamical systems.
arXiv Detail & Related papers (2022-05-26T19:01:19Z)
- Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations [2.3488056916440856]
We propose a novel algorithm to find efficient low-rank subnetworks.
These subnetworks are determined and adapted already during the training phase.
Our method automatically and dynamically adapts the ranks during training to achieve a desired approximation accuracy.
arXiv Detail & Related papers (2022-05-26T18:18:12Z)
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have a parameter count that reaches into the billions. Training, storing and transferring such models is energy and time consuming, thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z)
- Training Larger Networks for Deep Reinforcement Learning [18.193180866998333]
We show that naively increasing network capacity does not improve performance.
We propose a novel method that consists of 1) wider networks with DenseNet connections, 2) decoupling representation learning from RL training, and 3) a distributed training method to mitigate overfitting.
Using this three-fold technique, we show that we can train very large networks that result in significant performance gains.
arXiv Detail & Related papers (2021-02-16T02:16:54Z)
- Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks [78.47459801017959]
Sparsity can reduce the memory footprint of regular networks to fit mobile devices.
We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice.
arXiv Detail & Related papers (2021-01-31T22:48:50Z)
- Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
arXiv Detail & Related papers (2020-02-17T18:57:33Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.