Do We Actually Need Dense Over-Parameterization? In-Time
Over-Parameterization in Sparse Training
- URL: http://arxiv.org/abs/2102.02887v1
- Date: Thu, 4 Feb 2021 20:59:31 GMT
- Title: Do We Actually Need Dense Over-Parameterization? In-Time
Over-Parameterization in Sparse Training
- Authors: Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, Mykola Pechenizkiy
- Abstract summary: We propose the concept of In-Time Over-Parameterization (ITOP) in sparse training.
ITOP closes the gap in expressibility between sparse training and dense training.
We present a series of experiments to support our conjecture and achieve the state-of-the-art sparse training performance.
- Score: 16.81321230135317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a new perspective on training deep neural
networks capable of state-of-the-art performance without the need for the
expensive over-parameterization by proposing the concept of In-Time
Over-Parameterization (ITOP) in sparse training. By starting from a random
sparse network and continuously exploring sparse connectivities during
training, we can perform an Over-Parameterization in the space-time manifold,
closing the gap in the expressibility between sparse training and dense
training. We further use ITOP to understand the underlying mechanism of Dynamic
Sparse Training (DST) and indicate that the benefits of DST come from its
ability to consider across time all possible parameters when searching for the
optimal sparse connectivity. As long as there are sufficient parameters that
have been reliably explored during training, DST can outperform the dense
neural network by a large margin. We present a series of experiments to support
our conjecture and achieve the state-of-the-art sparse training performance
with ResNet-50 on ImageNet. More impressively, our method achieves dominant
performance over the overparameterization-based sparse methods at extreme
sparsity levels. When trained on CIFAR-100, our method can match the
performance of the dense model even at an extreme sparsity (98%).
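The abstract describes the ITOP mechanism only at a high level. As a rough illustration (not the authors' exact recipe), the following sketch implements a generic drop-and-grow update in the spirit of dynamic sparse training: the smallest-magnitude active weights are dropped, an equal number of inactive connections are grown at random, and the fraction of parameters that have ever been activated is tracked, since ITOP's argument is that performance hinges on this explored fraction becoming large enough during training. All names, the random growth criterion, and the hyperparameters below are illustrative assumptions.

    # Minimal sketch of drop-and-grow dynamic sparse training with an
    # exploration counter; illustrative only, not the official ITOP code.
    import numpy as np

    rng = np.random.default_rng(0)

    def init_sparse_mask(shape, density, rng):
        """Random sparse mask with the requested fraction of active weights."""
        mask = np.zeros(int(np.prod(shape)), dtype=bool)
        mask[rng.choice(mask.size, int(density * mask.size), replace=False)] = True
        return mask.reshape(shape)

    def drop_and_grow(weights, mask, explored, drop_frac, rng):
        """Drop the smallest-magnitude active weights, grow the same number of
        currently inactive connections at random, and record every position
        that has ever been active (the 'explored' set)."""
        active = np.flatnonzero(mask)
        n_drop = int(drop_frac * active.size)
        drop_idx = active[np.argsort(np.abs(weights.ravel()[active]))[:n_drop]]
        mask.ravel()[drop_idx] = False
        weights.ravel()[drop_idx] = 0.0           # dropped weights are zeroed
        inactive = np.flatnonzero(~mask)
        grow_idx = rng.choice(inactive, n_drop, replace=False)
        mask.ravel()[grow_idx] = True             # new connections start from zero
        explored.ravel()[grow_idx] = True
        return weights, mask, explored

    shape, density = (256, 256), 0.05             # a 95%-sparse layer
    mask = init_sparse_mask(shape, density, rng)
    weights = rng.normal(size=shape) * mask
    explored = mask.copy()                        # parameters explored so far

    for step in range(200):                       # stand-in for the training loop
        # ... gradient updates on the active weights would happen here ...
        if step % 10 == 0:                        # periodic connectivity update
            weights, mask, explored = drop_and_grow(weights, mask, explored, 0.3, rng)

    # In-time over-parameterization: fraction of all parameters ever explored.
    print("explored fraction:", explored.mean())

Printed at the end, explored.mean() is the quantity the paper argues must be sufficiently high for sparse training to match or beat dense training; a practical DST method would typically grow connections by gradient magnitude (as in RigL) rather than at random.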
Related papers
- Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks [9.96381061452642]
We propose Sparse Spectral Training (SST), an advanced training methodology that updates all singular values and selectively updates singular vectors of network weights.
SST refines the training process by employing a targeted updating strategy for singular vectors, which is determined by a multinomial sampling method weighted by the significance of the singular values.
On OPT-125M, with rank equal to 8.3% of the embedding dimension, SST reduces the perplexity gap to full-rank training by 67.6%, a significant reduction of the performance loss seen with prevalent low-rank methods (a hypothetical sketch of the singular-vector sampling step appears after this list).
arXiv Detail & Related papers (2024-05-24T11:59:41Z) - Disentangling Spatial and Temporal Learning for Efficient Image-to-Video
Transfer Learning [59.26623999209235]
We present DiST, which disentangles the learning of spatial and temporal aspects of videos.
The disentangled learning in DiST is highly efficient because it avoids the back-propagation of massive pre-trained parameters.
Extensive experiments on five benchmarks show that DiST outperforms existing state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-09-14T17:58:33Z) - Dynamic Sparse Training with Structured Sparsity [11.778353786208765]
Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training.
We propose a sparse-to-sparse DST method, Structured RigL (SRigL), to learn a variant of fine-grained structured N:M sparsity.
We demonstrate a real-world acceleration of 3.4x/2.5x on CPU for online inference and 1.7x/13.0x on GPU for inference with a batch size of 256.
arXiv Detail & Related papers (2023-05-03T17:48:55Z) - Dynamic Sparse Training via Balancing the Exploration-Exploitation
Trade-off [19.230329532065635]
Sparse training can significantly mitigate training costs by reducing the model size.
Existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies.
In this work, we formulate dynamic sparse training as a sparse connectivity search problem.
Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods.
arXiv Detail & Related papers (2022-11-30T01:22:25Z) - Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z) - Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline, aiming to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
arXiv Detail & Related papers (2022-04-02T09:50:19Z) - MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the
Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z) - Selfish Sparse RNN Training [13.165729746380816]
We propose an approach to train sparse RNNs with a fixed parameter count in one single run, without compromising performance.
We achieve state-of-the-art sparse training results on the Penn TreeBank and Wikitext-2 datasets.
arXiv Detail & Related papers (2021-01-22T10:45:40Z) - Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z) - Progressive Skeletonization: Trimming more fat from a network at
initialization [76.11947969140608]
We propose an objective to find a skeletonized network with maximum connection sensitivity.
We then propose two approximate procedures to maximize our objective.
Our approach provides remarkably improved performance at higher pruning levels.
arXiv Detail & Related papers (2020-06-16T11:32:47Z)
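As noted in the Sparse Spectral Training entry above, the one mechanism that summary describes concretely is choosing which singular vectors to update by multinomial sampling weighted by the significance of the singular values. The sketch below illustrates that sampling step for a low-rank factorization W = U diag(s) Vt; the factorization, learning rate, and gradient formulas are illustrative assumptions rather than the paper's exact procedure.

    # Hypothetical sketch: multinomial selection of singular vectors to update,
    # weighted by singular-value magnitude, for a low-rank factor W = U diag(s) Vt.
    import numpy as np

    rng = np.random.default_rng(0)

    d_out, d_in, rank = 64, 64, 8                 # rank is a small fraction of the dimension
    U = np.linalg.qr(rng.normal(size=(d_out, rank)))[0]       # (d_out, rank), orthonormal columns
    Vt = np.linalg.qr(rng.normal(size=(d_in, rank)))[0].T     # (rank, d_in), orthonormal rows
    s = np.abs(rng.normal(size=rank))                         # singular values

    def sample_vectors_to_update(s, n_draws, rng):
        """Pick which singular directions get their vectors updated this step,
        with probability proportional to the magnitude of each singular value."""
        probs = np.abs(s) / np.abs(s).sum()
        counts = rng.multinomial(n_draws, probs)
        return np.flatnonzero(counts)             # indices drawn at least once

    # One illustrative step: every singular value receives a gradient update,
    # but only the sampled columns of U (and rows of Vt) are modified.
    lr = 1e-2
    grad_W = rng.normal(size=(d_out, d_in))       # stand-in for dL/dW
    s -= lr * np.einsum("ir,ij,jr->r", U, grad_W, Vt.T)       # dL/ds_r = u_r^T G v_r
    idx = sample_vectors_to_update(s, n_draws=4, rng=rng)
    U[:, idx] -= lr * (grad_W @ Vt.T[:, idx]) * s[idx]        # dL/dU = G V diag(s)
    Vt[idx, :] -= lr * (U[:, idx] * s[idx]).T @ grad_W        # dL/dVt = diag(s) U^T G

    W = U @ np.diag(s) @ Vt                       # effective weight after the step
    print("updated singular directions:", idx)

With magnitude-weighted multinomial sampling, directions with larger singular values are refreshed more often while every singular value is trained at each step; the actual SST method covers further details (including hyperbolic networks) not shown here.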