Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for
Sparse Training
- URL: http://arxiv.org/abs/2209.11204v1
- Date: Thu, 22 Sep 2022 17:45:23 GMT
- Title: Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for
Sparse Training
- Authors: Geng Yuan, Yanyu Li, Sheng Li, Zhenglun Kong, Sergey Tulyakov, Xulong
Tang, Yanzhi Wang, Jian Ren
- Abstract summary: We show that layer freezing and data sieving can be incorporated into the sparse training algorithm to form a generic framework, which we dub SpFDE.
Our experiments demonstrate that SpFDE can significantly reduce training costs from three dimensions, namely weight sparsity, layer freezing, and dataset sieving, while preserving accuracy.
- Score: 48.152207339344564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, sparse training has emerged as a promising paradigm for efficient
deep learning on edge devices. The current research mainly devotes efforts to
reducing training costs by further increasing model sparsity. However,
increasing sparsity is not always ideal since it will inevitably introduce
severe accuracy degradation at an extremely high sparsity level. This paper
intends to explore other possible directions to effectively and efficiently
reduce sparse training costs while preserving accuracy. To this end, we
investigate two techniques, namely, layer freezing and data sieving. First, the
layer freezing approach has shown its success in dense model training and
fine-tuning, yet it has never been adopted in the sparse training domain.
Nevertheless, the unique characteristics of sparse training may hinder the
incorporation of layer freezing techniques. Therefore, we analyze the
feasibility of using layer freezing in sparse training and find that it has
the potential to save considerable training costs. Second, we propose a data
sieving method for dataset-efficient training, which further reduces training
costs by ensuring that only a subset of the dataset is used throughout the
entire training process. We show that both techniques can be
well incorporated into the sparse training algorithm to form a generic
framework, which we dub SpFDE. Our extensive experiments demonstrate that SpFDE
can significantly reduce training costs from three dimensions, namely weight
sparsity, layer freezing, and dataset sieving, while preserving accuracy.
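The abstract describes layer freezing and data sieving only at a high level. Below is a minimal PyTorch sketch of how the two ideas could be combined around an otherwise standard training loop; the toy model, the freezing schedule, the sieving epoch, the keep ratio, and the highest-loss selection rule are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): layer freezing and data sieving
# wrapped around a standard training loop. The toy model, freezing schedule,
# sieving epoch, keep ratio, and highest-loss selection rule are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset

model = nn.Sequential(                 # stand-in for a sparse network
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
data = TensorDataset(torch.randn(512, 32), torch.randint(0, 10, (512,)))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

freeze_at = {20: 2, 40: 4}             # assumed schedule: epoch -> freeze modules [0, idx)
sieve_epoch, keep_ratio = 10, 0.6      # assumed: sieve once, keep 60% of the data
indices = list(range(len(data)))       # currently active subset of the dataset

for epoch in range(60):
    # Layer freezing: stop computing gradients (and paying their cost) for early layers.
    if epoch in freeze_at:
        for module in list(model.children())[: freeze_at[epoch]]:
            for p in module.parameters():
                p.requires_grad_(False)

    # Data sieving: keep only the highest-loss examples as a proxy for informativeness.
    if epoch == sieve_epoch:
        with torch.no_grad():
            losses = torch.stack([criterion(model(x.unsqueeze(0)), y.unsqueeze(0))
                                  for x, y in data])
        indices = losses.topk(int(keep_ratio * len(data))).indices.tolist()

    loader = DataLoader(Subset(data, indices), batch_size=64, shuffle=True)
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```

Freezing removes the backward pass and gradient state for the early layers, while sieving shrinks the number of examples visited per epoch; both savings compose with weight sparsity.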
Related papers
- Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes [33.68058313321142]
We propose a fast and controllable post-training sparsity (FCPTS) framework for neural networks.
Our method allows for rapid and accurate sparsity allocation learning in minutes, with the added assurance of convergence to a global sparsity rate.
arXiv Detail & Related papers (2024-05-09T14:47:15Z)
- Always-Sparse Training by Growing Connections with Guided Stochastic Exploration [46.4179239171213]
We propose an efficient always-sparse training algorithm with excellent scaling to larger and sparser models.
We evaluate our method on CIFAR-10/100 and ImageNet using VGG and ViT models, and compare it against a range of sparsification methods.
arXiv Detail & Related papers (2024-01-12T21:32:04Z)
- Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction [29.61757744974324]
Deep neural networks incur significant memory and computation costs.
Sparse training is one of the most common techniques to reduce these costs.
In this work, we aim to achieve space-time co-efficiency in sparse training.
arXiv Detail & Related papers (2023-01-09T18:50:03Z)
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have parameter counts that reach into the billions. Training, storing, and transferring such models is energy- and time-consuming, and thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey of methods that reduce the number of trained weights in deep learning models throughout training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z)
- MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z)
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy (see the sketch after this list).
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-12-25T20:50:15Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization to gradually increase the precision of activations, weights, and gradients during training.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
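As referenced in the adversarial fine-tuning entry above, a "slow start, fast decay" learning-rate schedule can be sketched in a few lines. The warmup length, decay rate, and base learning rate below are illustrative assumptions, not values from that paper.

```python
# Minimal sketch of a "slow start, fast decay" learning-rate schedule of the
# kind named in the adversarial fine-tuning entry above. Warmup length,
# decay rate, and base learning rate are illustrative assumptions.
import math

def slow_start_fast_decay(step, total_steps, base_lr=0.01,
                          warmup_steps=100, decay_rate=5.0):
    """Ramp the LR up slowly over a short warmup, then decay it quickly."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps                 # slow start
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * math.exp(-decay_rate * progress)              # fast decay

# Example: print the schedule at a few points of a short fine-tuning run.
if __name__ == "__main__":
    for s in (0, 50, 100, 500, 999):
        print(s, round(slow_start_fast_decay(s, total_steps=1000), 5))
```

In practice such a schedule would be applied per optimizer step, e.g. by writing the returned value into each param_group["lr"] before the update.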