Energy-efficient and Robust Cumulative Training with Net2Net Transformation
- URL: http://arxiv.org/abs/2003.01204v1
- Date: Mon, 2 Mar 2020 21:44:47 GMT
- Title: Energy-efficient and Robust Cumulative Training with Net2Net Transformation
- Authors: Aosong Feng and Priyadarshini Panda
- Abstract summary: We propose a cumulative training strategy that achieves training computational efficiency without incurring large accuracy loss.
We achieve this by first training a small network on a small subset of the original dataset, and then gradually expanding the network.
Experiments demonstrate that compared with training from scratch, cumulative training yields a ~2x reduction in training computational complexity.
- Score: 2.4283778735260686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has achieved state-of-the-art accuracies on several computer
vision tasks. However, the computational and energy requirements associated
with training such deep neural networks can be quite high. In this paper, we
propose a cumulative training strategy with Net2Net transformation that
achieves training computational efficiency without incurring large accuracy
loss, in comparison to a model trained from scratch. We achieve this by first
training a small network (with fewer parameters) on a small subset of the
original dataset, and then gradually expanding the network using Net2Net
transformation to train incrementally on larger subsets of the dataset. This
incremental training strategy with Net2Net utilizes function-preserving
transformations that transfer knowledge from each previous small network to
the next larger network, thereby reducing the overall training complexity. Our
experiments demonstrate that compared with training from scratch, cumulative
training yields ~2x reduction in computational complexity for training
TinyImageNet using VGG19 at iso-accuracy. Besides training efficiency, a key
advantage of our cumulative training strategy is that we can perform pruning
during Net2Net expansion to obtain a final network with optimal configuration
(~0.4x lower inference compute complexity) compared to conventional training
from scratch. We also demonstrate that the final network obtained from
cumulative training yields better generalization performance and noise
robustness. Further, we show that mutual inference from all the networks
created with cumulative Net2Net expansion enables improved adversarial input
detection.
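The core of each expansion step is a function-preserving Net2WiderNet-style transformation: new units copy the incoming weights of existing units, and the copied units' outgoing weights are split so the widened network starts from exactly the same function as the smaller one. The NumPy sketch below is an illustrative reconstruction under these assumptions, not code from the paper; the helper name `net2wider` and the toy layer sizes are made up for the example.

```python
import numpy as np

def net2wider(W1, b1, W2, new_width, rng=None):
    """Widen a hidden layer from W1.shape[1] to new_width units while
    preserving the function of the two-layer block (Net2WiderNet idea).

    W1: (n_in, n_hidden) incoming weights, b1: (n_hidden,) bias,
    W2: (n_hidden, n_out) outgoing weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_hidden = W1.shape[1]
    assert new_width >= n_hidden, "widening can only grow the layer"
    # Each new unit copies a randomly chosen existing unit; originals map to themselves.
    g = np.concatenate([np.arange(n_hidden),
                        rng.integers(0, n_hidden, new_width - n_hidden)])
    counts = np.bincount(g, minlength=n_hidden)      # replicas per source unit
    W1_new, b1_new = W1[:, g], b1[g]                 # replicate incoming weights and biases
    W2_new = W2[g, :] / counts[g][:, None]           # split outgoing weights so their sum is unchanged
    return W1_new, b1_new, W2_new

# Sanity check: the widened ReLU block computes the same outputs as the original.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
W1, b1, W2 = rng.standard_normal((8, 16)), rng.standard_normal(16), rng.standard_normal((16, 4))
relu = lambda z: np.maximum(z, 0.0)
W1n, b1n, W2n = net2wider(W1, b1, W2, new_width=24, rng=rng)
assert np.allclose(relu(x @ W1 + b1) @ W2, relu(x @ W1n + b1n) @ W2n)
```

In the cumulative training setting described above, such an expansion would be applied between training stages, with each widened (and optionally pruned) network then trained on a larger subset of the dataset.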
Related papers
- PSE-Net: Channel Pruning for Convolutional Neural Networks with Parallel-subnets Estimator [16.698190973547362]
We introduce PSE-Net, a novel parallel-subnets estimator for efficient channel pruning.
Our proposed algorithm facilitates the efficiency of supernet training.
We develop a prior-distribution-based sampling algorithm to boost the performance of classical evolutionary search.
arXiv Detail & Related papers (2024-08-29T03:20:43Z)
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- E2Net: Resource-Efficient Continual Learning with Elastic Expansion Network [24.732566251012422]
We propose a resource-efficient continual learning method called the Elastic Expansion Network (E2Net)
E2Net achieves superior average accuracy and diminished forgetting within the same computational and storage constraints.
Our method outperforms competitors in terms of both storage and computational requirements.
arXiv Detail & Related papers (2023-09-28T02:48:13Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- A Generalization of Continuous Relaxation in Structured Pruning [0.3277163122167434]
Trends indicate that deeper and larger neural networks with an increasing number of parameters achieve higher accuracy than smaller neural networks.
We generalize structured pruning with algorithms for network augmentation, pruning, sub-network collapse and removal.
The resulting CNN executes efficiently on GPU hardware without computationally expensive sparse matrix operations.
arXiv Detail & Related papers (2023-08-28T14:19:13Z)
- Dynamic Sparse Training for Deep Reinforcement Learning [36.66889208433228]
We propose for the first time to dynamically train deep reinforcement learning agents with sparse neural networks from scratch.
Our approach is easy to integrate into existing deep reinforcement learning algorithms.
We evaluate our approach on OpenAI gym continuous control tasks.
arXiv Detail & Related papers (2021-06-08T09:57:20Z)
- BCNet: Searching for Network Width with Bilaterally Coupled Network [56.14248440683152]
We introduce a new supernet called Bilaterally Coupled Network (BCNet) to address this issue.
In BCNet, each channel is fairly trained and responsible for the same number of network widths, so each network width can be evaluated more accurately.
Our method achieves state-of-the-art or competitive performance compared with other baseline methods.
arXiv Detail & Related papers (2021-05-21T18:54:03Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap to real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms all state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-03-07T02:12:02Z)
- Exploring the Connection Between Binary and Spiking Neural Networks [1.329054857829016]
We bridge the recent algorithmic progress in training Binary Neural Networks and Spiking Neural Networks.
We show that training Spiking Neural Networks in the extreme quantization regime results in near full precision accuracies on large-scale datasets.
arXiv Detail & Related papers (2020-02-24T03:46:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.