QuickNets: Saving Training and Preventing Overconfidence in Early-Exit
Neural Architectures
- URL: http://arxiv.org/abs/2212.12866v1
- Date: Sun, 25 Dec 2022 07:06:32 GMT
- Title: QuickNets: Saving Training and Preventing Overconfidence in Early-Exit
Neural Architectures
- Authors: Devdhar Patel and Hava Siegelmann
- Abstract summary: We introduce QuickNets: a novel cascaded training algorithm for faster training of neural networks.
We demonstrate that QuickNets can dynamically distribute learning and have a reduced training cost and inference cost compared to standard Backpropagation.
- Score: 2.28438857884398
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep neural networks have long training and processing times. Early
exits added to neural networks allow the network to make early predictions from
intermediate activations, which is useful in time-sensitive applications.
However, early exits increase the training time of the neural networks. We
introduce QuickNets: a novel cascaded training algorithm for faster training of
neural networks. QuickNets are trained in a layer-wise manner such that each
successive layer is only trained on samples that could not be correctly
classified by the previous layers. We demonstrate that QuickNets can
dynamically distribute learning and have a reduced training cost and inference
cost compared to standard Backpropagation. Additionally, we introduce
commitment layers that significantly improve the early exits by identifying
over-confident predictions, and we demonstrate their success.
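The abstract describes the training scheme only at a high level. As a rough illustration, a minimal sketch of cascaded, layer-wise training with confidence-gated early exits might look like the following; the names (ExitBlock, train_cascade, predict_with_early_exit), the architecture, the fixed confidence threshold, and the softmax-confidence gate standing in for the paper's commitment layers are all assumptions made for this example, not the authors' implementation.

```python
# Minimal sketch only (PyTorch): cascaded, layer-wise training in the spirit of
# QuickNets. All names, the architecture, and the confidence gate below are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExitBlock(nn.Module):
    """One hidden layer plus an early-exit classifier head."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)  # early-exit classifier

    def forward(self, x):
        h = self.body(x)
        return h, self.head(h)

def train_cascade(blocks, x, y, epochs=10, lr=1e-3):
    """Train each block only on the samples earlier exits misclassified."""
    features, targets = x, y
    for block in blocks:
        opt = torch.optim.Adam(block.parameters(), lr=lr)
        for _ in range(epochs):
            _, logits = block(features)
            loss = F.cross_entropy(logits, targets)
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                 # pass only the "hard" samples onward
            feats, logits = block(features)
            wrong = logits.argmax(dim=1) != targets
            features, targets = feats[wrong], targets[wrong]
        if targets.numel() == 0:
            break                             # earlier exits already classify everything

def predict_with_early_exit(blocks, x, conf_threshold=0.9):
    """Inference for a single input of shape (1, in_dim): exit at the first head
    whose softmax confidence clears the threshold (a crude stand-in for the
    paper's commitment layers)."""
    h = x
    for block in blocks:
        h, logits = block(h)
        conf, pred = F.softmax(logits, dim=-1).max(dim=-1)
        if conf.item() >= conf_threshold:
            return pred
    return logits.argmax(dim=-1)              # fall through to the last exit
```

Under these assumptions, usage would be along the lines of blocks = [ExitBlock(784, 256, 10), ExitBlock(256, 256, 10)], then train_cascade(blocks, x_train, y_train) followed by predict_with_early_exit(blocks, x_single). The sample set shrinks as earlier exits absorb the easy examples, which is where the claimed training-cost savings come from; a faithful reproduction would follow the paper's actual exit criteria and commitment layers.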
Related papers
- Algebraic Representations for Faster Predictions in Convolutional Neural Networks [0.0]
Convolutional neural networks (CNNs) are a popular choice of model for tasks in computer vision.
Skip connections may be added to create an easier gradient optimization problem.
We show that arbitrarily complex, trained, linear CNNs with skip connections can be simplified into a single-layer model.
arXiv Detail & Related papers (2024-08-14T21:10:05Z) - LNPT: Label-free Network Pruning and Training [18.535687216213624]
Pruning before training enables the deployment of neural networks on smart devices.
We propose a novel learning framework, LNPT, which enables mature networks on the cloud to provide online guidance for network pruning and learning on smart devices with unlabeled data.
arXiv Detail & Related papers (2024-03-19T12:49:09Z) - Set-Based Training for Neural Network Verification [8.97708612393722]
Small input perturbations can significantly affect the outputs of a neural network.
In safety-critical environments, the inputs often contain noisy sensor data.
We employ an end-to-end set-based training procedure that trains robust neural networks for formal verification.
arXiv Detail & Related papers (2024-01-26T15:52:41Z) - Sensitivity-Based Layer Insertion for Residual and Feedforward Neural
Networks [0.3831327965422187]
Training of neural networks requires tedious and often manual tuning of the network architecture.
We propose a systematic method to insert new layers during the training process, which eliminates the need to choose a fixed network size before training.
arXiv Detail & Related papers (2023-11-27T16:44:13Z) - SparseProp: Efficient Sparse Backpropagation for Faster Training of
Neural Networks [20.18957052535565]
We provide a new efficient version of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse.
Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types.
We show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch.
arXiv Detail & Related papers (2023-02-09T18:54:05Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and only exploit higher-order statistics later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL).
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z) - Neural Capacitance: A New Perspective of Neural Network Selection via
Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training
with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z) - Taylorized Training: Towards Better Approximation of Neural Network
Training at Finite Width [116.69845849754186]
Taylorized training involves training the $k$-th order Taylor expansion of the neural network (a worked form of the $k$-th order model is sketched just below this list).
We show that Taylorized training agrees increasingly well with full neural network training as $k$ increases.
We complement our experiments with theoretical results showing that the approximation error of $k$-th order Taylorized models decays exponentially in $k$ for wide neural networks.
arXiv Detail & Related papers (2020-02-10T18:37:04Z)
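For concreteness, the $k$-th order Taylorized model in the entry above can be written (in our own notation, with $\theta_0$ the parameters at initialization) as the degree-$k$ Taylor polynomial of the network $f$ in its parameters, which is then trained over $\theta$ in place of $f$:
$f^{(k)}_{\mathrm{lin}}(\theta; x) = \sum_{j=0}^{k} \frac{1}{j!}\, \partial_\theta^{\,j} f(\theta_0; x)\big[(\theta - \theta_0)^{\otimes j}\big]$,
with $k = 1$ recovering linearized (lazy, NTK-style) training.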
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.