NetCut: Real-Time DNN Inference Using Layer Removal
- URL: http://arxiv.org/abs/2101.05363v1
- Date: Wed, 13 Jan 2021 22:02:43 GMT
- Title: NetCut: Real-Time DNN Inference Using Layer Removal
- Authors: Mehrshad Zandigohar, Deniz Erdogmus, Gunar Schirner
- Abstract summary: TRimmed Networks (TRNs) are constructed by removing problem-specific features of a pretrained network used in transfer learning.
NetCut, a methodology based on an empirical or an analytical latency estimator, proposes and retrains only those TRNs that can meet the application's deadline.
- Score: 8.762815575594395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Learning plays a significant role in assisting humans in many aspects of
their lives. As these networks tend to get deeper over time, they extract more
features to increase accuracy at the cost of additional inference latency. This
accuracy-performance trade-off makes it more challenging for embedded systems,
which pair resource-constrained processors with strict deadlines, to deploy these
networks efficiently. It can lead to selecting a network that meets a specified
deadline prematurely, with excess slack time that could instead have contributed
to increased accuracy.
In this work, we propose: (i) the concept of layer removal as a means of
constructing TRimmed Networks (TRNs), built by removing the problem-specific
features of a pretrained network used in transfer learning, and (ii) NetCut, a
methodology based on an empirical or an analytical latency estimator, which
proposes and retrains only those TRNs that can meet the application's deadline,
significantly reducing exploration time.
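To make the idea concrete, here is a minimal sketch of layer removal plus deadline-filtered candidate proposal, assuming a PyTorch backbone. The helper names (`build_trn`, `estimate_latency`), the top-level-block trimming strategy, and the 10 ms deadline are illustrative assumptions, not the authors' implementation.

```python
import time
import torch
import torch.nn as nn
import torchvision.models as models

def build_trn(backbone: nn.Module, cut: int, num_classes: int) -> nn.Sequential:
    """Construct a TRimmed Network (TRN): keep the first `cut` top-level
    blocks of a pretrained backbone and attach a fresh classifier head."""
    trunk = nn.Sequential(*list(backbone.children())[:cut])
    with torch.no_grad():  # probe the trimmed feature size with a dummy input
        feat = trunk(torch.randn(1, 3, 224, 224))
    head = nn.Sequential(nn.Flatten(),
                         nn.Linear(feat.flatten(1).shape[1], num_classes))
    return nn.Sequential(trunk, head)

def estimate_latency(net: nn.Module, runs: int = 20) -> float:
    """Empirical latency estimator: mean wall-clock inference time in seconds."""
    x = torch.randn(1, 3, 224, 224)
    net.eval()
    with torch.no_grad():
        net(x)  # warm-up run
        start = time.perf_counter()
        for _ in range(runs):
            net(x)
    return (time.perf_counter() - start) / runs

backbone = models.resnet18(weights="IMAGENET1K_V1")
deadline = 0.010  # hypothetical 10 ms application deadline
blocks = list(backbone.children())  # conv/bn/relu/maxpool, layer1..4, avgpool, fc
candidates = []
for cut in range(1, len(blocks)):  # skip the original fc head
    trn = build_trn(backbone, cut, num_classes=10)
    if estimate_latency(trn) <= deadline:  # NetCut: retrain only TRNs that fit
        candidates.append((cut, trn))
```

Among the surviving candidates, the deepest (most accurate) TRN that still meets the deadline would then be retrained on the target dataset.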
We demonstrate that TRNs can expand the Pareto frontier trading off latency and
accuracy, providing networks that meet arbitrary deadlines with potential
accuracy improvement over off-the-shelf networks. Our experimental results show
that such use of TRNs, when transferring to a simpler dataset and combined with
NetCut, can yield networks that achieve a relative accuracy improvement of up to
10.43% over existing off-the-shelf neural architectures while meeting a specific
deadline, along with a 27x speedup in exploration time.
Related papers
- Direct Training Needs Regularisation: Anytime Optimal Inference Spiking Neural Network [23.434563009813218]
Spiking Neural Networks (SNNs) are regarded as the next generation of Artificial Neural Networks (ANNs).
We introduce a novel regularisation technique, the Spatial-Temporal Regulariser (STR).
STR regulates the ratio between the strength of spikes and membrane potential at each timestep.
This effectively balances spatial and temporal performance during training, ultimately resulting in an Anytime Optimal Inference (AOI) SNN.
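As a rough illustration only, a penalty that nudges the per-timestep ratio of spike strength to membrane-potential magnitude toward a target might look like the following; the tensor shapes, the target ratio, and the exact form all differ from the paper's STR.

```python
import torch

def str_like_penalty(spikes: torch.Tensor, membrane: torch.Tensor,
                     target_ratio: float = 1.0, eps: float = 1e-8) -> torch.Tensor:
    """Schematic regulariser: at each timestep, push the ratio of mean spike
    strength to mean membrane-potential magnitude toward a target value.
    Shapes assumed: [timesteps, batch, neurons]. Illustrative only; the
    paper's Spatial-Temporal Regulariser is defined differently."""
    spike_strength = spikes.abs().mean(dim=(1, 2))     # per-timestep spike strength
    potential = membrane.abs().mean(dim=(1, 2)) + eps  # per-timestep potential magnitude
    ratio = spike_strength / potential
    return ((ratio - target_ratio) ** 2).mean()        # add to the task loss, scaled
```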
arXiv Detail & Related papers (2024-04-15T15:57:01Z)
- Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy-efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL).
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Navigating Local Minima in Quantized Spiking Neural Networks [3.1351527202068445]
Spiking and Quantized Neural Networks (NNs) are becoming exceedingly important for hyper-efficient implementations of Deep Learning (DL) algorithms.
These networks face challenges when trained using error backpropagation, due to the absence of gradient signals when applying hard thresholds.
This paper presents a systematic evaluation of a cosine-annealed learning-rate (LR) schedule coupled with weight-independent adaptive moment estimation.
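For reference, this recipe maps onto standard components. A minimal PyTorch pairing of a cosine-annealed LR schedule with Adam (taking "adaptive moment estimation" to mean the Adam optimizer; the model and data are stand-ins, not the paper's quantized SNN setup) looks like this:

```python
import torch
import torch.nn.functional as F
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(784, 10)               # stand-in for a quantized SNN
optimizer = Adam(model.parameters(), lr=1e-3)  # adaptive moment estimation
scheduler = CosineAnnealingLR(optimizer, T_max=100)

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
for epoch in range(100):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()                           # cosine-decay the LR each epoch
```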
arXiv Detail & Related papers (2022-02-15T06:42:25Z)
- Semi-supervised Network Embedding with Differentiable Deep Quantisation [81.49184987430333]
We develop d-SNEQ, a differentiable quantisation method for network embedding.
d-SNEQ incorporates a rank loss to equip the learned quantisation codes with rich high-order information.
It is able to substantially compress the size of trained embeddings, thus reducing storage footprint and accelerating retrieval speed.
arXiv Detail & Related papers (2021-08-20T11:53:05Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks:
specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
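A generic early-exit inference loop illustrates the mechanism; the confidence-threshold exit policy and per-block heads below are illustrative assumptions, not the MESS architecture itself.

```python
import torch
import torch.nn as nn

class MultiExitNet(nn.Module):
    """Sketch of a multi-exit model: each block has its own head, and a
    confident intermediate prediction lets easy inputs skip later blocks."""
    def __init__(self, blocks, heads, threshold: float = 0.9):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.heads = nn.ModuleList(heads)  # one exit head per block
        self.threshold = threshold

    def forward(self, x):                  # assumes batch size 1 at inference
        probs = None
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            probs = head(x).softmax(dim=-1)
            if probs.max().item() >= self.threshold:
                return probs               # confident enough: exit early
        return probs                       # otherwise use the final exit

net = MultiExitNet([nn.Linear(16, 16), nn.Linear(16, 16)],
                   [nn.Linear(16, 3), nn.Linear(16, 3)])
out = net(torch.randn(1, 16))
```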
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Accelerating Deep Learning Inference via Learned Caches [11.617579969991294]
Deep Neural Networks (DNNs) are witnessing increased adoption in multiple domains owing to their high accuracy in solving real-world problems.
Current low-latency solutions trade off accuracy or fail to exploit the inherent temporal locality in prediction-serving workloads.
We present the design of GATI, an end-to-end prediction serving system that incorporates learned caches for low-latency inference.
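The caching idea can be sketched as follows; this toy exact-match cache only illustrates exploiting temporal locality, whereas GATI's learned caches operate on intermediate DNN representations.

```python
import hashlib
import torch

class PredictionCache:
    """Toy serving cache: reuse a stored prediction when an identical
    input recurs, skipping the full DNN forward pass on a hit."""
    def __init__(self, model: torch.nn.Module):
        self.model = model
        self.store: dict[str, torch.Tensor] = {}

    def _key(self, x: torch.Tensor) -> str:
        # Hash the raw bytes of the (CPU) input tensor as the cache key.
        return hashlib.sha1(x.numpy().tobytes()).hexdigest()

    def predict(self, x: torch.Tensor) -> torch.Tensor:
        k = self._key(x)
        if k not in self.store:   # miss: pay full inference latency
            with torch.no_grad():
                self.store[k] = self.model(x)
        return self.store[k]      # hit: near-zero latency
```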
arXiv Detail & Related papers (2021-01-18T22:13:08Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Depth Enables Long-Term Memory for Recurrent Neural Networks [0.0]
We introduce a measure of the network's ability to support information flow across time, referred to as the Start-End separation rank.
We prove that deep recurrent networks support Start-End separation ranks which are higher than those supported by their shallow counterparts.
arXiv Detail & Related papers (2020-03-23T10:29:14Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)