Binary Early-Exit Network for Adaptive Inference on Low-Resource Devices
- URL: http://arxiv.org/abs/2206.09029v1
- Date: Fri, 17 Jun 2022 22:11:11 GMT
- Title: Binary Early-Exit Network for Adaptive Inference on Low-Resource Devices
- Authors: Aaqib Saeed
- Abstract summary: Binary neural networks (BNNs) tackle the issue with extreme compression and speed-up gains compared to real-valued models.
We propose a simple but effective method to accelerate inference through unifying BNNs with an early-exiting strategy.
Our approach allows simple instances to exit early based on a decision threshold and utilizes output layers added to different intermediate layers to avoid executing the entire binary model.
- Score: 3.591566487849146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have significantly improved performance on a range of
tasks with the increasing demand for computational resources, leaving
deployment on low-resource devices (with limited memory and battery power)
infeasible. Binary neural networks (BNNs) tackle the issue to an extent with
extreme compression and speed-up gains compared to real-valued models. We
propose a simple but effective method to accelerate inference through unifying
BNNs with an early-exiting strategy. Our approach allows simple instances to
exit early based on a decision threshold and utilizes output layers added to
different intermediate layers to avoid executing the entire binary model. We
extensively evaluate our method on three audio classification tasks and across
four BNNs architectures. Our method demonstrates favorable quality-efficiency
trade-offs while being controllable with an entropy-based threshold specified
by the system user. It also results in better speed-ups (latency less than 6ms)
with a single model based on existing BNN architectures without retraining for
different efficiency levels. It also provides a straightforward way to estimate
sample difficulty and better understanding of uncertainty around certain
classes within the dataset.
Related papers
- Binary domain generalization for sparsifying binary neural networks [3.2462411268263964]
Binary neural networks (BNNs) are an attractive solution for developing and deploying deep neural network (DNN)-based applications in resource constrained devices.
Weight pruning of BNNs leads to performance degradation, which suggests that the standard binarization domain of BNNs is not well adapted for the task.
This work proposes a novel more general binary domain that extends the standard binary one that is more robust to pruning techniques.
arXiv Detail & Related papers (2023-06-23T14:32:16Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z) - A Simple Approach to Improve Single-Model Deep Uncertainty via
Distance-Awareness [33.09831377640498]
We study approaches to improve uncertainty property of a single network, based on a single, deterministic representation.
We propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs.
On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection.
arXiv Detail & Related papers (2022-05-01T05:46:13Z) - Spatial-Temporal-Fusion BNN: Variational Bayesian Feature Layer [77.78479877473899]
We design a spatial-temporal-fusion BNN for efficiently scaling BNNs to large models.
Compared to vanilla BNNs, our approach can greatly reduce the training time and the number of parameters, which contributes to scale BNNs efficiently.
arXiv Detail & Related papers (2021-12-12T17:13:14Z) - S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural
Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm from real-valued to distill binary networks on the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
arXiv Detail & Related papers (2021-02-17T18:59:28Z) - A Meta-Learning Approach to the Optimal Power Flow Problem Under
Topology Reconfigurations [69.73803123972297]
We propose a DNN-based OPF predictor that is trained using a meta-learning (MTL) approach.
The developed OPF-predictor is validated through simulations using benchmark IEEE bus systems.
arXiv Detail & Related papers (2020-12-21T17:39:51Z) - Encoding the latent posterior of Bayesian Neural Networks for
uncertainty quantification [10.727102755903616]
We aim for efficient deep BNNs amenable to complex computer vision architectures.
We achieve this by leveraging variational autoencoders (VAEs) to learn the interaction and the latent distribution of the parameters at each network layer.
Our approach, Latent-Posterior BNN (LP-BNN), is compatible with the recent BatchEnsemble method, leading to highly efficient (in terms of computation and memory during both training and testing) ensembles.
arXiv Detail & Related papers (2020-12-04T19:50:09Z) - FTBNN: Rethinking Non-linearity for 1-bit CNNs and Going Beyond [23.5996182207431]
We show that binarized convolution process owns an increasing linearity towards the target of minimizing such error, which in turn hampers BNN's discriminative ability.
We re-investigate and tune proper non-linear modules to fix that contradiction, leading to a strong baseline which achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-10-19T08:11:48Z) - Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper- parameters of state-of-the-art factored time delay neural networks (TDNNs)
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.