Adaptive Deep Neural Network Inference Optimization with EENet
- URL: http://arxiv.org/abs/2301.07099v2
- Date: Fri, 1 Dec 2023 17:12:35 GMT
- Title: Adaptive Deep Neural Network Inference Optimization with EENet
- Authors: Fatih Ilhan, Ka-Ho Chow, Sihao Hu, Tiansheng Huang, Selim Tekin, Wenqi
Wei, Yanzhao Wu, Myungjin Lee, Ramana Kompella, Hugo Latapie, Gaowen Liu,
Ling Liu
- Abstract summary: Well-trained deep neural networks (DNNs) treat all test samples equally during prediction.
This paper presents EENet, a novel early-exiting scheduling framework for multi-exit DNN models.
- Score: 18.816078515565707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Well-trained deep neural networks (DNNs) treat all test samples equally
during prediction. Adaptive DNN inference with early exiting leverages the
observation that some test examples can be easier to predict than others. This
paper presents EENet, a novel early-exiting scheduling framework for multi-exit
DNN models. Instead of having every sample go through all DNN layers during
prediction, EENet learns an early-exit scheduler that can intelligently
terminate inference early for predictions in which the model already has high
confidence. As opposed to previous heuristics-based early-exiting solutions,
our EENet framework optimizes the early-exiting policy to maximize model
accuracy while satisfying a given per-sample average inference budget.
Extensive experiments are conducted on four computer vision
datasets (CIFAR-10, CIFAR-100, ImageNet, Cityscapes) and two NLP datasets
(SST-2, AgNews). The results demonstrate that adaptive inference with EENet
outperforms representative existing early-exit techniques. We also
perform a detailed visualization analysis of the comparison results to
interpret the benefits of EENet.
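As a concrete illustration of the scheduling idea, the following is a minimal
sketch of confidence-thresholded early exiting on a multi-exit model. The
block/head sizes and the fixed per-exit thresholds are illustrative
assumptions standing in for EENet's learned scheduler, which instead optimizes
the exit policy against the per-sample average inference budget.

```python
import torch
import torch.nn as nn

class MultiExitNet(nn.Module):
    """Toy multi-exit model: a stack of blocks, each followed by an exit head.

    A minimal sketch of confidence-thresholded early exiting; the fixed
    thresholds below stand in for EENet's learned exit scheduler.
    """

    def __init__(self, in_dim=32, hidden=64, num_classes=10, num_exits=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim if i == 0 else hidden, hidden),
                           nn.ReLU()) for i in range(num_exits)]
        )
        self.exit_heads = nn.ModuleList(
            [nn.Linear(hidden, num_classes) for _ in range(num_exits)]
        )

    @torch.no_grad()
    def adaptive_forward(self, x, thresholds=(0.9, 0.8, 0.0)):
        """Run blocks sequentially on one sample; stop at the first exit whose
        max softmax probability clears its threshold (the last threshold is 0,
        so every sample eventually exits)."""
        h = x
        for i, (block, head) in enumerate(zip(self.blocks, self.exit_heads)):
            h = block(h)
            logits = head(h)
            confidence = logits.softmax(dim=-1).max(dim=-1).values
            if confidence.item() >= thresholds[i]:
                return logits, i  # exit early at stage i
        return logits, len(self.blocks) - 1

model = MultiExitNet().eval()
logits, exit_stage = model.adaptive_forward(torch.randn(1, 32))
print(f"exited at stage {exit_stage}")
```

Easy samples clear the high threshold at an early head and skip the remaining
blocks; hard samples pay for the full depth, which is what keeps the average
cost under the budget.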
Related papers
- DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models [55.608981341747246]
We introduce Data Adaptive Self-Supervised Early Exit (DAISY), an approach that decides when to exit based on the self-supervised loss.
Our analysis of the adaptivity of DAISY shows that the model exits early (using fewer layers) on clean data and exits late (using more layers) on noisy data.
arXiv Detail & Related papers (2024-06-08T12:58:13Z)
- Anole: Adapting Diverse Compressed Models For Cross-Scene Prediction On Mobile Devices [17.542012577533015]
Anole is a lightweight scheme for local DNN model inference on mobile devices.
We implement Anole on different types of mobile devices and conduct extensive trace-driven and real-world experiments based on unmanned aerial vehicles (UAVs).
arXiv Detail & Related papers (2024-05-09T12:06:18Z)
- CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs [11.025071880642974]
Deep Neural Networks (DNNs) have shown excellent performance in a wide range of machine learning applications.
Knowing the latency of running a DNN model or tensor program on a specific device is useful in various tasks.
We propose CDMPP, an efficient tensor program latency prediction framework for both cross-model and cross-device prediction.
arXiv Detail & Related papers (2023-11-16T09:05:52Z)
- Early-Exit Neural Networks with Nested Prediction Sets [26.618810100134862]
Early-exit neural networks (EENNs) enable adaptive and efficient inference by providing predictions at multiple stages during the forward pass.
Standard uncertainty-quantification techniques such as conformal prediction and Bayesian credible sets are not suitable for EENNs.
We investigate anytime-valid confidence sequences (AVCSs).
These sequences are inherently nested and thus well-suited for an EENN's sequential predictions.
arXiv Detail & Related papers (2023-11-10T08:38:18Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices [8.272409756443539]
This paper describes PerfSAGE, a novel graph neural network (GNN) that predicts inference latency, energy, and memory footprint of an arbitrary DNN TFLite graph.
We train PerfSAGE and provide experimental results that demonstrate state-of-the-art prediction accuracy with a Mean Absolute Percentage Error of 5% across all targets and model search spaces.
arXiv Detail & Related papers (2023-01-26T08:59:15Z)
- Boosted Dynamic Neural Networks [53.559833501288146]
A typical early-exiting dynamic neural network (EDNN) has multiple prediction heads at different layers of the network backbone.
To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.
Treating training and testing inputs differently at the two phases causes a mismatch between the training and testing data distributions.
We formulate an EDNN as an additive model inspired by gradient boosting and propose multiple training techniques to optimize the model effectively; a sketch of this additive formulation appears after this list.
arXiv Detail & Related papers (2022-11-30T04:23:12Z)
- Unsupervised Early Exit in DNNs with Multiple Exits [0.0]
We focus on Elastic BERT, a pre-trained multi-exit DNN, to demonstrate that it 'nearly' satisfies the Strong Dominance (SD) property.
We empirically validate our algorithm on IMDb and Yelp datasets.
arXiv Detail & Related papers (2022-09-20T05:35:54Z)
- Learning Reasoning Strategies in End-to-End Differentiable Proving [50.9791149533921]
Conditional Theorem Provers learn an optimal rule-selection strategy via gradient-based optimisation.
We show that Conditional Theorem Provers are scalable and yield state-of-the-art results on the CLUTRR dataset.
arXiv Detail & Related papers (2020-07-13T16:22:14Z)
- Accuracy Prediction with Non-neural Model for Neural Architecture Search [185.0651567642238]
We study an alternative approach that uses a non-neural model for accuracy prediction.
We leverage gradient boosting decision trees (GBDT) as the predictor for neural architecture search (NAS); a sketch of this predictor appears after this list.
Experiments on NASBench-101 and ImageNet demonstrate the effectiveness of using GBDT as the predictor for NAS.
arXiv Detail & Related papers (2020-07-09T13:28:49Z)
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training [85.35910219651572]
We present a new sequence-to-sequence pre-training model called ProphetNet.
It introduces a novel self-supervised objective named future n-gram prediction.
We conduct experiments on CNN/DailyMail, Gigaword, and SQuAD 1.1 benchmarks for abstractive summarization and question generation tasks.
arXiv Detail & Related papers (2020-01-13T05:12:38Z)
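As referenced in the Boosted Dynamic Neural Networks entry above, the additive
formulation can be sketched briefly. This is a hypothetical illustration, not
the paper's code: exit k outputs the accumulated sum of logits from heads
1..k, so each head learns a boosting-style residual correction, and
supervising every stage keeps training consistent with staged testing.

```python
import torch
import torch.nn as nn

class BoostedMultiExitNet(nn.Module):
    """Hypothetical sketch of a boosting-style EDNN: the stage-k prediction is
    the sum of logits from heads 1..k (an additive ensemble)."""

    def __init__(self, in_dim=32, hidden=64, num_classes=10, num_exits=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim if i == 0 else hidden, hidden),
                           nn.ReLU()) for i in range(num_exits)]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, num_classes) for _ in range(num_exits)]
        )

    def forward(self, x):
        h, running = x, 0.0
        staged_logits = []
        for block, head in zip(self.blocks, self.heads):
            h = block(h)
            running = running + head(h)  # each head adds a correction
            staged_logits.append(running)
        return staged_logits  # one accumulated prediction per exit stage

# Training sketch: a cross-entropy loss at every stage pushes each head to fix
# the residual error left by earlier stages (uniform loss weights are an
# illustrative assumption).
model = BoostedMultiExitNet()
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
total_loss = sum(criterion(logits, y) for logits in model(x))
total_loss.backward()
```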
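Likewise, the GBDT predictor from the Accuracy Prediction with Non-neural
Model entry above reduces to fitting a regressor on architecture encodings and
ranking candidates by predicted accuracy. Below is a minimal sketch with
scikit-learn; the random feature vectors stand in for real NASBench-101
encodings and are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical stand-in data: each row encodes an architecture (e.g., one-hot
# operator choices plus adjacency bits); targets are validation accuracies.
rng = np.random.default_rng(0)
arch_features = rng.integers(0, 2, size=(500, 27)).astype(float)
val_accuracy = rng.uniform(0.85, 0.95, size=500)

# Fit the GBDT predictor on (encoding, accuracy) pairs.
predictor = GradientBoostingRegressor(n_estimators=200, max_depth=3)
predictor.fit(arch_features, val_accuracy)

# The basic loop of predictor-based NAS: score unseen candidates and keep the
# ones with the highest predicted accuracy for actual training/evaluation.
candidates = rng.integers(0, 2, size=(100, 27)).astype(float)
scores = predictor.predict(candidates)
top10 = candidates[np.argsort(scores)[::-1][:10]]
print("best predicted accuracy:", scores.max())
```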