Scaling Up Deep Neural Network Optimization for Edge Inference
- URL: http://arxiv.org/abs/2009.00278v3
- Date: Thu, 17 Sep 2020 07:41:44 GMT
- Title: Scaling Up Deep Neural Network Optimization for Edge Inference
- Authors: Bingqian Lu, Jianyi Yang, and Shaolei Ren
- Abstract summary: Deep neural networks (DNNs) have been increasingly deployed on and integrated with edge devices, such as mobile phones, drones, robots and wearables.
To run DNN inference directly on edge devices (a.k.a. edge inference) with a satisfactory performance, optimizing the DNN design is crucial.
We propose two approaches to scaling up DNN optimization. In the first approach, we reuse the performance predictors built on a proxy device, and leverage the performance monotonicity to scale up the DNN optimization without re-building performance predictors for each different device.
In the second approach, we build scalable performance predictors that can estimate the resulting performance (e.g., inference accuracy/latency/energy) given a DNN-device pair.
- Score: 20.9711130126031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have been increasingly deployed on and integrated
with edge devices, such as mobile phones, drones, robots and wearables. To run
DNN inference directly on edge devices (a.k.a. edge inference) with a
satisfactory performance, optimizing the DNN design (e.g., network architecture
and quantization policy) is crucial. While state-of-the-art DNN designs have
leveraged performance predictors to speed up the optimization process, they are
device-specific (i.e., each predictor for only one target device) and hence
cannot scale well in the presence of extremely diverse edge devices. Moreover,
even with performance predictors, the optimizer (e.g., search-based
optimization) can still be time-consuming when optimizing DNNs for many
different devices. In this work, we propose two approaches to scaling up DNN
optimization. In the first approach, we reuse the performance predictors built
on a proxy device, and leverage the performance monotonicity to scale up the
DNN optimization without re-building performance predictors for each different
device. In the second approach, we build scalable performance predictors that
can estimate the resulting performance (e.g., inference
accuracy/latency/energy) given a DNN-device pair, and use a neural
network-based automated optimizer that takes both device features and
optimization parameters as input and then directly outputs the optimal DNN
design without going through a lengthy optimization process for each individual
device.
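To make the two approaches concrete, below is a minimal sketch in PyTorch. The feature encodings, layer sizes, and module names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Assumed encodings: a DNN design and a device each summarized as a fixed-size
# feature vector (e.g., layer widths/bit-widths; CPU/GPU/memory specs).
DNN_FEATS, DEVICE_FEATS, DESIGN_KNOBS = 32, 16, 32

class ScalablePredictor(nn.Module):
    """Approach 2, part one: estimate performance (e.g., latency) for any
    (DNN, device) pair, instead of one predictor per device."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DNN_FEATS + DEVICE_FEATS, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, dnn, device):
        return self.net(torch.cat([dnn, device], dim=-1))

class AutomatedOptimizer(nn.Module):
    """Approach 2, part two: map device features plus an optimization
    parameter (here a latency budget) directly to a DNN design,
    skipping a per-device search."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DEVICE_FEATS + 1, 128), nn.ReLU(),
            nn.Linear(128, DESIGN_KNOBS), nn.Sigmoid())  # normalized knobs

    def forward(self, device, budget):
        return self.net(torch.cat([device, budget], dim=-1))

def rank_on_proxy(predictor, designs, proxy_device):
    """Approach 1: if performance is monotone across devices, the design
    ranking computed on one proxy device transfers to other devices."""
    return sorted(designs, key=lambda d: predictor(d, proxy_device).item())

# Usage: adapting to a new device costs one forward pass, not a new search.
device = torch.randn(1, DEVICE_FEATS)
design = AutomatedOptimizer()(device, torch.tensor([[0.5]]))
```

The appeal of the second approach is that each new device costs one forward pass rather than a freshly built predictor plus a lengthy search.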
Related papers
- PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices [8.272409756443539]
This paper describes PerfSAGE, a novel graph neural network (GNN) that predicts inference latency, energy, and memory footprint on an arbitrary DNN TFLite graph.
Trained on a dataset of on-device measurements, PerfSAGE delivers state-of-the-art prediction accuracy with a Mean Absolute Percentage Error of 5% across all targets and model search spaces.
arXiv Detail & Related papers (2023-01-26T08:59:15Z)
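The entry above predicts performance directly from a model's operator graph. A minimal sketch of that idea in plain PyTorch follows; the node features, message-passing rounds, and pooling are illustrative, not PerfSAGE's actual architecture.

```python
import torch
import torch.nn as nn

class GNNLatencyPredictor(nn.Module):
    """Predicts latency/energy/memory from a DNN's operator graph.
    Nodes are operators (conv, pool, ...); edges are tensor flows."""
    def __init__(self, node_feats=16, hidden=64, rounds=3, targets=3):
        super().__init__()
        self.embed = nn.Linear(node_feats, hidden)
        self.msg = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, targets)  # latency, energy, memory
        self.rounds = rounds

    def forward(self, x, adj):
        # x: (num_ops, node_feats); adj: (num_ops, num_ops) adjacency matrix
        h = torch.relu(self.embed(x))
        for _ in range(self.rounds):  # message passing over the op graph
            h = torch.relu(h + adj @ self.msg(h))
        return self.head(h.mean(dim=0))  # pool over ops -> 3 predictions

ops = torch.randn(10, 16)                   # 10 operators, toy features
adj = (torch.rand(10, 10) > 0.8).float()    # toy tensor-flow edges
print(GNNLatencyPredictor()(ops, adj))
```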
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNN optimization methods neglect the intrinsic bilinear relationship between the real-valued weights and their scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
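To make the "bilinear relationship" above concrete, here is a toy binarized layer in which the scale factor and the binary weights are coupled. The closed-form per-channel scale is the standard BNN baseline, not RBONN's recurrent optimizer.

```python
import torch

def binarize(w):
    """Quantize real-valued conv weights w to alpha * sign(w) per output
    channel. alpha and sign(w) interact bilinearly: the best alpha depends
    on the signs, and gradients w.r.t. w depend on alpha."""
    alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # closed-form scale
    return alpha * torch.sign(w)

w = torch.randn(8, 3, 3, 3)        # conv weights: out, in, kh, kw
w_b = binarize(w)
print((w - w_b).pow(2).mean())     # quantization error the (alpha, sign) pair minimizes
```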
- Benchmarking Test-Time Unsupervised Deep Neural Network Adaptation on Edge Devices [19.335535517714703]
The prediction accuracy of deep neural networks (DNNs) deployed at the edge can degrade over time due to shifts in the distribution of incoming data.
Recent prediction-time unsupervised DNN adaptation techniques have been introduced that improve prediction accuracy of the models for noisy data by re-tuning the batch normalization parameters.
This paper, for the first time, performs a comprehensive measurement study of such techniques to quantify their performance and energy on various edge devices.
arXiv Detail & Related papers (2022-03-21T19:10:40Z)
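A minimal sketch of the batch-normalization re-tuning these techniques build on: reset the running statistics so they are re-estimated from (noisy) test batches, keeping all other parameters frozen. The exact update rules vary across the papers studied.

```python
import torch
import torch.nn as nn

def adapt_batchnorm(model: nn.Module):
    """Prediction-time adaptation: re-estimate BN statistics from test data."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.reset_running_stats()
            m.momentum = None   # use a cumulative moving average
            m.train()           # update stats during forward passes

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model.eval()
adapt_batchnorm(model)
with torch.no_grad():           # forward passes re-tune BN stats only
    model(torch.randn(16, 3, 32, 32))
```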
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
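For intuition, here is a toy version of fine-grained structured pruning as described above; the 4x1 block shape and magnitude criterion are illustrative assumptions.

```python
import numpy as np

def block_prune(w, block=(4, 1), keep_ratio=0.5):
    """Zero out the lowest-magnitude blocks of a weight matrix. Structured
    blocks keep the sparsity pattern compiler/hardware friendly."""
    rows, cols = w.shape
    bh, bw = block
    blocks = w.reshape(rows // bh, bh, cols // bw, bw)
    scores = np.abs(blocks).sum(axis=(1, 3))          # one score per block
    thresh = np.quantile(scores, 1 - keep_ratio)
    mask = (scores >= thresh)[:, None, :, None]
    return (blocks * mask).reshape(rows, cols)

w = np.random.randn(8, 8)
print(np.count_nonzero(block_prune(w)))  # roughly half the weights survive
```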
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
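A compact sketch of the early-exit mechanism described above; the backbone, exit heads, and confidence threshold below are placeholders, not the paper's co-optimized configuration.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """A backbone with exit heads along its depth: easy inputs leave at an
    early head (cheap); hard inputs continue to later ones."""
    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.stages = nn.ModuleList([nn.Linear(32, 32) for _ in range(3)])
        self.exits = nn.ModuleList([nn.Linear(32, num_classes) for _ in range(3)])
        self.threshold = threshold

    def forward(self, x):  # x: a single sample, shape (1, 32)
        for stage, exit_head in zip(self.stages, self.exits):
            x = torch.relu(stage(x))
            probs = torch.softmax(exit_head(x), dim=-1)
            if probs.max() >= self.threshold:   # confident enough: exit early
                return probs
        return probs                            # final exit

print(EarlyExitNet()(torch.randn(1, 32)).shape)
```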
- Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs [13.628734116014819]
Deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNNs).
There is a lack of research on cross-level optimisation as the space of approaches becomes too large to test and obtain a globally optimised solution.
We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms achieving up to 4x improvement in performance and over 2x reduction in memory.
arXiv Detail & Related papers (2020-06-09T11:00:06Z)
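The cross-level design space above can be pictured as a product of per-level knobs; below is a toy random search over such a space. The knobs and the measurement stub are invented for illustration.

```python
import random

# Illustrative knobs at different levels of the deployment stack.
SPACE = {
    "pruning_ratio": [0.0, 0.25, 0.5, 0.75],
    "quantization":  ["fp32", "fp16", "int8"],
    "threads":       [1, 2, 4],
}

def measure(cfg):
    """Stand-in for benchmarking one configuration on the target CPU."""
    return random.random()  # would be measured latency in practice

def random_search(trials=50):
    best_cfg, best_lat = None, float("inf")
    for _ in range(trials):  # exhaustive search is infeasible at scale
        cfg = {k: random.choice(v) for k, v in SPACE.items()}
        lat = measure(cfg)
        if lat < best_lat:
            best_cfg, best_lat = cfg, lat
    return best_cfg, best_lat

print(random_search())
```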
- Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training [8.157520622932374]
Existing profiling tools do not aim to answer predictive questions such as "How will optimization X affect the performance of my model?"
We propose a new profiling tool, Daydream, to help programmers efficiently explore the efficacy of DNN optimizations.
We show that Daydream is able to model most mainstream DNN optimization techniques, and accurately predict the efficacy of optimizations that will result in significant performance improvements.
arXiv Detail & Related papers (2020-06-05T09:08:16Z)
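Daydream answers such "what if" questions by transforming a profiled dependency graph. A toy replay in that spirit: scale selected kernel durations and recompute the end-to-end time (the trace and the 2x conv speedup are illustrative).

```python
# Each task: (duration_ms, list of dependencies).
trace = {
    "load":      (2.0, []),
    "conv1":     (8.0, ["load"]),
    "conv2":     (6.0, ["conv1"]),
    "allreduce": (5.0, ["conv1"]),   # overlaps with conv2
    "step":      (1.0, ["conv2", "allreduce"]),
}

def finish_time(trace):
    """End-to-end time of the dependency graph (critical path)."""
    done = {}
    def finish(name):
        if name not in done:
            dur, deps = trace[name]
            done[name] = dur + max((finish(d) for d in deps), default=0.0)
        return done[name]
    return max(finish(n) for n in trace)

# "What if" question: halve conv kernel times (e.g., after kernel fusion)?
faster = {n: (d * (0.5 if n.startswith("conv") else 1.0), deps)
          for n, (d, deps) in trace.items()}
print(finish_time(trace), "->", finish_time(faster))  # allreduce becomes the bottleneck
```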
- A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.
Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data.
We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)
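A minimal data-free step of the kind such a framework requires: a pruning mask chosen from the trained weights alone, never from user data. The paper's actual method is more sophisticated than this magnitude heuristic.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity=0.7):
    """Data-free pruning: mask the smallest-magnitude weights globally.
    Only the trained weights are needed, not the private training set."""
    weights = torch.cat([p.abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    thresh = torch.quantile(weights, sparsity)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() >= thresh).float())

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
magnitude_prune(model)
nonzero = sum(int(p.count_nonzero()) for p in model.parameters() if p.dim() > 1)
print(nonzero)  # about 30% of the weights survive
```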
- Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed online learning optimization integrates a deep neural network (DNN) with finite element method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with directly using heuristic methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)
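The loop structure above, with a cheap surrogate refined around its own predicted optimum, can be sketched as follows; the 1-D objective and polynomial surrogate stand in for FEM and the DNN.

```python
import numpy as np

def expensive_solver(x):
    """Stand-in for an FEM evaluation of a candidate design x."""
    return (x - 0.3) ** 2 + 0.1 * np.sin(8 * x)

rng = np.random.default_rng(0)
X = list(rng.uniform(0, 1, 5))          # a few initial expensive samples
y = [expensive_solver(x) for x in X]

for _ in range(10):
    surrogate = np.poly1d(np.polyfit(X, y, deg=4))   # cheap model of the solver
    grid = np.linspace(0, 1, 1001)
    x_best = grid[np.argmin(surrogate(grid))]        # optimize the surrogate
    # Self-directed: query the real solver near the predicted optimum.
    x_new = np.clip(x_best + rng.normal(0, 0.05), 0, 1)
    X.append(x_new); y.append(expensive_solver(x_new))

print(min(zip(y, X)))   # best design found with few expensive evaluations
```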
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
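A toy version of pattern-based kernel pruning: each 3x3 kernel keeps only the entries of its best-matching pattern. The four patterns below are illustrative; PatDNN selects from its own small pattern set.

```python
import numpy as np

# Illustrative 3x3 patterns: 1 = keep, 0 = prune. A small, fixed pattern
# set is what lets the compiler generate efficient code per pattern.
PATTERNS = np.array([
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [1, 1, 0], [0, 1, 0]],
])

def pattern_prune(kernel):
    """Keep the pattern preserving the most kernel magnitude."""
    scores = [(np.abs(kernel) * p).sum() for p in PATTERNS]
    return kernel * PATTERNS[int(np.argmax(scores))]

k = np.random.randn(3, 3)
print(pattern_prune(k))
```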
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.