Scheduling Real-time Deep Learning Services as Imprecise Computations
- URL: http://arxiv.org/abs/2011.01112v1
- Date: Mon, 2 Nov 2020 16:43:04 GMT
- Title: Scheduling Real-time Deep Learning Services as Imprecise Computations
- Authors: Shuochao Yao, Yifan Hao, Yiran Zhao, Huajie Shao, Dongxin Liu,
Shengzhong Liu, Tianshi Wang, Jinyang Li, Tarek Abdelzaher
- Abstract summary: The paper presents an efficient real-time scheduling algorithm for intelligent real-time edge services.
These services perform machine intelligence tasks, such as voice recognition, LIDAR processing, or machine vision.
We show that deep neural networks can be cast as imprecise computations, each with a mandatory part and several optional parts.
- Score: 11.611969843191433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper presents an efficient real-time scheduling algorithm for
intelligent real-time edge services, defined as those that perform machine
intelligence tasks, such as voice recognition, LIDAR processing, or machine
vision, on behalf of local embedded devices that are themselves unable to
support extensive computations. The work contributes to a recent direction in
real-time computing that develops scheduling algorithms for machine
intelligence tasks with anytime prediction. We show that deep neural network
workflows can be cast as imprecise computations, each with a mandatory part and
(several) optional parts whose execution utility depends on input data. The
goal of the real-time scheduler is to maximize the average accuracy of deep
neural network outputs while meeting task deadlines, thanks to opportunistic
shedding of the least necessary optional parts. The work is motivated by the
proliferation of increasingly ubiquitous but resource-constrained embedded
devices (for applications ranging from autonomous cars to the Internet of
Things) and the desire to develop services that endow them with intelligence.
Experiments on recent GPU hardware and a state-of-the-art deep neural network
for machine vision illustrate that our scheme can increase the overall accuracy
by 10%-20% while incurring (nearly) no deadline misses.
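The scheduling idea above can be sketched in a few lines. The task structure, the utility values, and the greedy earliest-deadline-first policy below are illustrative assumptions, not the paper's actual algorithm:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """A DNN task cast as an imprecise computation (hypothetical fields)."""
    name: str
    mandatory_time: float                         # must always run
    optional: list = field(default_factory=list)  # (exec_time, accuracy_gain) pairs
    deadline: float = 0.0

def schedule(tasks, now=0.0):
    """Greedy sketch: run every mandatory part in deadline order, then keep
    the most valuable optional parts that still fit before each task's
    deadline; the rest are shed."""
    plan, t = [], now
    for task in sorted(tasks, key=lambda k: k.deadline):   # EDF ordering
        t += task.mandatory_time
        plan.append((task.name, "mandatory"))
        # prefer optional parts with the highest accuracy gain per unit time
        for exec_time, gain in sorted(task.optional,
                                      key=lambda p: p[1] / p[0], reverse=True):
            if t + exec_time <= task.deadline:
                t += exec_time
                plan.append((task.name, f"optional(+{gain})"))
    return plan
```

A real scheduler would also have to account for optional parts of earlier tasks delaying later mandatory parts; this sketch only checks each task's own deadline.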
Related papers
- Neural Network Methods for Radiation Detectors and Imaging [1.6395318070400589]
Recent advances in machine learning and especially deep neural networks (DNNs) allow for new optimization and performance-enhancement schemes for radiation detectors and imaging hardware.
We give an overview of data generation at photon sources, deep learning-based methods for image processing tasks, and hardware solutions for deep learning acceleration.
arXiv Detail & Related papers (2023-11-09T20:21:51Z)
- Temporal Patience: Efficient Adaptive Deep Learning for Embedded Radar Data Processing [4.359030177348051]
This paper presents novel techniques that leverage the temporal correlation present in streaming radar data to enhance the efficiency of Early Exit Neural Networks for Deep Learning inference on embedded devices.
Our results demonstrate that our techniques save up to 26% of operations per inference over a Single Exit Network and 12% over a confidence-based Early Exit version.
Such efficiency gains enable real-time radar data processing on resource-constrained platforms, allowing for new applications in the context of smart homes, Internet-of-Things, and human-computer interaction.
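A minimal sketch of a confidence-plus-temporal early-exit policy, assuming hypothetical exit heads that each return `(logits, op_count)`; the agreement-with-previous-frame shortcut is inspired by, not copied from, the paper:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_infer(exits, x, prev_label=None, conf_thresh=0.9):
    """Run exit heads in order; stop as soon as the prediction is confident,
    or sooner when it agrees with the previous frame's label (a hypothetical
    way to exploit temporal correlation in streaming data)."""
    ops, label = 0, None
    for head in exits:
        logits, cost = head(x)
        ops += cost
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        # temporal shortcut: agreement with the last frame counts as evidence
        if label == prev_label or probs[label] >= conf_thresh:
            return label, ops
    return label, ops       # final exit: accept whatever the last head says
```

The operation savings quoted above would then come from how often the loop returns before reaching the final head.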
arXiv Detail & Related papers (2023-09-11T12:38:01Z)
- Machine Learning aided Computer Architecture Design for CNN Inferencing Systems [0.0]
We develop a technique for forecasting the power and performance of CNNs during inference, with a MAPE of 5.03% and 5.94%, respectively.
Our approach empowers computer architects to estimate power and performance in the early stages of development, reducing the necessity for numerous prototypes.
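The MAPE figures quoted above (5.03% and 5.94%) follow the standard definition, the mean of |actual - predicted| / |actual| scaled to a percentage, which can be computed as:

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    assert len(actual) == len(predicted) and all(a != 0 for a in actual)
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)
```

Note that MAPE is undefined when any actual value is zero, hence the guard.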
arXiv Detail & Related papers (2023-08-10T06:17:46Z) - Learnability with Time-Sharing Computational Resource Concerns [65.268245109828]
We present a theoretical framework that takes into account the influence of computational resources in learning theory.
This framework can be naturally applied to stream learning where the incoming data streams can be potentially endless.
It may also provide a theoretical perspective for the design of intelligent supercomputing operating systems.
arXiv Detail & Related papers (2023-05-03T15:54:23Z) - HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler
for Neural Networks [51.71682428015139]
We propose HARL, a reinforcement learning-based auto-scheduler for efficient tensor program exploration.
HARL improves the tensor operator performance by 22% and the search speed by 4.3x compared to the state-of-the-art auto-scheduler.
Inference performance and search speed are also significantly improved on end-to-end neural networks.
arXiv Detail & Related papers (2022-11-21T04:15:27Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Enable Deep Learning on Mobile Devices: Methods, Systems, and
Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI).
However, their superior performance comes at the considerable cost of computational complexity.
This paper provides an overview of efficient deep learning methods, systems and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Collaborative Learning over Wireless Networks: An Introductory Overview [84.09366153693361]
We will mainly focus on collaborative training across wireless devices.
Many distributed optimization algorithms have been developed over the last decades.
They provide data locality; that is, a joint model can be trained collaboratively while the data available at each participating device remains local.
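Federated averaging is one well-known instance of such data-local collaborative training. A toy sketch on a one-parameter least-squares model (illustrative only; not an algorithm from the paper):

```python
def local_update(w, data, lr=0.1):
    """One local gradient step on a 1-D least-squares model y = w * x."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def fed_avg(w0, clients, rounds=5):
    """Each round: every client trains locally on its private data, then the
    server averages the resulting weights -- raw data never leaves a device."""
    w = w0
    for _ in range(rounds):
        local = [local_update(w, data) for data in clients]
        w = sum(local) / len(local)   # unweighted average for simplicity
    return w
```

Production systems typically weight the average by each client's dataset size and run several local steps per round; both are omitted here for brevity.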
arXiv Detail & Related papers (2021-12-07T20:15:39Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.