How to Reach Real-Time AI on Consumer Devices? Solutions for
Programmable and Custom Architectures
- URL: http://arxiv.org/abs/2106.15021v1
- Date: Mon, 21 Jun 2021 11:23:12 GMT
- Title: How to Reach Real-Time AI on Consumer Devices? Solutions for
Programmable and Custom Architectures
- Authors: Stylianos I. Venieris and Ioannis Panopoulos and Ilias Leontiadis and
Iakovos S. Venieris
- Abstract summary: Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition.
deploying such AI models across commodity devices faces significant challenges.
We present techniques for achieving real-time performance following a cross-stack approach.
- Score: 7.085772863979686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The unprecedented performance of deep neural networks (DNNs) has led to large
strides in various Artificial Intelligence (AI) inference tasks, such as object
and speech recognition. Nevertheless, deploying such AI models across commodity
devices faces significant challenges: large computational cost, multiple
performance objectives, hardware heterogeneity and a common need for high
accuracy, together pose critical problems to the deployment of DNNs across the
various embedded and mobile devices in the wild. As such, we have yet to
witness the mainstream usage of state-of-the-art deep learning algorithms
across consumer devices. In this paper, we provide preliminary answers to this
potentially game-changing question by presenting an array of design techniques
for efficient AI systems. We start by examining the major roadblocks when
targeting both programmable processors and custom accelerators. Then, we
present diverse methods for achieving real-time performance following a
cross-stack approach. These span model-, system- and hardware-level techniques,
and their combination. Our findings provide illustrative examples of AI systems
that do not overburden mobile hardware, while also indicating how they can
improve inference accuracy. Moreover, we showcase how custom ASIC- and
FPGA-based accelerators can be an enabling factor for next-generation AI
applications, such as multi-DNN systems. Collectively, these results highlight
the critical need for further exploration as to how the various cross-stack
solutions can be best combined in order to bring the latest advances in deep
learning close to users, in a robust and efficient manner.
Related papers
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - Uncertainty Estimation in Multi-Agent Distributed Learning for AI-Enabled Edge Devices [0.0]
Edge IoT devices have seen a paradigm shift with the introduction of FPGAs and AI accelerators.
This advancement has vastly amplified their computational capabilities, emphasizing the practicality of edge AI.
Our study explores methods that enable distributed data processing through AI-enabled edge devices, enhancing collaborative learning capabilities.
arXiv Detail & Related papers (2024-03-14T07:40:32Z) - Machine Learning Insides OptVerse AI Solver: Design Principles and
Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z) - Edge AI Inference in Heterogeneous Constrained Computing: Feasibility
and Opportunities [9.156192191794567]
The proliferation of AI inference accelerators showcases innovation but also underscores challenges.
This paper outlines the requirements and components of a framework that accommodates hardware diversity.
Next, we assess the impact of device heterogeneity on AI inference performance, identifying strategies to optimize outcomes without compromising service quality.
arXiv Detail & Related papers (2023-10-27T16:46:59Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural
Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - Enable Deep Learning on Mobile Devices: Methods, Systems, and
Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI)
However, their superior performance comes at the considerable cost of computational complexity.
This paper provides an overview of efficient deep learning methods, systems and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Pervasive AI for IoT Applications: Resource-efficient Distributed
Artificial Intelligence [45.076180487387575]
Artificial intelligence (AI) has witnessed a substantial breakthrough in a variety of Internet of Things (IoT) applications and services.
This is driven by the easier access to sensory data and the enormous scale of pervasive/ubiquitous devices that generate zettabytes (ZB) of real-time data streams.
The confluence of pervasive computing and artificial intelligence, Pervasive AI, expanded the role of ubiquitous IoT systems.
arXiv Detail & Related papers (2021-05-04T23:42:06Z) - Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments.
It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices.
arXiv Detail & Related papers (2021-02-12T09:33:00Z) - Communication-Efficient Edge AI: Algorithms and Systems [39.28788394839187]
Wide scale deployment of edge devices (e.g., IoT devices) generates an unprecedented scale of data.
Such enormous data cannot all be sent from end devices to the cloud for processing.
By pushing inference and training processes of AI models to edge nodes, edge AI has emerged as a promising alternative.
arXiv Detail & Related papers (2020-02-22T09:27:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.