Related papers: How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures

URL: http://arxiv.org/abs/2106.15021v1
Date: Mon, 21 Jun 2021 11:23:12 GMT
Title: How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures
Authors: Stylianos I. Venieris and Ioannis Panopoulos and Ilias Leontiadis and Iakovos S. Venieris
Abstract summary: Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition. Deploying such AI models across commodity devices faces significant challenges. We present techniques for achieving real-time performance following a cross-stack approach.
Score: 7.085772863979686
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The unprecedented performance of deep neural networks (DNNs) has led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition. Nevertheless, deploying such AI models across commodity devices faces significant challenges: large computational cost, multiple performance objectives, hardware heterogeneity and a common need for high accuracy, together pose critical problems to the deployment of DNNs across the various embedded and mobile devices in the wild. As such, we have yet to witness the mainstream usage of state-of-the-art deep learning algorithms across consumer devices. In this paper, we provide preliminary answers to this potentially game-changing question by presenting an array of design techniques for efficient AI systems. We start by examining the major roadblocks when targeting both programmable processors and custom accelerators. Then, we present diverse methods for achieving real-time performance following a cross-stack approach. These span model-, system- and hardware-level techniques, and their combination. Our findings provide illustrative examples of AI systems that do not overburden mobile hardware, while also indicating how they can improve inference accuracy. Moreover, we showcase how custom ASIC- and FPGA-based accelerators can be an enabling factor for next-generation AI applications, such as multi-DNN systems. Collectively, these results highlight the critical need for further exploration as to how the various cross-stack solutions can be best combined in order to bring the latest advances in deep learning close to users, in a robust and efficient manner.

Related papers

INSIGHT: A Survey of In-Network Systems for Intelligent, High-Efficiency AI and Topology Optimization [43.37351326629751]
In-network AI is a transformative approach to addressing the escalating demands of Artificial Intelligence (AI) on network infrastructure. This paper provides a comprehensive analysis of optimizing in-network computation for AI. It examines methodologies for mapping AI models onto resource-constrained network devices, addressing challenges like limited memory and computational capabilities.
arXiv Detail & Related papers (2025-05-30T06:47:55Z)
Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI. As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z)
Uncertainty Estimation in Multi-Agent Distributed Learning for AI-Enabled Edge Devices [0.0]
Edge IoT devices have seen a paradigm shift with the introduction of FPGAs and AI accelerators. This advancement has vastly amplified their computational capabilities, emphasizing the practicality of edge AI. Our study explores methods that enable distributed data processing through AI-enabled edge devices, enhancing collaborative learning capabilities.
arXiv Detail & Related papers (2024-03-14T07:40:32Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems. We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
Edge AI Inference in Heterogeneous Constrained Computing: Feasibility and Opportunities [9.156192191794567]
The proliferation of AI inference accelerators showcases innovation but also underscores challenges. This paper outlines the requirements and components of a framework that accommodates hardware diversity. Next, we assess the impact of device heterogeneity on AI inference performance, identifying strategies to optimize outcomes without compromising service quality.
arXiv Detail & Related papers (2023-10-27T16:46:59Z)
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"Smart ecosystems" are being formed where sensing happens concurrently rather than standalone. This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge. We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI). However, their superior performance comes at the considerable cost of computational complexity. This paper provides an overview of efficient deep learning methods, systems and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total. It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
Pervasive AI for IoT Applications: Resource-efficient Distributed Artificial Intelligence [45.076180487387575]
Artificial intelligence (AI) has witnessed a substantial breakthrough in a variety of Internet of Things (IoT) applications and services. This is driven by the easier access to sensory data and the enormous scale of pervasive/ubiquitous devices that generate zettabytes (ZB) of real-time data streams. The confluence of pervasive computing and artificial intelligence, Pervasive AI, expanded the role of ubiquitous IoT systems.
arXiv Detail & Related papers (2021-05-04T23:42:06Z)
Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments. It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices.
arXiv Detail & Related papers (2021-02-12T09:33:00Z)
Communication-Efficient Edge AI: Algorithms and Systems [39.28788394839187]
Wide scale deployment of edge devices (e.g., IoT devices) generates an unprecedented scale of data. Such enormous data cannot all be sent from end devices to the cloud for processing. By pushing inference and training processes of AI models to edge nodes, edge AI has emerged as a promising alternative.
arXiv Detail & Related papers (2020-02-22T09:27:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.