Related papers: Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

URL: http://arxiv.org/abs/2204.11786v1
Date: Mon, 25 Apr 2022 16:52:48 GMT
Title: Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications
Authors: Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
Abstract summary: Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI). However, their superior performance comes at the considerable cost of computational complexity. This paper provides an overview of efficient deep learning methods, systems and applications.
Score: 46.97774949613859
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their applications in many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that are able to lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand in order to enable numerous edge AI applications. This paper provides an overview of efficient deep learning methods, systems and applications. We start from introducing popular model compression methods, including pruning, factorization, quantization as well as compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce the efficient deep learning system design from both software and hardware perspectives.

Related papers

Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances using generative models that mirror the multifaceted structures of real-world problems. We detail the incorporation of state-of-the-art parameter tuning algorithms, which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM). Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey [53.258091735278875]
This survey covers studies of design automation techniques for deep learning models targeting edge computing. It offers an overview and comparison of key metrics commonly used to quantify the proficiency of models in terms of effectiveness, lightness, and computational costs. The survey then covers three categories of state-of-the-art deep model design automation techniques.
arXiv Detail & Related papers (2022-08-22T12:12:43Z)
Efficient Machine Learning, Compilers, and Optimizations for Embedded Systems [21.098443474303462]
Deep Neural Networks (DNNs) have achieved great success in a massive number of artificial intelligence (AI) applications by delivering high-quality computer vision, natural language processing, and virtual reality applications. These emerging AI applications also come with increasing computation and memory demands, which are challenging to handle, especially for embedded systems with limited computation/memory resources, tight power budgets, and small form factors. This book chapter introduces a series of effective design methods to enable efficient algorithms, compilers, and various optimizations for embedded systems.
arXiv Detail & Related papers (2022-06-06T02:54:05Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total. It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared with its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
Smart at what cost? Characterising Mobile Deep Neural Networks in the wild [16.684419342012674]
This paper is the first holistic study of Deep Neural Network (DNN) usage in the wild. We analyse over 16k of the most popular apps in the Google Play Store. We measure the models' energy footprint, as a core cost dimension of any mobile deployment.
arXiv Detail & Related papers (2021-09-28T18:09:29Z)
How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures [7.085772863979686]
Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition. Deploying such AI models across commodity devices faces significant challenges. We present techniques for achieving real-time performance following a cross-stack approach.
arXiv Detail & Related papers (2021-06-21T11:23:12Z)
Pervasive AI for IoT Applications: Resource-efficient Distributed Artificial Intelligence [45.076180487387575]
Artificial intelligence (AI) has witnessed a substantial breakthrough in a variety of Internet of Things (IoT) applications and services. This is driven by the easier access to sensory data and the enormous scale of pervasive/ubiquitous devices that generate zettabytes (ZB) of real-time data streams. The confluence of pervasive computing and artificial intelligence, Pervasive AI, expanded the role of ubiquitous IoT systems.
arXiv Detail & Related papers (2021-05-04T23:42:06Z)
MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS). We employ a one-shot architecture search approach to reduce the search cost. We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.