Enable Deep Learning on Mobile Devices: Methods, Systems, and
Applications
- URL: http://arxiv.org/abs/2204.11786v1
- Date: Mon, 25 Apr 2022 16:52:48 GMT
- Title: Enable Deep Learning on Mobile Devices: Methods, Systems, and
Applications
- Authors: Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang,
Ligeng Zhu, Song Han
- Abstract summary: Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI).
However, their superior performance comes at the considerable cost of computational complexity.
This paper provides an overview of efficient deep learning methods, systems and applications.
- Score: 46.97774949613859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have achieved unprecedented success in the field
of artificial intelligence (AI), including computer vision, natural language
processing and speech recognition. However, their superior performance comes at
the considerable cost of computational complexity, which greatly hinders their
applications in many resource-constrained devices, such as mobile phones and
Internet of Things (IoT) devices. Therefore, methods and techniques that are
able to lift the efficiency bottleneck while preserving the high accuracy of
DNNs are in great demand in order to enable numerous edge AI applications. This
paper provides an overview of efficient deep learning methods, systems and
applications. We start from introducing popular model compression methods,
including pruning, factorization, quantization as well as compact model design.
To reduce the large design cost of these manual solutions, we discuss the
AutoML framework for each of them, such as neural architecture search (NAS) and
automated pruning and quantization. We then cover efficient on-device training
to enable user customization based on the local data on mobile devices. Apart
from general acceleration techniques, we also showcase several task-specific
accelerations for point cloud, video and natural language processing by
exploiting their spatial sparsity and temporal/token redundancy. Finally, to
support all these algorithmic advancements, we introduce the efficient deep
learning system design from both software and hardware perspectives.
Related papers
- Machine Learning Insides OptVerse AI Solver: Design Principles and
Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Random resistive memory-based deep extreme point learning machine for
unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- Design Automation for Fast, Lightweight, and Effective Deep Learning
Models: A Survey [53.258091735278875]
This survey covers studies of design automation techniques for deep learning models targeting edge computing.
It offers an overview and comparison of key metrics that are used commonly to quantify the proficiency of models in terms of effectiveness, lightness, and computational costs.
The survey proceeds to cover three categories of the state-of-the-art of deep model design automation techniques.
arXiv Detail & Related papers (2022-08-22T12:12:43Z)
- Efficient Machine Learning, Compilers, and Optimizations for Embedded
Systems [21.098443474303462]
Deep Neural Networks (DNNs) have achieved great success in a massive number of artificial intelligence (AI) applications by delivering high-quality computer vision, natural language processing, and virtual reality applications.
These emerging AI applications also come with increasing computation and memory demands, which are especially challenging for embedded systems with limited computation/memory resources, tight power budgets, and small form factors.
This book chapter introduces a series of effective design methods to enable efficient algorithms, compilers, and various optimizations for embedded systems.
arXiv Detail & Related papers (2022-06-06T02:54:05Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Smart at what cost? Characterising Mobile Deep Neural Networks in the
wild [16.684419342012674]
This paper is the first holistic study of Deep Neural Network (DNN) usage in the wild.
We analyse over 16k of the most popular apps in the Google Play Store.
We measure the models' energy footprint, as a core cost dimension of any mobile deployment.
arXiv Detail & Related papers (2021-09-28T18:09:29Z)
- How to Reach Real-Time AI on Consumer Devices? Solutions for
Programmable and Custom Architectures [7.085772863979686]
Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition.
Deploying such AI models across commodity devices faces significant challenges.
We present techniques for achieving real-time performance following a cross-stack approach.
arXiv Detail & Related papers (2021-06-21T11:23:12Z)
- Pervasive AI for IoT Applications: Resource-efficient Distributed
Artificial Intelligence [45.076180487387575]
Artificial intelligence (AI) has witnessed a substantial breakthrough in a variety of Internet of Things (IoT) applications and services.
This is driven by the easier access to sensory data and the enormous scale of pervasive/ubiquitous devices that generate zettabytes (ZB) of real-time data streams.
The confluence of pervasive computing and artificial intelligence, Pervasive AI, expanded the role of ubiquitous IoT systems.
arXiv Detail & Related papers (2021-05-04T23:42:06Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.