Efficient Machine Learning, Compilers, and Optimizations for Embedded
Systems
- URL: http://arxiv.org/abs/2206.03326v1
- Date: Mon, 6 Jun 2022 02:54:05 GMT
- Title: Efficient Machine Learning, Compilers, and Optimizations for Embedded
Systems
- Authors: Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen
- Abstract summary: Deep Neural Networks (DNNs) have achieved great success in a massive number of artificial intelligence (AI) applications by delivering high-quality computer vision, natural language processing, and virtual reality applications.
These emerging AI applications also come with increasing computation and memory demands, which are especially challenging for embedded systems with limited computation/memory resources, tight power budgets, and small form factors.
This book chapter introduces a series of effective design methods to enable efficient algorithms, compilers, and various optimizations for embedded systems.
- Score: 21.098443474303462
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep Neural Networks (DNNs) have achieved great success in a massive number
of artificial intelligence (AI) applications by delivering high-quality
computer vision, natural language processing, and virtual reality applications.
However, these emerging AI applications also come with increasing computation
and memory demands, which are challenging to handle especially for the embedded
systems where limited computation/memory resources, tight power budgets, and
small form factors are demanded. Challenges also come from the diverse
application-specific requirements, including real-time responses,
high-throughput performance, and reliable inference accuracy. To address these
challenges, we will introduce a series of effective design methods in this book
chapter to enable efficient algorithms, compilers, and various optimizations
for embedded systems.
Related papers
- Using the Abstract Computer Architecture Description Language to Model
AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and
Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Edge AI Inference in Heterogeneous Constrained Computing: Feasibility
and Opportunities [9.156192191794567]
The proliferation of AI inference accelerators showcases innovation but also underscores challenges.
This paper outlines the requirements and components of a framework that accommodates hardware diversity.
Next, we assess the impact of device heterogeneity on AI inference performance, identifying strategies to optimize outcomes without compromising service quality.
arXiv Detail & Related papers (2023-10-27T16:46:59Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Enable Deep Learning on Mobile Devices: Methods, Systems, and
Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI).
However, their superior performance comes at the considerable cost of computational complexity.
This paper provides an overview of efficient deep learning methods, systems and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z)
- How to Reach Real-Time AI on Consumer Devices? Solutions for
Programmable and Custom Architectures [7.085772863979686]
Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition.
Deploying such AI models across commodity devices faces significant challenges.
We present techniques for achieving real-time performance following a cross-stack approach.
arXiv Detail & Related papers (2021-06-21T11:23:12Z)
- Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with
Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS).
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)
- Hardware and Software Optimizations for Accelerating Deep Neural
Networks: Survey of Current Trends, Challenges, and the Road Ahead [14.313423044185583]
This paper introduces the key properties of two brain-inspired models like Deep Neural Network (DNN), and then analyzes techniques to produce efficient and high-performance designs.
A single inference of a DL model may require billions of multiply-and-accumulate operations, making DL extremely compute- and energy-hungry.
arXiv Detail & Related papers (2020-12-21T10:27:48Z)
- Hard-ODT: Hardware-Friendly Online Decision Tree Learning Algorithm and
System [17.55491405857204]
In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets.
We introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models.
We present Hard-ODT, a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array (FPGA) with system-level optimization techniques.
arXiv Detail & Related papers (2020-12-11T12:06:44Z)
- Spiking Neural Networks Hardware Implementations and Challenges: a
Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles.
We present the state of the art of hardware implementations of spiking neural networks.
We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.