Related papers: Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision

Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision

URL: http://arxiv.org/abs/2411.01431v1
Date: Sun, 03 Nov 2024 03:55:04 GMT
Title: Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision
Authors: Xiangzhong Luo, Di Liu, Hao Kong, Shuo Huai, Hui Chen, Guochu Xiong, Weichen Liu,
Abstract summary: Deep neural networks (DNNs) have recently achieved impressive success across a wide range of real-world vision and language processing tasks. Previous well-established DNNs, despite being able to maintain superior accuracy, have also been evolving to be deeper and wider. This survey focuses on discussing recent efficient deep learning infrastructures for embedded computing systems.
Score: 10.533474972061851
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep neural networks (DNNs) have recently achieved impressive success across a wide range of real-world vision and language processing tasks, spanning from image classification to many other downstream vision tasks, such as object detection, tracking, and segmentation. However, previous well-established DNNs, despite being able to maintain superior accuracy, have also been evolving to be deeper and wider and thus inevitably necessitate prohibitive computational resources for both training and inference. This trend further enlarges the computational gap between computation-intensive DNNs and resource-constrained embedded computing systems, making it challenging to deploy powerful DNNs upon real-world embedded computing systems towards ubiquitous embedded intelligence. To alleviate the above computational gap and enable ubiquitous embedded intelligence, we, in this survey, focus on discussing recent efficient deep learning infrastructures for embedded computing systems, spanning from training to inference, from manual to automated, from convolutional neural networks to transformers, from transformers to vision transformers, from vision models to large language models, from software to hardware, and from algorithms to applications. Specifically, we discuss recent efficient deep learning infrastructures for embedded computing systems from the lens of (1) efficient manual network design for embedded computing systems, (2) efficient automated network design for embedded computing systems, (3) efficient network compression for embedded computing systems, (4) efficient on-device learning for embedded computing systems, (5) efficient large language models for embedded computing systems, (6) efficient deep learning software and hardware for embedded computing systems, and (7) efficient intelligent applications for embedded computing systems.

Related papers

NNTile: a machine learning framework capable of training extremely large GPT language models on a single node [83.9328245724548]
NNTile is based on a StarPU library, which implements task-based parallelism and schedules all provided tasks onto all available processing units. It means that a particular operation, necessary to train a large neural network, can be performed on any of the CPU cores or GPU devices.
arXiv Detail & Related papers (2025-04-17T16:22:32Z)
Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks. Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios. New research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
Efficient Machine Learning, Compilers, and Optimizations for Embedded Systems [21.098443474303462]
Deep Neural Networks (DNNs) have achieved great success in a massive number of artificial intelligence (AI) applications by delivering high-quality computer vision, natural language processing, and virtual reality applications. These emerging AI applications also come with increasing computation and memory demands, which are challenging to handle especially for the embedded systems where limited/memory resources, tight power budgets, and small form factors are demanded. This book chapter introduces a series of effective design methods to enable efficient algorithms, compilers, and various optimizations for embedded systems.
arXiv Detail & Related papers (2022-06-06T02:54:05Z)
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI) However, their superior performance comes at the considerable cost of computational complexity. This paper provides an overview of efficient deep learning methods, systems and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z)
FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs) Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks. We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
Pervasive AI for IoT Applications: Resource-efficient Distributed Artificial Intelligence [45.076180487387575]
Artificial intelligence (AI) has witnessed a substantial breakthrough in a variety of Internet of Things (IoT) applications and services. This is driven by the easier access to sensory data and the enormous scale of pervasive/ubiquitous devices that generate zettabytes (ZB) of real-time data streams. The confluence of pervasive computing and artificial intelligence, Pervasive AI, expanded the role of ubiquitous IoT systems.
arXiv Detail & Related papers (2021-05-04T23:42:06Z)
Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles. We present the state of the art of hardware implementations of spiking neural networks. We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z)
Memristors -- from In-memory computing, Deep Learning Acceleration, Spiking Neural Networks, to the Future of Neuromorphic and Bio-inspired Computing [25.16076541420544]
Machine learning, particularly in the form of deep learning, has driven most of the recent fundamental developments in artificial intelligence. Deep learning has been successfully applied in areas such as object/pattern recognition, speech and natural language processing, self-driving vehicles, intelligent self-diagnostics tools, autonomous robots, knowledgeable personal assistants, and monitoring. This paper reviews the case for a novel beyond CMOS hardware technology, memristors, as a potential solution for the implementation of power-efficient in-memory computing, deep learning accelerators, and spiking neural networks.
arXiv Detail & Related papers (2020-04-30T16:49:03Z)
Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC. To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
Resource-Efficient Neural Networks for Embedded Systems [23.532396005466627]
We provide an overview of the current state of the art of machine learning techniques. We focus on resource-efficient inference based on deep neural networks (DNNs), the predominant machine learning models of the past decade. We substantiate our discussion with experiments on well-known benchmark data sets using compression techniques.
arXiv Detail & Related papers (2020-01-07T14:17:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.