DLAS: An Exploration and Assessment of the Deep Learning Acceleration
Stack
- URL: http://arxiv.org/abs/2311.08909v1
- Date: Wed, 15 Nov 2023 12:26:31 GMT
- Title: DLAS: An Exploration and Assessment of the Deep Learning Acceleration
Stack
- Authors: Perry Gibson, Jos\'e Cano, Elliot J. Crowley, Amos Storkey, Michael
O'Boyle
- Abstract summary: We combine machine learning and systems techniques within the Deep Learning Acceleration Stack (DLAS)
We evaluate the impact on accuracy and inference time when varying different parameters of DLAS across two datasets.
Overall we make 13 key observations, including that speedups provided by compression techniques are very hardware dependent.
- Score: 3.7873597471903935
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep Neural Networks (DNNs) are extremely computationally demanding, which
presents a large barrier to their deployment on resource-constrained devices.
Since such devices are where many emerging deep learning applications lie
(e.g., drones, vision-based medical technology), significant bodies of work
from both the machine learning and systems communities have attempted to
provide optimizations to accelerate DNNs. To help unify these two perspectives,
in this paper we combine machine learning and systems techniques within the
Deep Learning Acceleration Stack (DLAS), and demonstrate how these layers can
be tightly dependent on each other with an across-stack perturbation study. We
evaluate the impact on accuracy and inference time when varying different
parameters of DLAS across two datasets, seven popular DNN architectures, four
DNN compression techniques, three algorithmic primitives with sparse and dense
variants, untuned and auto-scheduled code generation, and four hardware
platforms. Our evaluation highlights how perturbations across DLAS parameters
can cause significant variation and across-stack interactions. The highest
level observation from our evaluation is that the model size, accuracy, and
inference time are not guaranteed to be correlated. Overall we make 13 key
observations, including that speedups provided by compression techniques are
very hardware dependent, and that compiler auto-tuning can significantly alter
what the best algorithm to use for a given configuration is. With DLAS, we aim
to provide a reference framework to aid machine learning and systems
practitioners in reasoning about the context in which their respective DNN
acceleration solutions exist in. With our evaluation strongly motivating the
need for co-design, we believe that DLAS can be a valuable concept for
exploring the next generation of co-designed accelerated deep learning
solutions.
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs)
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - Random resistive memory-based deep extreme point learning machine for
unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM)
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z) - Biologically Plausible Learning on Neuromorphic Hardware Architectures [27.138481022472]
Neuromorphic computing is an emerging paradigm that confronts this imbalance by computations directly in analog memories.
This work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware and vice versa.
arXiv Detail & Related papers (2022-12-29T15:10:59Z) - Enable Deep Learning on Mobile Devices: Methods, Systems, and
Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI)
However, their superior performance comes at the considerable cost of computational complexity.
This paper provides an overview of efficient deep learning methods, systems and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z) - Comparison Analysis of Traditional Machine Learning and Deep Learning
Techniques for Data and Image Classification [62.997667081978825]
The purpose of the study is to analyse and compare the most common machine learning and deep learning techniques used for computer vision 2D object classification tasks.
Firstly, we will present the theoretical background of the Bag of Visual words model and Deep Convolutional Neural Networks (DCNN)
Secondly, we will implement a Bag of Visual Words model, the VGG16 CNN Architecture.
arXiv Detail & Related papers (2022-04-11T11:34:43Z) - Neurosymbolic hybrid approach to driver collision warning [64.02492460600905]
There are two main algorithmic approaches to autonomous driving systems.
Deep learning alone has achieved state-of-the-art results in many areas.
But sometimes it can be very difficult to debug if the deep learning model doesn't work.
arXiv Detail & Related papers (2022-03-28T20:29:50Z) - Accelerating deep neural networks for efficient scene understanding in
automotive cyber-physical systems [2.4373900721120285]
Automotive Cyber-Physical Systems (ACPS) have attracted a significant amount of interest in the past few decades.
One of the most critical operations in these systems is the perception of the environment.
Deep learning and, especially, the use of Deep Neural Networks (DNNs) provides impressive results in analyzing and understanding complex and dynamic scenes from visual data.
arXiv Detail & Related papers (2021-07-19T18:43:17Z) - How to Reach Real-Time AI on Consumer Devices? Solutions for
Programmable and Custom Architectures [7.085772863979686]
Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition.
deploying such AI models across commodity devices faces significant challenges.
We present techniques for achieving real-time performance following a cross-stack approach.
arXiv Detail & Related papers (2021-06-21T11:23:12Z) - CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity
Edge Devices [3.812706195714961]
We build a prototype distributed system of Raspberry Pis communicating via WiFi running NeuroEvolutionary (NE) learning and inference.
We evaluate the performance of such a collaborative system and detail the compute/communication characteristics of different arrangements of the system.
arXiv Detail & Related papers (2020-08-27T01:49:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.