Related papers: DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack

DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack

URL: http://arxiv.org/abs/2311.08909v1
Date: Wed, 15 Nov 2023 12:26:31 GMT
Title: DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack
Authors: Perry Gibson, Jos\'e Cano, Elliot J. Crowley, Amos Storkey, Michael O'Boyle
Abstract summary: We combine machine learning and systems techniques within the Deep Learning Acceleration Stack (DLAS) We evaluate the impact on accuracy and inference time when varying different parameters of DLAS across two datasets. Overall we make 13 key observations, including that speedups provided by compression techniques are very hardware dependent.
Score: 3.7873597471903935
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Deep Neural Networks (DNNs) are extremely computationally demanding, which presents a large barrier to their deployment on resource-constrained devices. Since such devices are where many emerging deep learning applications lie (e.g., drones, vision-based medical technology), significant bodies of work from both the machine learning and systems communities have attempted to provide optimizations to accelerate DNNs. To help unify these two perspectives, in this paper we combine machine learning and systems techniques within the Deep Learning Acceleration Stack (DLAS), and demonstrate how these layers can be tightly dependent on each other with an across-stack perturbation study. We evaluate the impact on accuracy and inference time when varying different parameters of DLAS across two datasets, seven popular DNN architectures, four DNN compression techniques, three algorithmic primitives with sparse and dense variants, untuned and auto-scheduled code generation, and four hardware platforms. Our evaluation highlights how perturbations across DLAS parameters can cause significant variation and across-stack interactions. The highest level observation from our evaluation is that the model size, accuracy, and inference time are not guaranteed to be correlated. Overall we make 13 key observations, including that speedups provided by compression techniques are very hardware dependent, and that compiler auto-tuning can significantly alter what the best algorithm to use for a given configuration is. With DLAS, we aim to provide a reference framework to aid machine learning and systems practitioners in reasoning about the context in which their respective DNN acceleration solutions exist in. With our evaluation strongly motivating the need for co-design, we believe that DLAS can be a valuable concept for exploring the next generation of co-designed accelerated deep learning solutions.

Related papers

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals [58.83169560132308]
We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks.
arXiv Detail & Related papers (2024-07-18T17:59:01Z)
Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI. As the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z)
Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM) Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
Biologically Plausible Learning on Neuromorphic Hardware Architectures [27.138481022472]
Neuromorphic computing is an emerging paradigm that confronts this imbalance by computations directly in analog memories. This work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware and vice versa.
arXiv Detail & Related papers (2022-12-29T15:10:59Z)
Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI) However, their superior performance comes at the considerable cost of computational complexity. This paper provides an overview of efficient deep learning methods, systems and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z)
Comparison Analysis of Traditional Machine Learning and Deep Learning Techniques for Data and Image Classification [62.997667081978825]
The purpose of the study is to analyse and compare the most common machine learning and deep learning techniques used for computer vision 2D object classification tasks. Firstly, we will present the theoretical background of the Bag of Visual words model and Deep Convolutional Neural Networks (DCNN) Secondly, we will implement a Bag of Visual Words model, the VGG16 CNN Architecture.
arXiv Detail & Related papers (2022-04-11T11:34:43Z)
Neurosymbolic hybrid approach to driver collision warning [64.02492460600905]
There are two main algorithmic approaches to autonomous driving systems. Deep learning alone has achieved state-of-the-art results in many areas. But sometimes it can be very difficult to debug if the deep learning model doesn't work.
arXiv Detail & Related papers (2022-03-28T20:29:50Z)
Accelerating deep neural networks for efficient scene understanding in automotive cyber-physical systems [2.4373900721120285]
Automotive Cyber-Physical Systems (ACPS) have attracted a significant amount of interest in the past few decades. One of the most critical operations in these systems is the perception of the environment. Deep learning and, especially, the use of Deep Neural Networks (DNNs) provides impressive results in analyzing and understanding complex and dynamic scenes from visual data.
arXiv Detail & Related papers (2021-07-19T18:43:17Z)
How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures [7.085772863979686]
Deep neural networks (DNNs) have led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition. deploying such AI models across commodity devices faces significant challenges. We present techniques for achieving real-time performance following a cross-stack approach.
arXiv Detail & Related papers (2021-06-21T11:23:12Z)
CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity Edge Devices [3.812706195714961]
We build a prototype distributed system of Raspberry Pis communicating via WiFi running NeuroEvolutionary (NE) learning and inference. We evaluate the performance of such a collaborative system and detail the compute/communication characteristics of different arrangements of the system.
arXiv Detail & Related papers (2020-08-27T01:49:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.