Hardware and Software Optimizations for Accelerating Deep Neural
Networks: Survey of Current Trends, Challenges, and the Road Ahead
- URL: http://arxiv.org/abs/2012.11233v1
- Date: Mon, 21 Dec 2020 10:27:48 GMT
- Title: Hardware and Software Optimizations for Accelerating Deep Neural
Networks: Survey of Current Trends, Challenges, and the Road Ahead
- Authors: Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera,
Maurizio Martina, Muhammad Shafique
- Abstract summary: This paper introduces the key properties of two brain-inspired models, Deep Neural Networks (DNNs) and Spiking Neural Networks (SNNs), and then analyzes techniques to produce efficient and high-performance designs.
A single inference of a DL model may require billions of multiply-and-accumulate operations, making DL extremely compute- and energy-hungry.
- Score: 14.313423044185583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, Machine Learning (ML) is becoming ubiquitous in everyday life.
Deep Learning (DL) is already present in many applications ranging from
computer vision for medicine to autonomous driving of modern cars as well as
other sectors in security, healthcare, and finance. However, to achieve
impressive performance, these algorithms employ very deep networks, requiring
significant computational power during both training and inference. A
single inference of a DL model may require billions of multiply-and-accumulate
operations, making DL extremely compute- and energy-hungry. In a scenario
where several sophisticated algorithms need to be executed with limited energy
and low latency, the need for cost-effective hardware platforms capable of
implementing energy-efficient DL execution arises. This paper first introduces
the key properties of two brain-inspired models, Deep Neural Networks (DNNs)
and Spiking Neural Networks (SNNs), and then analyzes techniques to produce
efficient and high-performance designs. This work summarizes and compares the
works for four leading platforms for executing these algorithms, namely CPU,
GPU, FPGA, and ASIC, describing the main state-of-the-art solutions and
giving particular prominence to the last two since they offer greater
design flexibility and the potential for high energy efficiency, especially
for the inference process. In addition to hardware solutions, this paper
discusses some of the important security issues that these DNN and SNN models
may have during their execution, and offers a comprehensive section on
benchmarking, explaining how to assess the quality of different networks and
hardware systems designed for them.
Related papers
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z) - Quantization-aware Neural Architectural Search for Intrusion Detection [5.010685611319813]
We present a design methodology that automatically trains and evolves quantized neural network (NN) models that are a thousand times smaller than state-of-the-art NNs.
The number of LUTs utilized by this network when deployed to an FPGA is between 2.3x and 8.5x smaller with performance comparable to prior work.
arXiv Detail & Related papers (2023-11-07T18:35:29Z) - Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks.
Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios.
A new research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - From DNNs to GANs: Review of efficient hardware architectures for deep
learning [0.0]
Neural networks and deep learning have started to impact the present research paradigm.
DSP processors are incapable of performing neural network, activation function, convolutional neural network, and generative adversarial network operations.
Different algorithms have been adapted to design a DSP processor capable of fast performance in neural network, activation function, convolutional neural network, and generative adversarial network operations.
arXiv Detail & Related papers (2021-06-06T13:23:06Z) - Learning on Hardware: A Tutorial on Neural Network Accelerators and
Co-Processors [0.0]
Deep neural networks (DNNs) have the advantage that they can take into account a large number of parameters, which enables them to solve complex tasks.
In computer vision and speech recognition, they have a better accuracy than common algorithms, and in some tasks, they boast an even higher accuracy than human experts.
With the progress of DNNs in recent years, many other fields of application such as diagnosis of diseases and autonomous driving are taking advantage of them.
arXiv Detail & Related papers (2021-04-19T12:50:27Z) - Real-time Multi-Task Diffractive Deep Neural Networks via
Hardware-Software Co-design [1.6066483376871004]
This work proposes a novel hardware-software co-design method that enables robust and noise-resilient Multi-task Learning in D$^2$NNs.
Our experimental results demonstrate significant improvements in versatility and hardware efficiency, and also demonstrate the robustness of the proposed multi-task D$^2$NN architecture.
arXiv Detail & Related papers (2020-12-16T12:29:54Z) - ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces the hardware-quantified energy cost of DNN training and inference by over 80%, while offering comparable or better accuracies.
arXiv Detail & Related papers (2020-10-24T05:09:14Z) - HAPI: Hardware-Aware Progressive Inference [18.214367595727037]
Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks.
Despite their popularity, CNN inference still comes at a high computational cost.
This work presents HAPI, a novel methodology for generating high-performance early-exit networks.
arXiv Detail & Related papers (2020-08-10T09:55:18Z) - Spiking Neural Networks Hardware Implementations and Challenges: a
Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles.
We present the state of the art of hardware implementations of spiking neural networks.
We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z) - Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G
Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.