A Construction Kit for Efficient Low Power Neural Network Accelerator
Designs
- URL: http://arxiv.org/abs/2106.12810v1
- Date: Thu, 24 Jun 2021 07:53:56 GMT
- Title: A Construction Kit for Efficient Low Power Neural Network Accelerator
Designs
- Authors: Petar Jokic, Erfan Azarkhish, Andrea Bonetti, Marc Pons, Stephane
Emery, and Luca Benini
- Abstract summary: This work provides a survey of neural network accelerator optimization approaches that have been used in recent works.
It presents the list of optimizations and their quantitative effects as a construction kit, allowing designers to assess the design choices for each building block separately.
- Score: 11.807678100385164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implementing embedded neural network processing at the edge requires
efficient hardware acceleration that couples high computational performance
with low power consumption. Driven by the rapid evolution of network
architectures and their algorithmic features, accelerator designs are
constantly updated and improved. To evaluate and compare hardware design
choices, designers can refer to a myriad of accelerator implementations in the
literature. Surveys provide an overview of these works but are often limited to
system-level and benchmark-specific performance metrics, making it difficult to
quantitatively compare the individual effect of each utilized optimization
technique. This complicates the evaluation of optimizations for new accelerator
designs, slowing down research progress. This work provides a survey of
neural network accelerator optimization approaches that have been used in
recent works and reports their individual effects on edge processing
performance. It presents the list of optimizations and their quantitative
effects as a construction kit, allowing designers to assess the design choices
for each building block separately. Reported optimizations range from up to
10,000x memory savings to 33x energy reductions, providing chip designers with an overview
of design choices for implementing efficient low power neural network
accelerators.
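As a rough illustration of the construction-kit idea, the sketch below tallies per-building-block reduction factors for a candidate design. It is a minimal sketch, not code from the paper: the listed blocks and factors are hypothetical placeholders, and real numbers would come from the surveyed implementations.

```python
# Hypothetical sketch of a "construction kit" tally: each optimization building
# block carries an illustrative memory and energy reduction factor, and a
# candidate design combines the blocks it uses. All names/factors are placeholders.
from dataclasses import dataclass


@dataclass
class Optimization:
    name: str
    memory_factor: float  # e.g. 4.0 means 4x less memory
    energy_factor: float  # e.g. 2.0 means 2x less energy


KIT = [
    Optimization("8-bit weight quantization", memory_factor=4.0, energy_factor=3.0),
    Optimization("75% weight pruning", memory_factor=4.0, energy_factor=1.5),
    Optimization("voltage/frequency scaling", memory_factor=1.0, energy_factor=2.0),
]


def combined_savings(blocks):
    """Combine per-block factors multiplicatively (an optimistic upper bound)."""
    memory, energy = 1.0, 1.0
    for block in blocks:
        memory *= block.memory_factor
        energy *= block.energy_factor
    return memory, energy


mem_x, energy_x = combined_savings(KIT)
print(f"estimated reduction: {mem_x:.0f}x memory, {energy_x:.0f}x energy")
```

In practice the effects of different building blocks interact, so a multiplicative combination only gives a first-order estimate.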
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
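To illustrate what hardware-aware mixed-precision assignment can look like in principle, here is a minimal greedy sketch: per-layer latencies measured on the target device and estimated accuracy penalties drive the bit-width choice. The layer names, numbers, and the greedy rule are assumptions for illustration, not the cited paper's on-chip pipeline or mask-guided estimator.

```python
# Hypothetical hardware-aware mixed-precision sketch: pick 4-bit layers where the
# measured latency saving per point of estimated accuracy loss is largest,
# subject to a total accuracy budget. All numbers below are made up.
latency_ms = {  # per-layer latency measured on the target device, per bit-width
    "conv1": {8: 1.2, 4: 0.7},
    "conv2": {8: 2.4, 4: 1.3},
    "fc":    {8: 0.5, 4: 0.3},
}
penalty_pp = {  # estimated accuracy penalty (percentage points) per bit-width
    "conv1": {8: 0.0, 4: 0.6},
    "conv2": {8: 0.1, 4: 1.2},
    "fc":    {8: 0.0, 4: 0.2},
}


def assign_bitwidths(budget_pp: float) -> dict:
    """Greedy assignment: spend the accuracy budget where it buys the most latency."""
    config = {layer: 8 for layer in latency_ms}  # start from 8 bits everywhere
    by_benefit = sorted(
        latency_ms,
        key=lambda l: (latency_ms[l][8] - latency_ms[l][4]) / max(penalty_pp[l][4], 1e-6),
        reverse=True,
    )
    for layer in by_benefit:
        extra = penalty_pp[layer][4] - penalty_pp[layer][8]
        if extra <= budget_pp:
            config[layer] = 4
            budget_pp -= extra
    return config


print(assign_bitwidths(budget_pp=1.0))  # e.g. {'conv1': 4, 'conv2': 8, 'fc': 4}
```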
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
- MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration [5.2487252195308844]
This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators.
We introduce novel optimization and transformation tasks for building design-flow architectures.
Our results demonstrate considerable reductions of up to 92% in DSP usage and 89% in LUT usage for two networks.
arXiv Detail & Related papers (2023-06-14T21:06:07Z)
- A Semi-Decoupled Approach to Fast and Optimal Hardware-Software Co-Design of Neural Accelerators [22.69558355718029]
Hardware-software co-design has been emerging to fully reap the benefits of flexible design spaces and optimize neural network performance.
Such co-design enlarges the total search space to practically infinity and presents substantial challenges.
We propose a semi-decoupled approach to reduce the size of the total design space by orders of magnitude, yet without losing optimality.
arXiv Detail & Related papers (2022-03-25T21:49:42Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Data-Driven Offline Optimization For Architecting Hardware Accelerators [89.68870139177785]
We develop a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME.
PRIME improves performance upon state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while considerably reducing the required total simulation time by 93% and 99%, respectively.
In addition, PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
arXiv Detail & Related papers (2021-10-20T17:06:09Z)
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
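As a generic illustration of that interplay (a minimal PyTorch sketch of the general quantization-aware pruning idea, not the Ps and Qs implementation), the layer below applies a magnitude-based pruning mask and straight-through fake quantization to its weights in the same forward pass, so training sees both effects at once.

```python
# Minimal quantization-aware pruning sketch for a single linear layer:
# weights are masked to a target sparsity, then fake-quantized with a
# straight-through estimator, so gradients account for both compressions.
import torch
import torch.nn as nn


class QAPLinear(nn.Module):
    def __init__(self, in_features, out_features, sparsity=0.5, bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.sparsity = sparsity
        self.bits = bits

    def _mask(self):
        # Magnitude pruning: zero out the smallest |w| until the target sparsity is met.
        k = int(self.weight.numel() * self.sparsity)
        threshold = self.weight.abs().flatten().kthvalue(k).values if k > 0 else -1.0
        return (self.weight.abs() > threshold).float()

    def _fake_quant(self, w):
        # Symmetric uniform quantization with a straight-through estimator.
        qmax = 2 ** (self.bits - 1) - 1
        scale = w.abs().max() / qmax + 1e-8
        wq = torch.round(w / scale).clamp(-qmax, qmax) * scale
        return w + (wq - w).detach()  # gradients flow as if no rounding happened

    def forward(self, x):
        w = self.weight * self._mask()
        return x @ self._fake_quant(w).t()


layer = QAPLinear(16, 8, sparsity=0.75, bits=4)
out = layer(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 8])
```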
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
- Apollo: Transferable Architecture Exploration [26.489275442359464]
We propose a transferable architecture exploration framework, dubbed Apollo.
We show that our framework finds high reward design configurations more sample-efficiently than a baseline black-box optimization approach.
arXiv Detail & Related papers (2021-02-02T19:36:02Z)
- Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate-Scale Quantum devices.
We propose a pruning-based strategy for the ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
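In the spirit of that sequential scheme, here is a rough block-coordinate sketch (an assumption-laden toy, not the paper's PECT algorithm): a toy cost function stands in for a measured expectation value, and each stage optimizes only a small subset of the parameters while the rest stay fixed.

```python
# Toy block-wise optimization sketch: a sequence of sub-problems, each over a
# small active parameter block, instead of one optimization over all parameters.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=12)  # toy "ansatz" parameters


def cost(t):
    # Placeholder cost standing in for an expectation value measured on hardware.
    return np.sum(np.cos(t)) + 0.1 * np.sum(t**2)


def optimize_block(t, idx, steps=50, lr=0.1):
    # Finite-difference gradient descent restricted to the active indices.
    for _ in range(steps):
        grad = np.zeros_like(t)
        for i in idx:
            e = np.zeros_like(t)
            e[i] = 1e-3
            grad[i] = (cost(t + e) - cost(t - e)) / 2e-3
        t[idx] -= lr * grad[idx]
    return t


for block in np.array_split(np.arange(theta.size), 4):  # sequence of sub-problems
    theta = optimize_block(theta, block)
print(f"final cost: {cost(theta):.3f}")
```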
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
- DANCE: Differentiable Accelerator/Network Co-Exploration [8.540518473228078]
This work presents a differentiable approach towards the co-exploration of the hardware accelerator and network architecture design.
By modeling the hardware evaluation software with a neural network, the relation between the accelerator architecture and the hardware metrics becomes differentiable.
Compared to the naive existing approaches, our method performs co-exploration in a significantly shorter time, while achieving superior accuracy and hardware cost metrics.
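The core trick can be sketched in a few lines (a hypothetical illustration, not the DANCE code): a small neural network acts as a differentiable surrogate for the hardware cost model, so accelerator knobs receive gradients from a joint objective.

```python
# Hypothetical sketch of differentiable co-exploration: a tiny MLP stands in for
# the hardware evaluator, making latency/energy differentiable w.r.t. relaxed
# accelerator parameters. In a real flow the surrogate is first fitted to
# simulator measurements; here it is untrained and only shows the gradient path.
import torch
import torch.nn as nn

surrogate = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))

hw_params = nn.Parameter(torch.tensor([5.0, 8.0]))  # e.g. log2(PE array), log2(buffer KB)
task_loss = torch.tensor(0.42)                       # placeholder network/task loss

latency, energy = surrogate(hw_params)               # predicted hardware metrics
total_loss = task_loss + 0.1 * latency + 0.1 * energy
total_loss.backward()                                # gradients reach the accelerator knobs
print(hw_params.grad)
```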
arXiv Detail & Related papers (2020-09-14T07:43:27Z)
- Optimisation of a Siamese Neural Network for Real-Time Energy Efficient Object Tracking [0.0]
The optimisation of visual object tracking using a Siamese neural network for embedded vision systems is presented.
It was assumed that the solution shall operate in real-time, preferably for a high resolution video stream.
arXiv Detail & Related papers (2020-07-01T13:49:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.