Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision
Quantization
- URL: http://arxiv.org/abs/2312.15322v1
- Date: Sat, 23 Dec 2023 18:50:13 GMT
- Title: Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision
Quantization
- Authors: Konstantinos Balaskas, Andreas Karatzas, Christos Sad, Kostas Siozios,
Iraklis Anagnostopoulos, Georgios Zervakis, Jörg Henkel
- Abstract summary: We propose an automated framework to compress Deep Neural Networks (DNNs) in a hardware-aware manner by jointly employing pruning and quantization.
Our framework achieves $39\%$ average energy reduction for $1.7\%$ average accuracy loss and significantly outperforms the state-of-the-art approaches.
- Score: 1.0235078178220354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) have shown significant advantages in a wide
variety of domains. However, DNNs are becoming computationally intensive and
energy hungry at an exponential pace, while at the same time, there is a vast
demand for running sophisticated DNN-based services on resource constrained
embedded devices. In this paper, we target energy-efficient inference on
embedded DNN accelerators. To that end, we propose an automated framework to
compress DNNs in a hardware-aware manner by jointly employing pruning and
quantization. We explore, for the first time, per-layer fine- and
coarse-grained pruning, in the same DNN architecture, in addition to low
bit-width mixed-precision quantization for weights and activations.
Reinforcement Learning (RL) is used to explore the associated design space and
identify the pruning-quantization configuration so that the energy consumption
is minimized whilst the prediction accuracy loss is retained at acceptable
levels. Using our novel composite RL agent, we are able to extract
energy-efficient solutions without requiring retraining or fine-tuning. Our
extensive experimental evaluation over widely used DNNs and the CIFAR-10/100
and ImageNet datasets demonstrates that our framework achieves $39\%$ average
energy reduction for $1.7\%$ average accuracy loss and significantly
outperforms the state-of-the-art approaches.
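The joint per-layer pruning and quantization the framework searches over can be sketched minimally in NumPy. The (sparsity, bit-width) settings below are hypothetical placeholders; in the paper they are selected per layer by the RL agent, which also chooses between fine- and coarse-grained pruning, so this is only an illustration of the two compression steps, not the authors' implementation:

```python
import numpy as np

def prune_and_quantize(weights, sparsity, bits):
    """Magnitude-prune a weight tensor to the given sparsity, then
    uniformly quantize the surviving weights to `bits` bits.
    Simplified sketch: fine-grained (unstructured) pruning plus
    symmetric uniform quantization."""
    w = weights.copy()
    # Fine-grained pruning: zero the k smallest-magnitude weights.
    k = int(sparsity * w.size)
    if k > 0:
        threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
        w[np.abs(w) <= threshold] = 0.0
    # Symmetric uniform quantization of the remaining weights.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int32)
    return q * scale  # dequantized values for accuracy simulation

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
w_c = prune_and_quantize(w, sparsity=0.5, bits=4)
print(np.mean(w_c == 0))    # at least half the weights are zero
print(np.unique(w_c).size)  # at most 2**bits - 1 distinct levels
```

A search procedure (RL or otherwise) would evaluate such configurations per layer against a hardware energy model and the resulting accuracy.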
Related papers
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z) - LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization [48.41286573672824]
Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient.
We propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process.
arXiv Detail & Related papers (2024-01-26T05:23:11Z) - DeepAxe: A Framework for Exploration of Approximation and Reliability
Trade-offs in DNN Accelerators [0.9556128246747769]
The role of Deep Neural Networks (DNNs) in safety-critical applications is expanding.
The computational demands of DNNs are growing massively, raising the need to improve the reliability of DNN accelerators.
arXiv Detail & Related papers (2023-03-14T20:42:38Z) - The Hardware Impact of Quantization and Pruning for Weights in Spiking
Neural Networks [0.368986335765876]
Quantization and pruning of parameters can both compress the model size, reduce memory footprints, and facilitate low-latency execution.
We study various combinations of pruning and quantization in isolation, cumulatively, and simultaneously to a state-of-the-art SNN targeting gesture recognition.
We show that this state-of-the-art model is amenable to aggressive parameter quantization, not suffering from any loss in accuracy down to ternary weights.
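Ternary weight quantization, which that study finds lossless for its SNN, restricts each weight to three levels {-α, 0, +α}. A minimal sketch follows; the 0.7·mean(|w|) threshold is a common heuristic and an assumption here, not necessarily the paper's exact scheme:

```python
import numpy as np

def ternarize(weights, delta_factor=0.7):
    """Map weights to {-alpha, 0, +alpha}. Weights with magnitude
    below delta are zeroed; alpha is the mean magnitude of the
    surviving weights."""
    delta = delta_factor * np.mean(np.abs(weights))
    mask = np.abs(weights) > delta
    alpha = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return np.sign(weights) * mask * alpha

rng = np.random.default_rng(1)
w = rng.normal(size=1000)
t = ternarize(w)
print(sorted(np.unique(t)))  # three levels: -alpha, 0, +alpha
```

Each ternary weight needs only two bits of storage, which is where the compression benefit comes from.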
arXiv Detail & Related papers (2023-02-08T16:25:20Z) - Energy-efficient DNN Inference on Approximate Accelerators Through
Formal Property Exploration [1.0323063834827415]
We present an automated framework for weight-to-approximation mapping for approximate Deep Neural Networks (DNNs)
At the MAC unit level, our evaluation surpassed already energy-efficient mappings by more than $2\times$ in terms of energy gains.
arXiv Detail & Related papers (2022-07-25T17:07:00Z) - Sparsifying Binary Networks [3.8350038566047426]
Binary neural networks (BNNs) have demonstrated their ability to solve complex tasks with accuracy comparable to full-precision deep neural networks (DNNs).
Despite recent improvements, they suffer from a fixed and limited compression factor that may prove insufficient for certain devices with very limited resources.
We propose sparse binary neural networks (SBNNs), a novel model and training scheme which introduces sparsity in BNNs and a new quantization function for binarizing the network's weights.
arXiv Detail & Related papers (2022-07-11T15:54:41Z) - Single-Shot Optical Neural Network [55.41644538483948]
'Weight-stationary' analog optical and electronic hardware has been proposed to reduce the compute resources required by deep neural networks.
We present a scalable, single-shot-per-layer weight-stationary optical processor.
arXiv Detail & Related papers (2022-05-18T17:49:49Z) - Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for
Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames.
Recent progress in object recognition from event-based sensors has come from conversions of deep neural networks.
We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
arXiv Detail & Related papers (2021-12-06T23:45:58Z) - ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces over 80% hardware-quantified energy cost of DNNs training and inference, while offering comparable or better accuracies.
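One way to picture multiplication-less inference in the spirit of ShiftAddNet is power-of-two weight quantization, where each multiply reduces to a bit shift. This is an illustrative approximation only; ShiftAddNet's actual shift and add layers differ:

```python
import numpy as np

def shift_quantize(w):
    """Round weights to signed powers of two, so that w * x can be
    computed in integer hardware as a bit shift of x plus a sign
    flip, instead of a full multiplication."""
    sign = np.sign(w)
    exp = np.round(np.log2(np.abs(w) + 1e-12))
    return sign * 2.0 ** exp

x = 5.0
wq = shift_quantize(np.array([0.23]))[0]  # nearest power of two: 0.25
print(wq * x)  # 0.25 * 5 = 1.25, realizable as x >> 2
```

Shifters cost far less energy and area than multipliers, which is the hardware motivation behind such multiplication-less designs.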
arXiv Detail & Related papers (2020-10-24T05:09:14Z) - TaxoNN: A Light-Weight Accelerator for Deep Neural Network Training [2.5025363034899732]
We present a novel approach to add the training ability to a baseline DNN accelerator (inference only) by splitting the SGD algorithm into simple computational elements.
Based on this approach we propose TaxoNN, a light-weight accelerator for DNN training.
Our experimental results show that TaxoNN incurs, on average, only a 0.97% higher misclassification rate than a full-precision implementation.
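Splitting SGD into simple computational elements can be sketched as decomposing the update w ← w − η·g into a multiply stage followed by a subtract stage, each mapping to a simple hardware unit. This is a hypothetical decomposition for illustration, not TaxoNN's actual datapath:

```python
def sgd_step_elements(w, grad, lr):
    """Decompose one SGD step into elementary operations:
    a multiplier stage (scale gradients by the learning rate)
    and a subtractor stage (apply the update)."""
    scaled = [lr * g for g in grad]               # multiplier units
    return [wi - s for wi, s in zip(w, scaled)]   # subtractor units

w = [1.0, -0.5]
g = [0.2, 0.4]
updated = sgd_step_elements(w, g, lr=0.1)
print(updated)  # updated weights, approximately [0.98, -0.54]
```

Reusing such simple units across layers is what keeps a training-capable accelerator lightweight compared to a dedicated full-precision training engine.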
arXiv Detail & Related papers (2020-10-11T09:04:19Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.