Improving Efficiency in Neural Network Accelerator Using Operands
Hamming Distance optimization
- URL: http://arxiv.org/abs/2002.05293v1
- Date: Thu, 13 Feb 2020 00:36:36 GMT
- Title: Improving Efficiency in Neural Network Accelerator Using Operands
Hamming Distance optimization
- Authors: Meng Li and Yilei Li and Pierce Chuang and Liangzhen Lai and Vikas
Chandra
- Abstract summary: We show that the data-path energy is highly correlated with the number of bit flips incurred when streaming the input operands into the arithmetic units.
We propose a post-training optimization algorithm and a Hamming-distance-aware training algorithm to co-optimize the accelerator and the network.
- Score: 11.309076080980828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network accelerators are a key enabler of on-device AI inference,
for which energy efficiency is an important metric. The data-path energy,
comprising the computation energy and the energy of data movement among the
arithmetic units, accounts for a significant part of the total accelerator energy. By
revisiting the basic physics of arithmetic logic circuits, we show that the
data-path energy is highly correlated with the number of bit flips incurred when
streaming the input operands into the arithmetic units, which we define as the
Hamming distance of the input operand matrices. Based on this insight, we propose
a post-training optimization algorithm and a Hamming-distance-aware training
algorithm to co-design and co-optimize the accelerator and the network
synergistically. Experimental results based on post-layout simulation with
MobileNetV2 demonstrate an average 2.85x data-path energy reduction, and up to
8.51x for certain layers.
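To make the key quantity concrete, below is a minimal, hypothetical Python sketch (not the paper's code; the function names, the 8-bit operand assumption, and the greedy reordering heuristic are illustrative assumptions). It counts the bit flips, i.e. the Hamming distance, between consecutively streamed operand rows, and shows one plausible flavor of post-training reordering that reduces them; the paper's actual post-training and training algorithms may differ.

```python
import numpy as np

def hamming_distance(a, b, bits=8):
    """Count differing bits between two equally shaped integer operand arrays."""
    mask = (1 << bits) - 1  # two's-complement view of the low `bits` bits
    diff = np.bitwise_xor(a.astype(np.int64) & mask, b.astype(np.int64) & mask)
    return int(sum(bin(int(x)).count("1") for x in diff.ravel()))

def streaming_cost(rows, bits=8):
    """Total bit flips on the operand bus when `rows` are streamed in order."""
    return sum(hamming_distance(rows[i], rows[i + 1], bits)
               for i in range(len(rows) - 1))

def greedy_reorder(rows, bits=8):
    """Greedy nearest-neighbour ordering of operand rows to reduce bus toggling.

    Illustrative post-training heuristic only, not the paper's algorithm.
    """
    remaining = list(range(len(rows)))
    order = [remaining.pop(0)]  # start from the first row
    while remaining:
        last = rows[order[-1]]
        nxt = min(remaining, key=lambda j: hamming_distance(last, rows[j], bits))
        remaining.remove(nxt)
        order.append(nxt)
    return order

# Example: 16 rows of 8-bit weights streamed into a MAC array.
rng = np.random.default_rng(0)
rows = list(rng.integers(0, 256, size=(16, 32)))
order = greedy_reorder(rows)
print("bit flips before:", streaming_cost(rows))
print("bit flips after: ", streaming_cost([rows[i] for i in order]))
```

Note that such a reordering preserves correctness only if the accelerator (or a final permutation step) maps the outputs back to their original order.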
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method, which performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- Federated Learning With Energy Harvesting Devices: An MDP Framework [5.852486435612777]
Federated learning (FL) requires edge devices to perform local training and exchange information with a parameter server.
A critical challenge in practical FL systems is the rapid energy depletion of battery-limited edge devices.
We apply energy harvesting techniques in FL systems to extract ambient energy and continuously power edge devices.
arXiv Detail & Related papers (2024-05-17T03:41:40Z)
- Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators [0.20971479389679332]
Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors.
We show that enabling rich mixed quantization schemes during the implementation can open a previously hidden space of mappings.
CNNs utilizing quantized weights and activations and suitable mappings can significantly improve trade-offs among the accuracy, energy, and memory requirements.
arXiv Detail & Related papers (2024-04-08T10:10:30Z)
- Measuring the Energy Consumption and Efficiency of Deep Neural Networks: An Empirical Analysis and Design Recommendations [0.49478969093606673]
The BUTTER-E dataset is an augmentation of the BUTTER Empirical Deep Learning dataset.
This dataset reveals the complex relationship between dataset size, network structure, and energy use.
We propose a straightforward and effective energy model that accounts for network size, computing, and memory hierarchy.
arXiv Detail & Related papers (2024-03-13T00:27:19Z)
- EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z)
- Gradual Optimization Learning for Conformational Energy Minimization [69.36925478047682]
The Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks significantly reduces the amount of additional data required.
Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules.
arXiv Detail & Related papers (2023-11-05T11:48:08Z)
- Multiagent Reinforcement Learning with an Attention Mechanism for Improving Energy Efficiency in LoRa Networks [52.96907334080273]
As the network scale increases, the energy efficiency of LoRa networks decreases sharply due to severe packet collisions.
We propose a transmission parameter allocation algorithm based on multiagent reinforcement learning (MALoRa).
Simulation results demonstrate that MALoRa significantly improves the system energy efficiency (EE) compared with baseline algorithms.
arXiv Detail & Related papers (2023-09-16T11:37:23Z)
- Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads [0.534434568021034]
We present measurements of the energy consumption of two typical applications of deep learning models on different types of compute nodes.
One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer.
arXiv Detail & Related papers (2022-12-03T21:40:55Z)
- Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with power-of-two (PoT) weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least 1.4x more energy efficient than the uniform quantisation version.
arXiv Detail & Related papers (2022-09-30T06:33:40Z)
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
- To Talk or to Work: Flexible Communication Compression for Energy Efficient Federated Learning over Heterogeneous Mobile Edge Devices [78.38046945665538]
Federated learning (FL) over massive mobile edge devices opens new horizons for numerous intelligent mobile applications.
FL imposes huge communication and computation burdens on participating devices due to periodic global synchronization and continuous local training.
We develop a convergence-guaranteed FL algorithm enabling flexible communication compression.
arXiv Detail & Related papers (2020-12-22T02:54:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.