Improving Efficiency in Neural Network Accelerator Using Operands
Hamming Distance optimization
- URL: http://arxiv.org/abs/2002.05293v1
- Date: Thu, 13 Feb 2020 00:36:36 GMT
- Title: Improving Efficiency in Neural Network Accelerator Using Operands
Hamming Distance optimization
- Authors: Meng Li and Yilei Li and Pierce Chuang and Liangzhen Lai and Vikas
Chandra
- Abstract summary: We show that the data-path energy is highly correlated with the number of bit flips incurred when streaming the input operands into the arithmetic units.
We propose a post-training optimization algorithm and a Hamming-distance-aware training algorithm to co-optimize the accelerator and the network.
- Score: 11.309076080980828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network accelerators are a key enabler of on-device AI inference,
for which energy efficiency is an important metric. The data-path energy,
comprising the computation energy and the energy of data movement among the
arithmetic units, accounts for a significant part of the total accelerator energy. By
revisiting the basic physics of arithmetic logic circuits, we show that the
data-path energy is highly correlated with the number of bit flips incurred when
streaming the input operands into the arithmetic units, which we define as the
Hamming distance of the input operand matrices. Based on this insight, we propose
a post-training optimization algorithm and a Hamming-distance-aware training
algorithm to co-design and co-optimize the accelerator and the network
synergistically. Experimental results based on post-layout simulation with
MobileNetV2 demonstrate an average 2.85x data-path energy reduction, and up to
8.51x for certain layers.
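To make the key quantity concrete, below is a minimal, hypothetical Python sketch (not the paper's code; the function names, the 8-bit operand assumption, and the greedy reordering heuristic are illustrative assumptions). It counts the bit flips, i.e. the Hamming distance, between consecutively streamed operand rows, and shows one plausible flavor of post-training reordering that reduces them; the paper's actual post-training and training algorithms may differ.

```python
import numpy as np

def hamming_distance(a, b, bits=8):
    """Count differing bits between two equally shaped integer operand arrays."""
    mask = (1 << bits) - 1  # two's-complement view of the low `bits` bits
    diff = np.bitwise_xor(a.astype(np.int64) & mask, b.astype(np.int64) & mask)
    return int(sum(bin(int(x)).count("1") for x in diff.ravel()))

def streaming_cost(rows, bits=8):
    """Total bit flips on the operand bus when `rows` are streamed in order."""
    return sum(hamming_distance(rows[i], rows[i + 1], bits)
               for i in range(len(rows) - 1))

def greedy_reorder(rows, bits=8):
    """Greedy nearest-neighbour ordering of operand rows to reduce bus toggling.

    Illustrative post-training heuristic only, not the paper's algorithm.
    """
    remaining = list(range(len(rows)))
    order = [remaining.pop(0)]  # start from the first row
    while remaining:
        last = rows[order[-1]]
        nxt = min(remaining, key=lambda j: hamming_distance(last, rows[j], bits))
        remaining.remove(nxt)
        order.append(nxt)
    return order

# Example: 16 rows of 8-bit weights streamed into a MAC array.
rng = np.random.default_rng(0)
rows = list(rng.integers(0, 256, size=(16, 32)))
order = greedy_reorder(rows)
print("bit flips before:", streaming_cost(rows))
print("bit flips after: ", streaming_cost([rows[i] for i in order]))
```

Note that such a reordering preserves correctness only if the accelerator (or a final permutation step) maps the outputs back to their original order.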
Related papers
- DCP: Learning Accelerator Dataflow for Neural Network via Propagation [52.06154296196845]
This work proposes an efficient data-centric approach, named Dataflow Code Propagation (DCP), to automatically find the optimal dataflow for DNN layers in seconds without human effort.
DCP learns a neural predictor to efficiently update the dataflow codes towards the desired gradient directions to minimize various optimization objectives.
For example, without using additional training data, DCP surpasses the GAMMA method, which performs a full search using thousands of samples.
arXiv Detail & Related papers (2024-10-09T05:16:44Z)
- Federated Learning With Energy Harvesting Devices: An MDP Framework [5.852486435612777]
Federated learning (FL) requires edge devices to perform local training and exchange information with a parameter server.
A critical challenge in practical FL systems is the rapid energy depletion of battery-limited edge devices.
We apply energy harvesting techniques in FL systems to extract ambient energy and continuously power edge devices.
arXiv Detail & Related papers (2024-05-17T03:41:40Z)
- Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators [0.20971479389679332]
Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors.
We show that enabling rich mixed quantization schemes during the implementation can open a previously hidden space of mappings.
CNNs utilizing quantized weights and activations and suitable mappings can significantly improve trade-offs among the accuracy, energy, and memory requirements.
arXiv Detail & Related papers (2024-04-08T10:10:30Z)
- Measuring the Energy Consumption and Efficiency of Deep Neural Networks: An Empirical Analysis and Design Recommendations [0.49478969093606673]
The BUTTER-E dataset is an augmentation of the BUTTER Empirical Deep Learning dataset.
This dataset reveals the complex relationship between dataset size, network structure, and energy use.
We propose a straightforward and effective energy model that accounts for network size, computing, and memory hierarchy.
arXiv Detail & Related papers (2024-03-13T00:27:19Z)
- EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z)
- Gradual Optimization Learning for Conformational Energy Minimization [69.36925478047682]
The Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks significantly reduces the amount of additional data required.
Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules.
arXiv Detail & Related papers (2023-11-05T11:48:08Z)
- Multiagent Reinforcement Learning with an Attention Mechanism for Improving Energy Efficiency in LoRa Networks [52.96907334080273]
As the network scale increases, the energy efficiency of LoRa networks decreases sharply due to severe packet collisions.
We propose a transmission parameter allocation algorithm based on multiagent reinforcement learning (MALoRa).
Simulation results demonstrate that MALoRa significantly improves the system energy efficiency (EE) compared with baseline algorithms.
arXiv Detail & Related papers (2023-09-16T11:37:23Z)
- Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads [0.534434568021034]
We present measurements of the energy consumption of two typical applications of deep learning models on different types of compute nodes.
One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer.
arXiv Detail & Related papers (2022-12-03T21:40:55Z)
- Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with power-of-two (PoT) weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least 1.4x more energy efficient than the uniform quantisation version.
arXiv Detail & Related papers (2022-09-30T06:33:40Z)
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
- To Talk or to Work: Flexible Communication Compression for Energy Efficient Federated Learning over Heterogeneous Mobile Edge Devices [78.38046945665538]
Federated learning (FL) over massive mobile edge devices opens new horizons for numerous intelligent mobile applications.
FL imposes huge communication and computation burdens on participating devices due to periodic global synchronization and continuous local training.
We develop a convergence-guaranteed FL algorithm enabling flexible communication compression.
arXiv Detail & Related papers (2020-12-22T02:54:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.