A Fast Network Exploration Strategy to Profile Low Energy Consumption
for Keyword Spotting
- URL: http://arxiv.org/abs/2202.02361v1
- Date: Fri, 4 Feb 2022 19:51:41 GMT
- Authors: Arnab Neelim Mazumder and Tinoosh Mohsenin
- Abstract summary: Keyword spotting is an integral part of speech-oriented user interaction targeted at smart devices.
We propose a regression-based network exploration technique that considers the scaling of the network filters.
Our design is deployed on the Xilinx AC 701 platform and achieves at least 2.1$\times$ and 4$\times$ improvements in energy and energy efficiency, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keyword spotting is nowadays an integral part of speech-oriented
user interaction targeted at smart devices. To this end, neural networks are
extensively used for their flexibility and high accuracy. However, coming up
with a suitable configuration for both accuracy requirements and hardware
deployment is a challenge. We propose a regression-based network exploration
technique that considers the scaling of the network filters ($s$) and
quantization ($q$) of the network layers, leading to a friendly and
energy-efficient configuration for FPGA hardware implementation. We experiment
with different combinations of $\mathcal{NN}\langle q,\,s\rangle$ on the FPGA
to profile the energy consumption of the deployed
network so that the user can choose the most energy-efficient network
configuration promptly. Our accelerator design is deployed on the Xilinx AC 701
platform and achieves at least 2.1$\times$ and 4$\times$ improvements in
energy and energy efficiency, respectively, compared to recent hardware
implementations for keyword spotting.
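The exploration over $\mathcal{NN}\langle q,\,s\rangle$ configurations can be pictured as a sweep over quantization bit widths ($q$) and filter-scaling factors ($s$), ranking candidates by energy cost. The sketch below is a minimal, hypothetical illustration: the energy model is an invented placeholder, not the paper's regression model or FPGA measurements.

```python
# Hypothetical sketch of an NN<q, s> design-space sweep: scale a base
# filter count by s, quantize weights to q bits, and rank candidate
# configurations by a toy energy proxy (arbitrary units, not measured).
from itertools import product

BASE_FILTERS = 64  # assumed base filter count for illustration

def energy_proxy(q_bits: int, s: float) -> float:
    """Toy cost model: energy grows with bit width and filter count."""
    filters = int(BASE_FILTERS * s)
    return q_bits * filters ** 2 * 1e-4

def explore(q_choices, s_choices):
    """Enumerate all (q, s) combinations and sort by estimated energy."""
    configs = [
        {"q": q, "s": s, "energy": energy_proxy(q, s)}
        for q, s in product(q_choices, s_choices)
    ]
    return sorted(configs, key=lambda c: c["energy"])

ranked = explore(q_choices=[4, 8, 16], s_choices=[0.25, 0.5, 1.0])
best = ranked[0]
print(best)  # lowest-energy configuration under the toy model
```

In the paper itself, the cost of each configuration comes from regression fits and on-board energy profiling rather than a closed-form proxy.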
Related papers
- SpikeExplorer: hardware-oriented Design Space Exploration for Spiking Neural Networks on FPGA [42.170149806080204]
SpikeExplorer is a Python tool for hardware-oriented automatic design space exploration.
It searches the optimal network architecture, neuron model, and internal and training parameters.
It reaches 95.8% accuracy on the MNIST dataset, with a power consumption of 180mW/image and a latency of 0.12 ms/image.
arXiv Detail & Related papers (2024-04-04T17:53:08Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least $1.4\times$ more energy efficient than the uniform quantisation version.
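Power-of-two (PoT) quantisation snaps each weight to the nearest signed power of two, so hardware multipliers reduce to bit shifts. A minimal sketch, assuming a small illustrative exponent range (the actual bit allocation is accelerator-specific):

```python
# Illustrative PoT weight quantization: round each weight to the nearest
# value of the form +/- 2^k (or 0). The exponent range is an assumption
# for this sketch, not taken from the paper.
import math

def pot_quantize(w: float, min_exp: int = -4, max_exp: int = 0) -> float:
    """Round w to the nearest signed power of two within the exponent range."""
    if w == 0.0:
        return 0.0
    sign = 1.0 if w > 0 else -1.0
    exp = round(math.log2(abs(w)))          # nearest exponent in log domain
    exp = max(min_exp, min(max_exp, exp))   # clamp to representable range
    return sign * 2.0 ** exp

weights = [0.9, -0.3, 0.07, 0.0]
print([pot_quantize(w) for w in weights])  # -> [1.0, -0.25, 0.0625, 0.0]
```

Because every quantised weight is a power of two, a multiply `x * w` becomes a shift of `x` by the stored exponent, which is where the energy saving on the FPGA comes from.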
arXiv Detail & Related papers (2022-09-30T06:33:40Z)
- Hardware Accelerator and Neural Network Co-Optimization for Ultra-Low-Power Audio Processing Devices [0.0]
HANNAH is a framework for automated and combined hardware/software co-design of deep neural networks and hardware accelerators.
We show that HANNAH can find suitable neural networks with minimized power consumption and high accuracy for different audio classification tasks.
arXiv Detail & Related papers (2022-09-08T13:29:09Z)
- AI in 6G: Energy-Efficient Distributed Machine Learning for Multilayer Heterogeneous Networks [7.318997639507269]
We propose a novel layer-based HetNet architecture which distributes tasks associated with different machine learning approaches across network layers and entities.
Such a HetNet boasts multiple access schemes as well as device-to-device (D2D) communications to enhance energy efficiency.
arXiv Detail & Related papers (2022-06-04T22:03:19Z)
- Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z)
- AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using an adder convolutional neural network (AdderNet).
In practice, the whole AdderNet achieves a 16% enhancement in speed.
We conclude that AdderNet is able to surpass all the other competitors.
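AdderNet's core operation replaces the multiply-accumulate of convolution with a negative $\ell_1$ distance between the filter and each input patch, so only additions and subtractions are needed. A toy 1-D, single-channel sketch (not the paper's actual implementation):

```python
# Sketch of AdderNet's multiplication-free "convolution": the response at
# each position is the negative L1 distance between the filter and the
# input patch, computed with additions/subtractions only.
def adder_conv1d(x, w):
    """Slide filter w over x; output[i] = -sum(|x[i+j] - w[j]|)."""
    k = len(w)
    return [
        -sum(abs(x[i + j] - w[j]) for j in range(k))
        for i in range(len(x) - k + 1)
    ]

signal = [1.0, 2.0, 3.0, 2.0, 1.0]
kernel = [1.0, 2.0]
print(adder_conv1d(signal, kernel))
```

A perfect match between patch and filter gives the maximum response of 0, and responses decrease as the patch diverges from the filter, mirroring how a correlation peaks at a match.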
arXiv Detail & Related papers (2021-01-25T11:31:52Z)
- ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces over 80% hardware-quantified energy cost of DNNs training and inference, while offering comparable or better accuracies.
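ShiftAddNet builds its layers from shift and add primitives; the hardware saving stems from the fact that multiplying by a power of two is a single bit shift. A minimal integer illustration (not the network itself):

```python
# Toy illustration of the shift arithmetic ShiftAddNet-style hardware
# exploits: multiplication or division by 2**k reduces to a bit shift,
# which is far cheaper in silicon than a general multiplier.
def shift_mul(x: int, exp: int) -> int:
    """Multiply integer x by 2**exp (negative exp divides, flooring)."""
    return x << exp if exp >= 0 else x >> -exp

print(shift_mul(3, 4))    # 3 * 16 = 48
print(shift_mul(48, -2))  # 48 // 4 = 12
```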
arXiv Detail & Related papers (2020-10-24T05:09:14Z)
- EmotionNet Nano: An Efficient Deep Convolutional Neural Network Design for Real-time Facial Expression Recognition [75.74756992992147]
This study proposes EmotionNet Nano, an efficient deep convolutional neural network created through a human-machine collaborative design strategy.
Two different variants of EmotionNet Nano are presented, each with a different trade-off between architectural and computational complexity and accuracy.
We demonstrate that the proposed EmotionNet Nano networks achieved real-time inference speeds (e.g. $>25$ FPS and $>70$ FPS at 15W and 30W, respectively) and high energy efficiency.
arXiv Detail & Related papers (2020-06-29T00:48:05Z)
- LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications [6.9276012494882835]
We present a novel method for designing neural network topologies that directly map to a highly efficient FPGA implementation.
We show that the combination of sparsity and low-bit activation quantization results in high-speed circuits with small logic depth and low LUT cost.
arXiv Detail & Related papers (2020-04-06T22:15:41Z) - Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint
Decision and Feature Aggregation [54.28963233377946]
We propose a joint search framework, called AutoRTNet, to automate the design of segmentation strategies.
Specifically, we propose hyper-cells to jointly decide the network depth and downsampling strategy, and an aggregation cell to achieve automatic multi-scale feature aggregation.
arXiv Detail & Related papers (2020-03-31T14:02:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.