A Fast Network Exploration Strategy to Profile Low Energy Consumption
for Keyword Spotting
- URL: http://arxiv.org/abs/2202.02361v1
- Date: Fri, 4 Feb 2022 19:51:41 GMT
- Authors: Arnab Neelim Mazumder and Tinoosh Mohsenin
- Abstract summary: Keyword spotting is an integral part of speech-oriented user interaction targeted at smart devices.
We propose a regression-based network exploration technique that considers the scaling of the network filters.
Our design is deployed on the Xilinx AC 701 platform and achieves at least 2.1$\times$ and 4$\times$ improvements in energy and energy efficiency, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keyword spotting is nowadays an integral part of speech-oriented
user interaction targeted at smart devices. To this end, neural networks are
extensively used for their flexibility and high accuracy. However, coming up
with a suitable configuration for both accuracy requirements and hardware
deployment is a challenge. We propose a regression-based network exploration
technique that considers the scaling of the network filters ($s$) and
quantization ($q$) of the network layers, leading to a friendly and
energy-efficient configuration for FPGA hardware implementation. We experiment
with different combinations of $\mathcal{NN}\langle q,\,s\rangle$ on the FPGA
to profile the energy consumption of the deployed
network so that the user can choose the most energy-efficient network
configuration promptly. Our accelerator design is deployed on the Xilinx AC 701
platform and achieves at least 2.1$\times$ and 4$\times$ improvements in
energy and energy efficiency, respectively, compared to recent hardware
implementations for keyword spotting.
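The exploration over $\mathcal{NN}\langle q,\,s\rangle$ configurations can be pictured as a sweep over quantization bit widths ($q$) and filter-scaling factors ($s$), ranking candidates by energy cost. The sketch below is a minimal, hypothetical illustration: the energy model is an invented placeholder, not the paper's regression model or FPGA measurements.

```python
# Hypothetical sketch of an NN<q, s> design-space sweep: scale a base
# filter count by s, quantize weights to q bits, and rank candidate
# configurations by a toy energy proxy (arbitrary units, not measured).
from itertools import product

BASE_FILTERS = 64  # assumed base filter count for illustration

def energy_proxy(q_bits: int, s: float) -> float:
    """Toy cost model: energy grows with bit width and filter count."""
    filters = int(BASE_FILTERS * s)
    return q_bits * filters ** 2 * 1e-4

def explore(q_choices, s_choices):
    """Enumerate all (q, s) combinations and sort by estimated energy."""
    configs = [
        {"q": q, "s": s, "energy": energy_proxy(q, s)}
        for q, s in product(q_choices, s_choices)
    ]
    return sorted(configs, key=lambda c: c["energy"])

ranked = explore(q_choices=[4, 8, 16], s_choices=[0.25, 0.5, 1.0])
best = ranked[0]
print(best)  # lowest-energy configuration under the toy model
```

In the paper itself, the cost of each configuration comes from regression fits and on-board energy profiling rather than a closed-form proxy.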
Related papers
- SpikeExplorer: hardware-oriented Design Space Exploration for Spiking Neural Networks on FPGA [42.170149806080204]
SpikeExplorer is a Python tool for hardware-oriented automatic design space exploration.
It searches the optimal network architecture, neuron model, and internal and training parameters.
It reaches 95.8% accuracy on the MNIST dataset, with a power consumption of 180mW/image and a latency of 0.12 ms/image.
arXiv Detail & Related papers (2024-04-04T17:53:08Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least $1.4\times$ more energy efficient than the uniform quantisation version.
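Power-of-two (PoT) quantisation snaps each weight to the nearest signed power of two, so hardware multipliers reduce to bit shifts. A minimal sketch, assuming a small illustrative exponent range (the actual bit allocation is accelerator-specific):

```python
# Illustrative PoT weight quantization: round each weight to the nearest
# value of the form +/- 2^k (or 0). The exponent range is an assumption
# for this sketch, not taken from the paper.
import math

def pot_quantize(w: float, min_exp: int = -4, max_exp: int = 0) -> float:
    """Round w to the nearest signed power of two within the exponent range."""
    if w == 0.0:
        return 0.0
    sign = 1.0 if w > 0 else -1.0
    exp = round(math.log2(abs(w)))          # nearest exponent in log domain
    exp = max(min_exp, min(max_exp, exp))   # clamp to representable range
    return sign * 2.0 ** exp

weights = [0.9, -0.3, 0.07, 0.0]
print([pot_quantize(w) for w in weights])  # -> [1.0, -0.25, 0.0625, 0.0]
```

Because every quantised weight is a power of two, a multiply `x * w` becomes a shift of `x` by the stored exponent, which is where the energy saving on the FPGA comes from.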
arXiv Detail & Related papers (2022-09-30T06:33:40Z)
- Hardware Accelerator and Neural Network Co-Optimization for Ultra-Low-Power Audio Processing Devices [0.0]
HANNAH is a framework for automated and combined hardware/software co-design of deep neural networks and hardware accelerators.
We show that HANNAH can find suitable neural networks with minimized power consumption and high accuracy for different audio classification tasks.
arXiv Detail & Related papers (2022-09-08T13:29:09Z)
- AI in 6G: Energy-Efficient Distributed Machine Learning for Multilayer Heterogeneous Networks [7.318997639507269]
We propose a novel layer-based HetNet architecture which distributes tasks associated with different machine learning approaches across network layers and entities.
Such a HetNet boasts multiple access schemes as well as device-to-device (D2D) communications to enhance energy efficiency.
arXiv Detail & Related papers (2022-06-04T22:03:19Z)
- Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z)
- AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using an adder convolutional neural network (AdderNet).
In practice, the whole AdderNet achieves a 16% enhancement in speed.
We conclude that AdderNet is able to surpass all the other competitors.
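AdderNet's core operation replaces the multiply-accumulate of convolution with a negative $\ell_1$ distance between the filter and each input patch, so only additions and subtractions are needed. A toy 1-D, single-channel sketch (not the paper's actual implementation):

```python
# Sketch of AdderNet's multiplication-free "convolution": the response at
# each position is the negative L1 distance between the filter and the
# input patch, computed with additions/subtractions only.
def adder_conv1d(x, w):
    """Slide filter w over x; output[i] = -sum(|x[i+j] - w[j]|)."""
    k = len(w)
    return [
        -sum(abs(x[i + j] - w[j]) for j in range(k))
        for i in range(len(x) - k + 1)
    ]

signal = [1.0, 2.0, 3.0, 2.0, 1.0]
kernel = [1.0, 2.0]
print(adder_conv1d(signal, kernel))
```

A perfect match between patch and filter gives the maximum response of 0, and responses decrease as the patch diverges from the filter, mirroring how a correlation peaks at a match.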
arXiv Detail & Related papers (2021-01-25T11:31:52Z)
- ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces over 80% hardware-quantified energy cost of DNNs training and inference, while offering comparable or better accuracies.
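ShiftAddNet builds its layers from shift and add primitives; the hardware saving stems from the fact that multiplying by a power of two is a single bit shift. A minimal integer illustration (not the network itself):

```python
# Toy illustration of the shift arithmetic ShiftAddNet-style hardware
# exploits: multiplication or division by 2**k reduces to a bit shift,
# which is far cheaper in silicon than a general multiplier.
def shift_mul(x: int, exp: int) -> int:
    """Multiply integer x by 2**exp (negative exp divides, flooring)."""
    return x << exp if exp >= 0 else x >> -exp

print(shift_mul(3, 4))    # 3 * 16 = 48
print(shift_mul(48, -2))  # 48 // 4 = 12
```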
arXiv Detail & Related papers (2020-10-24T05:09:14Z)
- EmotionNet Nano: An Efficient Deep Convolutional Neural Network Design for Real-time Facial Expression Recognition [75.74756992992147]
This study proposes EmotionNet Nano, an efficient deep convolutional neural network created through a human-machine collaborative design strategy.
Two different variants of EmotionNet Nano are presented, each with a different trade-off between architectural and computational complexity and accuracy.
We demonstrate that the proposed EmotionNet Nano networks achieved real-time inference speeds (e.g. $>25$ FPS and $>70$ FPS at 15W and 30W, respectively) and high energy efficiency.
arXiv Detail & Related papers (2020-06-29T00:48:05Z)
- LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications [6.9276012494882835]
We present a novel method for designing neural network topologies that directly map to a highly efficient FPGA implementation.
We show that the combination of sparsity and low-bit activation quantization results in high-speed circuits with small logic depth and low LUT cost.
arXiv Detail & Related papers (2020-04-06T22:15:41Z) - Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint
Decision and Feature Aggregation [54.28963233377946]
We propose a joint search framework, called AutoRTNet, to automate the design of segmentation strategies.
Specifically, we propose hyper-cells to jointly decide the network depth and downsampling strategy, and an aggregation cell to achieve automatic multi-scale feature aggregation.
arXiv Detail & Related papers (2020-03-31T14:02:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.