Hardware Aware Training for Efficient Keyword Spotting on General
Purpose and Specialized Hardware
- URL: http://arxiv.org/abs/2009.04465v3
- Date: Wed, 10 Mar 2021 03:10:09 GMT
- Title: Hardware Aware Training for Efficient Keyword Spotting on General
Purpose and Specialized Hardware
- Authors: Peter Blouw, Gurshaant Malik, Benjamin Morcos, Aaron R. Voelker, and
Chris Eliasmith
- Abstract summary: Keyword spotting (KWS) provides a critical user interface for many mobile and edge applications, including phones, wearables, and cars.
We use hardware aware training (HAT) to build new KWS neural networks based on the Legendre Memory Unit (LMU) that achieve state-of-the-art (SotA) accuracy and low parameter counts.
We also characterize the power requirements of custom designed accelerator hardware that achieves SotA power efficiency of 8.79$\mu$W, beating general purpose low power hardware (a microcontroller) by 24x and special purpose ASICs by 16x.
- Score: 6.557082555839738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keyword spotting (KWS) provides a critical user interface for many mobile and
edge applications, including phones, wearables, and cars. As KWS systems are
typically 'always on', maximizing both accuracy and power efficiency is
central to their utility. In this work we use hardware aware training (HAT) to
build new KWS neural networks based on the Legendre Memory Unit (LMU) that
achieve state-of-the-art (SotA) accuracy and low parameter counts. This allows
the neural network to run efficiently on standard hardware (212$\mu$W). We also
characterize the power requirements of custom designed accelerator hardware
that achieves SotA power efficiency of 8.79$\mu$W, beating general purpose low
power hardware (a microcontroller) by 24x and special purpose ASICs by 16x.
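As a rough illustration of the Legendre Memory Unit (LMU) that these KWS networks are built on (this is not the authors' hardware aware training pipeline, and the `order`, `theta`, and Euler step below are illustrative choices), a minimal NumPy sketch of the LMU's linear memory update:

```python
# Minimal sketch (assumptions: order/theta values and Euler discretization are
# illustrative, not taken from the paper). A full LMU layer adds a nonlinear
# hidden state and learned encoders, omitted here.
import numpy as np

def lmu_matrices(order, theta):
    """Continuous-time (A, B) of the LMU delay system theta * dm/dt = A m + B u."""
    q = np.arange(order, dtype=np.float64)
    r = (2.0 * q + 1.0) / theta
    i, j = np.meshgrid(q, q, indexing="ij")
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1.0)) * r[:, None]
    B = ((-1.0) ** q) * r
    return A, B

def lmu_memory(u, order=8, theta=128.0, dt=1.0):
    """Run the LMU memory over a 1-D signal u with a simple Euler step."""
    A, B = lmu_matrices(order, theta)
    Ad = np.eye(order) + dt * A     # crude discretization; ZOH is typical in practice
    Bd = dt * B
    m = np.zeros(order)
    states = []
    for u_t in u:
        m = Ad @ m + Bd * u_t       # m holds a compressed window of the recent input
        states.append(m.copy())
    return np.asarray(states)

x = np.sin(np.linspace(0.0, 8.0 * np.pi, 256))   # stand-in for one audio feature channel
print(lmu_memory(x).shape)                        # (256, 8)
```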
Related papers
- Energy Efficient Hardware Acceleration of Neural Networks with
Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least $1.4\times$ more energy efficient than the uniform quantisation version.
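A hedged sketch of the general power-of-two (PoT) quantisation idea in this entry (not the paper's exact scheme or its FPGA implementation; the exponent range below is an assumption): each weight becomes sign(w) * 2^k, so multiplications reduce to bit shifts.

```python
import numpy as np

def quantize_pot(w, min_exp=-7, max_exp=0):
    """Round each weight to the nearest signed power of two in [2**min_exp, 2**max_exp]."""
    sign = np.sign(w)
    mag = np.maximum(np.abs(w), 2.0 ** (min_exp - 1))          # avoid log2(0)
    exp = np.clip(np.round(np.log2(mag)), min_exp, max_exp)     # choose the shift amount
    q = sign * 2.0 ** exp
    return np.where(np.abs(w) < 2.0 ** (min_exp - 1), 0.0, q)   # flush tiny weights to zero

rng = np.random.default_rng(0)
w = 0.2 * rng.standard_normal((4, 4))
print(quantize_pot(w))
```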
arXiv Detail & Related papers (2022-09-30T06:33:40Z)
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
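A loose illustration of a butterfly-structured linear map of the kind referred to above (not the paper's co-designed accelerator; the parameterization of each factor is an assumption): a dense n x n matrix is replaced by log2(n) sparse factors, each mixing pairs of elements, cutting cost from O(n^2) to O(n log n).

```python
import numpy as np

def butterfly_apply(x, levels):
    """Apply log2(n) butterfly factors to a length-n vector (n a power of two).

    Each level is a (4, n//2) array holding one 2x2 mixing matrix per element pair."""
    n = x.size
    out = x.astype(np.float64).copy()
    stride = 1
    for a, b, c, d in levels:
        v = out.reshape(n // (2 * stride), 2, stride)
        lo, hi = v[:, 0, :].ravel(), v[:, 1, :].ravel()
        new_lo, new_hi = a * lo + b * hi, c * lo + d * hi   # 2x2 mix per (lo, hi) pair
        v[:, 0, :] = new_lo.reshape(-1, stride)
        v[:, 1, :] = new_hi.reshape(-1, stride)
        stride *= 2
    return out

n = 8
rng = np.random.default_rng(0)
levels = [rng.standard_normal((4, n // 2)) for _ in range(3)]   # log2(8) = 3 sparse factors
print(butterfly_apply(rng.standard_normal(n), levels))
```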
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
- Braille Letter Reading: A Benchmark for Spatio-Temporal Pattern Recognition on Neuromorphic Hardware [50.380319968947035]
Recent deep learning approaches have reached high accuracy in such tasks, but their implementation on conventional embedded solutions is still very computationally and energy expensive.
We propose a new benchmark for computing tactile pattern recognition at the edge through letters reading.
We trained and compared feed-forward and recurrent spiking neural networks (SNNs) offline using back-propagation through time with surrogate gradients, then we deployed them on the Intel Loihi neuromorphic chip for efficient inference.
Our results show that the LSTM outperforms the recurrent SNN in terms of accuracy by 14%. However, the recurrent SNN on Loihi is 237 times more energy efficient.
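A rough sketch of the leaky integrate-and-fire (LIF) dynamics behind spiking networks like the one above (not the paper's Loihi model; layer sizes, decay, and threshold are illustrative). Training with BPTT and surrogate gradients would replace the hard threshold's zero gradient with a smooth surrogate in the backward pass.

```python
import numpy as np

def lif_forward(inputs, w, beta=0.9, v_th=1.0):
    """Simulate one LIF layer over time.

    inputs: (T, n_in) spike/feature trace, w: (n_in, n_out) weights.
    Returns (T, n_out) output spikes."""
    T = inputs.shape[0]
    v = np.zeros(w.shape[1])
    spikes = np.zeros((T, w.shape[1]))
    for t in range(T):
        v = beta * v + inputs[t] @ w          # leaky integration of weighted input
        s = (v >= v_th).astype(np.float64)    # fire where the membrane crosses threshold
        v = v - s * v_th                      # soft reset by subtracting the threshold
        spikes[t] = s
    return spikes

rng = np.random.default_rng(1)
x = (rng.random((50, 12)) < 0.2).astype(float)          # toy input spike trains
out = lif_forward(x, 0.5 * rng.standard_normal((12, 4)))
print(out.sum(axis=0))                                   # spike counts per output neuron
```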
arXiv Detail & Related papers (2022-05-30T14:30:45Z)
- A Fast Network Exploration Strategy to Profile Low Energy Consumption for Keyword Spotting [1.121535291831358]
Keyword spotting is an integral part of speech-oriented user interaction targeted at smart devices.
We propose a regression-based network exploration technique that considers the scaling of the network filters.
Our design is deployed on the Xilinx AC701 platform and achieves at least 2.1$\times$ and 4$\times$ improvements in energy and energy efficiency, respectively.
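A loose illustration of regression-based exploration in the spirit of this entry (not the paper's method; the scaling factors, energies, and accuracies below are made up): fit a simple model of measured energy versus a filter-width scaling factor, then keep the cheapest predicted configuration that meets an accuracy floor.

```python
import numpy as np

scale = np.array([0.25, 0.5, 0.75, 1.0])           # filter-count multipliers actually profiled
energy_uj = np.array([21.0, 48.0, 95.0, 160.0])    # hypothetical measured energy per inference
acc = np.array([0.89, 0.93, 0.95, 0.96])           # hypothetical validation accuracy

coef = np.polyfit(scale, energy_uj, deg=2)         # quadratic fit: energy grows roughly with filters^2
candidates = np.linspace(0.25, 1.0, 16)
pred_energy = np.polyval(coef, candidates)
pred_acc = np.interp(candidates, scale, acc)       # crude accuracy interpolation

ok = pred_acc >= 0.94                              # accuracy constraint
best = candidates[ok][np.argmin(pred_energy[ok])]
print(f"chosen filter scale: {best:.2f}")
```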
arXiv Detail & Related papers (2022-02-04T19:51:41Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks [19.40893986868577]
Keyword spotting (KWS) is a crucial function enabling interaction with the many ubiquitous smart devices in our surroundings.
This work addresses KWS energy efficiency on low-cost microcontroller units (MCUs).
By replacing the digital preprocessing with the proposed analog front-end, we show that the energy required for data acquisition and preprocessing can be reduced by 29x.
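A minimal sketch of the binary-neural-network arithmetic underlying this entry (the analog front-end itself is not modeled, and the vector size is arbitrary): with activations and weights constrained to {-1, +1}, a dot product reduces to XNOR plus popcount.

```python
import numpy as np

def binarize(x):
    return np.where(x >= 0, 1, -1).astype(np.int64)

def binary_dot(a_bits, w_bits):
    """Dot product of {-1,+1} vectors via XNOR/popcount on a {0,1} encoding."""
    n = a_bits.size
    a01 = (a_bits > 0).astype(np.int64)            # map -1 -> 0, +1 -> 1
    w01 = (w_bits > 0).astype(np.int64)
    matches = np.count_nonzero(~(a01 ^ w01) & 1)   # XNOR, then popcount
    return 2 * matches - n                          # equals (a_bits * w_bits).sum()

rng = np.random.default_rng(0)
a, w = binarize(rng.standard_normal(64)), binarize(rng.standard_normal(64))
assert binary_dot(a, w) == int((a * w).sum())
print(binary_dot(a, w))
```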
arXiv Detail & Related papers (2022-01-10T15:10:58Z)
- Ultra-Low Power Keyword Spotting at the Edge [0.0]
Keyword spotting (KWS) has become an indispensable part of many intelligent devices surrounding us.
In this work, we design an optimized KWS CNN model by considering end-to-end energy efficiency for the deployment at MAX78000.
With the combined hardware and model optimization approach, we achieve 96.3% accuracy for 12 classes while consuming only 251 $\mu$J per inference.
arXiv Detail & Related papers (2021-11-09T08:24:36Z)
- Quantization and Deployment of Deep Neural Networks on Microcontrollers [0.0]
This work focuses on quantization and deployment of deep neural networks onto low-power 32-bit microcontrollers.
A new framework for end-to-end deep neural networks training, quantization and deployment is presented.
Execution using single precision 32-bit floating-point as well as fixed-point on 8- and 16-bit integers is supported.
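A generic sketch of 8-bit affine (scale and zero-point) quantization of the kind such MCU deployment frameworks rely on (this is not that framework's API; the mapping below is a common textbook scheme):

```python
import numpy as np

def quantize_int8(x):
    """Affine (scale, zero-point) quantization of a float tensor to int8."""
    lo, hi = min(float(x.min()), 0.0), max(float(x.max()), 0.0)   # keep 0.0 representable
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-128.0 - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(6).astype(np.float32)
q, s, z = quantize_int8(x)
print(x)
print(dequantize(q, s, z))   # matches x to within one quantization step
```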
arXiv Detail & Related papers (2021-05-27T17:39:06Z)
- AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using the adder convolutional neural network (AdderNet).
The whole AdderNet can practically achieve a 16% enhancement in speed.
We conclude that AdderNet is able to surpass all the other competitors.
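A sketch of the core AdderNet operation this entry builds on (the paper is about its hardware design, which is not modeled here; a fully connected layer is shown instead of a convolution for brevity): multiply-accumulate is replaced by the negative L1 distance between weights and inputs, so the layer needs only additions and absolute values.

```python
import numpy as np

def adder_layer(x, w):
    """x: (batch, n_in), w: (n_in, n_out). Returns -sum_i |x_i - w_ij|."""
    # Broadcasting to (batch, n_in, n_out); a real kernel would stream this.
    return -np.abs(x[:, :, None] - w[None, :, :]).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5))
w = rng.standard_normal((5, 3))
print(adder_layer(x, w))   # higher (less negative) means the input is closer to that filter
```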
arXiv Detail & Related papers (2021-01-25T11:31:52Z)
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, leading to reliance on external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reductions in storage and training energy, respectively, with negligible accuracy loss compared to state-of-the-art training baselines.
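A loose illustration of the storage-for-compute trade described above (not the paper's actual SD decomposition or training algorithm; the SVD initialization, rank, and sparsity level are assumptions): store a small basis plus a sparse, power-of-two coefficient matrix, and reconstruct the dense weights on the fly with shifts and adds instead of fetching them in full precision.

```python
import numpy as np

def pot_round(x):
    """Round nonzero entries to the nearest signed power of two."""
    sign, mag = np.sign(x), np.maximum(np.abs(x), 1e-12)
    return sign * 2.0 ** np.round(np.log2(mag))

def compress(w, rank=4, sparsity=0.5):
    """Factor w (n_out x n_in) ~ coeff @ basis, with coeff sparse and PoT-quantized."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    coeff = u[:, :rank] * s[:rank]                  # n_out x rank
    basis = vt[:rank]                               # rank  x n_in
    thresh = np.quantile(np.abs(coeff), sparsity)
    coeff = np.where(np.abs(coeff) < thresh, 0.0, pot_round(coeff))   # prune + quantize
    return coeff, basis

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 32))
coeff, basis = compress(w)
w_hat = coeff @ basis                               # reconstructed at compute time
print(np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```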
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
- ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces over 80% of the hardware-quantified energy cost of DNN training and inference, while offering comparable or better accuracies.
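A toy sketch of the two multiplication-free primitives ShiftAddNet composes (not the paper's implementation; the layer sizes and exponent range are illustrative): a "shift" layer whose weights are signed powers of two, followed by an AdderNet-style "add" layer using L1 distances.

```python
import numpy as np

def shift_layer(x, sign, exp):
    """x: (batch, n_in); sign in {-1,+1} and integer exp, both (n_in, n_out).
    Equivalent to x @ (sign * 2**exp), i.e. every multiply is a bit shift."""
    return x @ (sign * 2.0 ** exp)

def add_layer(x, w):
    """AdderNet-style output: negative L1 distance to each weight column (adds only)."""
    return -np.abs(x[:, :, None] - w[None, :, :]).sum(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
sign = rng.choice([-1.0, 1.0], size=(8, 6))
exp = rng.integers(-3, 1, size=(8, 6))                   # weights in {±2^-3 .. ±2^0}
h = shift_layer(x, sign, exp)
print(add_layer(h, rng.standard_normal((6, 4))).shape)   # (2, 4)
```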
arXiv Detail & Related papers (2020-10-24T05:09:14Z)