Resource-efficient DNNs for Keyword Spotting using Neural Architecture
Search and Quantization
- URL: http://arxiv.org/abs/2012.10138v1
- Date: Fri, 18 Dec 2020 09:53:55 GMT
- Title: Resource-efficient DNNs for Keyword Spotting using Neural Architecture
Search and Quantization
- Authors: David Peter, Wolfgang Roth, Franz Pernkopf
- Abstract summary: This paper introduces neural architecture search (NAS) for the automatic discovery of small models for keyword spotting.
We employ a differentiable NAS approach to optimize the structure of convolutional neural networks (CNNs) to maximize the classification accuracy.
Using NAS only, we were able to obtain a highly efficient model with 95.4% accuracy on the Google speech commands dataset.
- Score: 23.850887499271842
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper introduces neural architecture search (NAS) for the automatic
discovery of small models for keyword spotting (KWS) in limited resource
environments. We employ a differentiable NAS approach to optimize the structure
of convolutional neural networks (CNNs) to maximize the classification accuracy
while minimizing the number of operations per inference. Using NAS only, we
were able to obtain a highly efficient model with 95.4% accuracy on the Google
speech commands dataset with 494.8 kB of memory usage and 19.6 million
operations. Additionally, weight quantization is used to reduce the memory
consumption even further. We show that weight quantization to low bit-widths
(e.g. 1 bit) can be used without substantial loss in accuracy. By increasing
the number of input features from 10 MFCC to 20 MFCC we were able to increase
the accuracy to 96.3% at 340.1 kB of memory usage and 27.1 million operations.
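The following minimal sketch (not the authors' code; the candidate operations, channel counts, per-op cost numbers and the trade-off weight are assumptions made for illustration) shows the two ingredients described above: a DARTS-style mixed operation whose architecture parameters are optimized by gradient descent together with a differentiable penalty on the expected number of operations, and a straight-through estimator for quantizing weights to 1 bit.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixedOp(nn.Module):
        """Softmax-weighted mixture of candidate operations; the architecture
        parameter alpha is learned jointly with the weights and is later
        discretized to the single highest-weighted candidate."""
        def __init__(self, channels):
            super().__init__()
            self.ops = nn.ModuleList([
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.Conv2d(channels, channels, 5, padding=2),
                nn.Identity(),                                  # "skip" candidate
            ])
            # rough multiply-accumulates per output pixel (illustrative numbers)
            self.register_buffer(
                "op_cost", torch.tensor([9.0 * channels, 25.0 * channels, 0.0]))
            self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

        def forward(self, x):
            w = F.softmax(self.alpha, dim=0)
            out = sum(wi * op(x) for wi, op in zip(w, self.ops))
            expected_cost = (w * self.op_cost).sum()            # differentiable op-count proxy
            return out, expected_cost

    def binarize_ste(w):
        # 1-bit weight quantization: forward uses sign(w), the gradient is
        # passed straight through (straight-through estimator)
        return w + (torch.sign(w) - w).detach()

    # toy search step: classification loss plus a weighted operations penalty
    layer = MixedOp(channels=8)
    head = nn.Linear(8, 12)                                     # 12 keyword classes, assumed
    x, y = torch.randn(4, 8, 32, 32), torch.randint(0, 12, (4,))

    feat, cost = layer(x)
    logits = head(feat.mean(dim=(2, 3)))                        # global average pooling
    loss = F.cross_entropy(logits, y) + 1e-4 * cost             # accuracy vs. operations trade-off
    loss.backward()                                             # gradients also reach alpha

After the search, each mixed operation would be collapsed to its highest-weighted candidate and the resulting network retrained, with binarize_ste (or a multi-bit variant) applied to the weights when a quantized model is required.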
Related papers
- A Methodology for Improving Accuracy of Embedded Spiking Neural Networks through Kernel Size Scaling [6.006032394972252]
Spiking Neural Networks (SNNs) can offer ultra-low power/energy consumption for machine learning-based applications.
Currently, most SNN architectures need a significantly larger model size to achieve higher accuracy.
We propose a novel methodology that improves the accuracy of SNNs through kernel size scaling.
arXiv Detail & Related papers (2024-04-02T06:42:14Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline (the worst-case accumulator bound behind this guarantee is sketched after this list).
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem, which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available (a toy sketch of the index-plus-codebook storage idea appears after this list).
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2 (see the patch-wise convolution sketch after this list).
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks (a small numerical sketch of such a bit-plane decomposition follows the list below).
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- End-to-end Keyword Spotting using Neural Architecture Search and Quantization [23.850887499271842]
This paper introduces neural architecture search (NAS) for the automatic discovery of end-to-end keyword spotting (KWS) models.
We employ a differentiable NAS approach to optimize the structure of convolutional neural networks (CNNs) operating on raw audio waveforms.
arXiv Detail & Related papers (2021-04-14T07:22:22Z)
- Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices [20.349809458335532]
Sound event detection (SED) is a hot topic in consumer and smart city applications.
Existing approaches based on Deep Neural Networks are very effective, but highly demanding in terms of memory, power, and throughput.
In this paper, we explore the combination of extreme quantization to a small-footprint binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller (the XNOR/popcount arithmetic that makes BNNs so cheap is sketched after this list).
arXiv Detail & Related papers (2021-01-12T12:38:23Z)
- Learned Low Precision Graph Neural Networks [10.269500440688306]
We show how to systematically quantise Deep Graph Neural Networks (GNNs) with minimal or no loss in performance using Network Architecture Search (NAS).
The proposed novel NAS mechanism, named Low Precision Graph NAS (LPGNAS), constrains both architecture and quantisation choices to be differentiable.
On eight different datasets, solving the task of classifying unseen nodes in a graph, LPGNAS generates quantised models with significant reductions in both model and buffer sizes.
arXiv Detail & Related papers (2020-09-19T13:51:09Z)
- Accuracy Prediction with Non-neural Model for Neural Architecture Search [185.0651567642238]
We study an alternative approach which uses non-neural model for accuracy prediction.
We leverage a gradient boosting decision tree (GBDT) as the accuracy predictor for neural architecture search (NAS).
Experiments on NASBench-101 and ImageNet demonstrate the effectiveness of using GBDT as a predictor for NAS (a minimal predictor-and-ranking sketch appears after this list).
arXiv Detail & Related papers (2020-07-09T13:28:49Z)
- Quantitative Analysis of Image Classification Techniques for Memory-Constrained Devices [0.7373617024876725]
Convolutional Neural Networks, or CNNs, are the state of the art for image classification, but typically come at the cost of a large memory footprint.
In this paper, we compare CNNs with ProtoNN, Bonsai and FastGRNN when applied to 3-channel image classification using CIFAR-10.
We show that Direct Convolution CNNs perform best for all chosen budgets, with a top performance of 65.7% accuracy at a memory footprint of 58.23KB.
arXiv Detail & Related papers (2020-05-11T09:54:54Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
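The short sketches below illustrate some of the techniques referenced in the list above; all of them use assumed names, sizes and data, and none is taken from the cited papers' code. The first relates to the accumulator-overflow paper: it computes the standard worst-case bound that such guarantees build on, i.e. the smallest signed accumulator width that can never overflow for a dot product of unsigned activations and signed weights. It is only the motivating bound, not the paper's quantization-aware training algorithm.

    import math

    def min_accumulator_bits(n_terms: int, act_bits: int, wgt_bits: int) -> int:
        """Smallest signed accumulator width that can never overflow when
        accumulating n_terms products of unsigned act_bits activations and
        signed (two's-complement) wgt_bits weights."""
        max_act = (1 << act_bits) - 1
        # the most negative partial sum dominates for two's-complement weights
        worst = n_terms * max_act * (1 << (wgt_bits - 1))
        return math.ceil(math.log2(worst)) + 1      # +1 for the sign bit

    # e.g. a dot product over 512 terms with 8-bit weights and activations
    print(min_accumulator_bits(n_terms=512, act_bits=8, wgt_bits=8))   # 25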
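For the Variable Bitrate Neural Fields entry, this toy sketch shows the bookkeeping behind a vector-quantized feature grid: the grid is stored as small integer indices into a codebook instead of raw floats. The paper learns the codebook end-to-end through an auto-decoder; here the codebook is random and all sizes are arbitrary, so only the storage arithmetic is meaningful.

    import numpy as np

    rng = np.random.default_rng(0)
    grid = rng.standard_normal((64, 64, 8)).astype(np.float32)    # dense feature grid
    codebook = rng.standard_normal((256, 8)).astype(np.float32)   # 256 learned codewords

    # nearest codeword per grid cell -> one uint8 index per cell
    dists = ((grid[..., None, :] - codebook) ** 2).sum(-1)        # (64, 64, 256)
    indices = dists.argmin(-1).astype(np.uint8)
    decoded = codebook[indices]                                   # lossy reconstruction

    dense_bytes = grid.nbytes
    compressed_bytes = indices.nbytes + codebook.nbytes
    print(dense_bytes / compressed_bytes)        # ~10x here; larger grids compress more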
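For MCUNetV2, this sketch shows why patch-by-patch inference can cut peak memory without changing the result: a convolution applied to input patches that overlap by the kernel's receptive field reproduces the full-image output exactly, so only one patch has to be resident at a time. The tensor sizes and split point are arbitrary.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    x = torch.randn(1, 3, 64, 64)            # full input feature map
    w = torch.randn(8, 3, 3, 3)              # 3x3 conv, stride 1, no padding

    full = F.conv2d(x, w)                    # reference: needs all of x in memory

    # two horizontal patches overlapping by (kernel_size - 1) rows
    k, split = 3, 32
    top = x[:, :, : split + (k - 1), :]      # rows 0..33
    bottom = x[:, :, split:, :]              # rows 32..63
    patched = torch.cat([F.conv2d(top, w), F.conv2d(bottom, w)], dim=2)

    print(torch.allclose(full, patched, atol=1e-5))   # True: patch-wise output matches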
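For the {-1, +1} encoding decomposition entry, this numerical sketch shows one way a k-bit integer weight matrix can be rewritten as a weighted sum of {-1, +1} bit-plane matrices, so a quantized layer turns into a few binary-matrix products plus a constant. The paper's exact scheme may differ; the sketch only verifies that such a decomposition reproduces the original product.

    import numpy as np

    rng = np.random.default_rng(0)
    k = 4
    W = rng.integers(0, 2 ** k, size=(8, 16))        # unsigned k-bit integer weights
    x = rng.standard_normal(16)

    # bit-planes of W, remapped from {0, 1} to {-1, +1}
    B = [((W >> i) & 1) * 2 - 1 for i in range(k)]

    # W = 0.5 * sum_i 2^i * B_i + (2^k - 1) / 2, applied to x:
    y = 0.5 * sum((2 ** i) * (B[i] @ x) for i in range(k)) + (2 ** k - 1) / 2 * x.sum()
    print(np.allclose(W @ x, y))                      # True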
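For the binary-neural-network sound event detection entry, this sketch shows the arithmetic that makes BNNs attractive on microcontrollers such as GAP8: a dot product of ±1 vectors collapses to an XOR followed by a popcount on packed bit words. It is plain Python for clarity, not the paper's optimized kernel.

    import random

    random.seed(0)
    n = 64
    a = [random.choice((-1, 1)) for _ in range(n)]
    b = [random.choice((-1, 1)) for _ in range(n)]

    ref = sum(ai * bi for ai, bi in zip(a, b))        # ordinary dot product

    def pack(v):                                      # +1 -> bit 1, -1 -> bit 0
        return sum(1 << i for i, s in enumerate(v) if s == 1)

    diff = pack(a) ^ pack(b)                          # 1 wherever the signs differ
    dot = n - 2 * bin(diff).count("1")                # popcount form of the dot product
    print(ref == dot)                                 # True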
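For the GBDT accuracy-prediction entry, this minimal sketch (synthetic accuracies, an assumed one-hot architecture encoding) fits scikit-learn's GradientBoostingRegressor on a handful of evaluated architectures and uses it to rank a larger candidate pool, in the spirit of the predictor-based search the paper describes.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)

    def encode(arch):
        """Toy encoding: 6 layers, each choosing one of 4 candidate ops (one-hot)."""
        onehot = np.zeros((6, 4))
        onehot[np.arange(6), arch] = 1.0
        return onehot.ravel()

    # pretend we already trained and evaluated 100 random architectures
    archs = rng.integers(0, 4, size=(100, 6))
    acc = rng.uniform(0.90, 0.96, size=100)           # measured accuracies (synthetic)
    model = GradientBoostingRegressor().fit(np.stack([encode(a) for a in archs]), acc)

    # rank a fresh candidate pool by predicted accuracy; only the top few get trained
    pool = rng.integers(0, 4, size=(1000, 6))
    pred = model.predict(np.stack([encode(a) for a in pool]))
    top10 = pool[np.argsort(pred)[::-1][:10]]
    print(top10.shape)                                # (10, 6)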
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.