Tiny-CRNN: Streaming Wakeword Detection In A Low Footprint Setting
- URL: http://arxiv.org/abs/2109.14725v1
- Date: Wed, 29 Sep 2021 21:12:14 GMT
- Title: Tiny-CRNN: Streaming Wakeword Detection In A Low Footprint Setting
- Authors: Mohammad Omar Khursheed, Christin Jose, Rajath Kumar, Gengshen Fu,
Brian Kulis, Santosh Kumar Cheekatmalla
- Abstract summary: We propose Tiny-CRNN (Tiny Convolutional Recurrent Neural Network) models applied to the problem of wakeword detection.
We find that, compared to Convolutional Neural Network models, False Accepts within a 250k-parameter budget can be reduced by 25% while also reducing parameter count by 10%.
- Score: 14.833049700174307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose Tiny-CRNN (Tiny Convolutional Recurrent Neural Network) models applied to the problem of wakeword detection, and augment them with scaled dot-product attention. We find that, compared to Convolutional Neural Network models, False Accepts within a 250k-parameter budget can be reduced by 25% while also reducing parameter count by 10% by using models based on the Tiny-CRNN architecture, and that we can achieve up to a 32% reduction in False Accepts at a 50k-parameter budget with a 75% reduction in parameter count compared to word-level Dense Neural Network models. We discuss solutions to the challenging problem of performing inference on streaming audio with this architecture, as well as differences in start-end index errors and latency in comparison to CNN, DNN, and DNN-HMM models.
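Below is a minimal, illustrative PyTorch sketch of a convolutional-recurrent model with scaled dot-product attention pooling, in the spirit of the architecture described above; all layer sizes, the GRU choice, and the learned-query attention variant are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    """Illustrative CRNN for wakeword detection: a small convolutional
    front end, a recurrent layer, and scaled dot-product attention that
    pools the recurrent outputs into one utterance-level logit.
    Sizes are arbitrary, not the paper's."""
    def __init__(self, n_mels=64, conv_ch=32, rnn_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(conv_ch, conv_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.rnn = nn.GRU(conv_ch * (n_mels // 4), rnn_dim, batch_first=True)
        self.query = nn.Parameter(torch.randn(rnn_dim))  # learned attention query
        self.out = nn.Linear(rnn_dim, 1)

    def forward(self, x):                      # x: (batch, frames, n_mels)
        x = self.conv(x.unsqueeze(1))          # (batch, ch, frames/4, n_mels/4)
        x = x.permute(0, 2, 1, 3).flatten(2)   # (batch, frames/4, ch*n_mels/4)
        h, _ = self.rnn(x)                     # (batch, frames/4, rnn_dim)
        # Scaled dot-product attention against the learned query vector.
        scores = h @ self.query / h.size(-1) ** 0.5       # (batch, frames/4)
        ctx = (torch.softmax(scores, 1).unsqueeze(-1) * h).sum(1)
        return self.out(ctx).squeeze(-1)       # one wakeword logit per clip

model = TinyCRNN()
print(sum(p.numel() for p in model.parameters()))   # check the parameter budget
logits = model(torch.randn(2, 100, 64))             # 2 clips, 100 frames, 64 mels
```

For streaming audio, a model like this would typically be run over a sliding window of frames, recomputing convolutional and recurrent context per hop; this is the kind of inference challenge the abstract refers to.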
Related papers
- Graph Neural Network for Accurate and Low-complexity SAR ATR [2.9766397696234996]
We propose a graph neural network (GNN) model to achieve accurate and low-latency SAR ATR.
The proposed GNN model has low computational complexity while achieving comparably high accuracy.
Compared with state-of-the-art CNNs, the proposed GNN model has only 1/3000 of the computation cost and 1/80 of the model size.
arXiv Detail & Related papers (2023-05-11T20:17:41Z)
Attention-based Feature Compression for CNN Inference Offloading in Edge Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
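As a rough illustration of the autoencoder-based compression idea (shapes, channel counts, and the split point are all assumed, not taken from the AECNN paper):

```python
import torch
import torch.nn as nn

# Illustrative device-edge split with an autoencoder bottleneck: the device
# runs the CNN head plus a light encoder, transmits the compressed feature
# map, and the edge server decodes and finishes the network.
head = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
encoder = nn.Conv2d(64, 4, 1)     # 64 -> 4 channels: ~16x less data on the wire
decoder = nn.Conv2d(4, 64, 1)     # edge side reconstructs the feature map
tail = nn.Sequential(
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 10),
)

x = torch.randn(1, 3, 32, 32)
compressed = encoder(head(x))        # sent from device to edge
logits = tail(decoder(compressed))   # completed on the edge server
```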
Accelerating Large Scale Real-Time GNN Inference using Channel Pruning [7.8799581908375185]
Graph Neural Networks (GNNs) are proven to be powerful models for generating node embeddings for downstream applications.
However, due to the high computational complexity of GNN inference, it is hard to deploy GNNs for large-scale or real-time applications.
We propose to accelerate GNN inference by pruning the dimensions in each layer with negligible accuracy loss.
arXiv Detail & Related papers (2021-05-10T17:28:44Z)
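A minimal sketch of the pruning idea on a toy GCN-style layer, using an L2-magnitude criterion that is assumed here for illustration (the paper's actual pruning criterion may differ):

```python
import torch

def prune_columns(weight, keep_ratio=0.5):
    """Keep the output dimensions whose weight columns have the largest
    L2 norm; an illustrative criterion, not necessarily the paper's."""
    k = max(1, int(keep_ratio * weight.size(1)))
    keep = weight.norm(dim=0).topk(k).indices.sort().values
    return weight[:, keep]

# Toy GCN-style layer H' = A_hat @ H @ W, pruned from 64 to 32 dimensions.
n, d = 100, 64
A_hat = torch.eye(n)              # stand-in for a normalized adjacency matrix
H = torch.randn(n, d)
W = torch.randn(d, d)
H_out = A_hat @ H @ prune_columns(W)   # (100, 32): cheaper downstream layers
```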
ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare the estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
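A hedged sketch of the stacked-model idea with scikit-learn, using fabricated benchmark data; the feature set, learners, and targets are placeholders, not ANNETTE's actual pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge

# Fabricated stand-in for micro-kernel benchmarks: each row describes one
# layer (e.g. MACs, input size, kernel size); the target is measured latency.
rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(500, 3))
y = 0.02 * X[:, 0] + 0.5 * np.sqrt(X[:, 1]) + rng.normal(0, 0.1, 500)

# Stacked model: base learners fit the benchmarks, a meta-learner combines them.
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
                ("lin", Ridge())],
    final_estimator=Ridge(),
)
stack.fit(X, y)
network_latency = stack.predict(X[:5]).sum()   # sum the per-layer estimates
```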
Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z)
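The abstract gives no architecture details; below is a minimal sketch of one common formulation (assumed here), in which the array's sample covariance is fed to a CNN as real/imaginary channels and DoA is predicted over a discretized angle grid:

```python
import torch
import torch.nn as nn

M, n_angles = 8, 121                 # 8 sensors, a 1-degree grid (illustrative)
net = nn.Sequential(
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * M * M, n_angles),
)

snapshots = torch.randn(200, M) + 1j * torch.randn(200, M)  # noisy array data
R = snapshots.conj().T @ snapshots / 200                    # sample covariance
x = torch.stack([R.real, R.imag]).unsqueeze(0)              # (1, 2, M, M)
doa_posterior = net(x).softmax(-1)   # probability over the angle grid
```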
Dynamically Throttleable Neural Networks (TNN) [24.052859278938858]
Conditional computation for Deep Neural Networks (DNNs) reduces overall computational load and improves model accuracy by running a subset of the network.
We present a runtime throttleable neural network (TNN) that can adaptively self-regulate its own performance target and computing resources.
arXiv Detail & Related papers (2020-11-01T20:17:42Z)
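A minimal sketch of runtime throttling via width gating; the gating mechanism and control signal are assumptions for illustration, not the paper's design:

```python
import torch
import torch.nn as nn

class ThrottleableBlock(nn.Module):
    """A runtime knob u in (0, 1] selects how many channels execute.
    Zeroing the tail channels stands in for actually skipping them."""
    def __init__(self, in_ch=16, out_ch=64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)

    def forward(self, x, u=1.0):
        k = max(1, int(u * self.conv.out_channels))
        y = self.conv(x)
        y[:, k:] = 0.0   # sketch only; real savings require slicing the weights
        return torch.relu(y)

block = ThrottleableBlock()
x = torch.randn(1, 16, 32, 32)
full = block(x, u=1.0)        # full performance target
throttled = block(x, u=0.25)  # run roughly a quarter of the channels
```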
Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement [53.47564132861866]
We find that a hybrid architecture, namely CNN-TT, is capable of maintaining good quality performance with a reduced model parameter size.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction, which helps improve speech quality.
arXiv Detail & Related papers (2020-07-25T22:21:05Z)
TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids [13.369813069254132]
We use model compression techniques to bridge the gap between large neural networks and battery-powered hearing aid hardware.
We are the first to demonstrate their efficacy for RNN speech enhancement, using pruning and integer quantization of weights/activations.
Our model achieves a computational latency of 2.39 ms, well within the 10 ms target and $351\times$ better than previous work.
arXiv Detail & Related papers (2020-05-20T20:37:47Z)
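A hedged sketch of the two compression steps using stock PyTorch utilities (magnitude pruning and dynamic int8 quantization); the model sizes and 80% sparsity level are illustrative, and this is not the paper's exact pipeline:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class Enhancer(nn.Module):
    """Toy LSTM enhancer predicting a per-frame spectral mask."""
    def __init__(self, n_bins=129, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_bins, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, n_bins)
    def forward(self, x):
        h, _ = self.lstm(x)
        return torch.sigmoid(self.fc(h))

model = Enhancer()
# Magnitude pruning: zero 80% of each recurrent weight matrix, then bake it in.
names = [n for n, _ in model.lstm.named_parameters() if n.startswith("weight")]
for name in names:
    prune.l1_unstructured(model.lstm, name=name, amount=0.8)
    prune.remove(model.lstm, name)
# Dynamic int8 quantization of LSTM and Linear weights.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8)
mask = qmodel(torch.randn(1, 50, 129))   # 50 frames of a 129-bin STFT
```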
Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network [53.47564132861866]
We propose a tensor-to-vector regression approach to multi-channel speech enhancement.
The key idea is to cast the conventional deep neural network (DNN) based vector-to-vector regression formulation under a tensor-train network (TTN) framework.
In 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.
arXiv Detail & Related papers (2020-02-03T02:58:00Z)
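The parameter saving comes from the tensor-train format itself; as a generic illustration (not necessarily the paper's exact parameterization), a reshaped order-$d$ weight tensor is factorized into small cores:

```latex
% Generic tensor-train factorization of an order-d tensor into cores G_k.
\mathcal{W}(i_1,\dots,i_d) = G_1[i_1]\, G_2[i_2] \cdots G_d[i_d],
\qquad G_k[i_k] \in \mathbb{R}^{r_{k-1} \times r_k}, \quad r_0 = r_d = 1.
% Storage drops from \prod_k n_k entries to \sum_k n_k\, r_{k-1} r_k,
% i.e. linear in d for bounded TT-ranks r_k.
```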
Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates for the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
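For context, the classical benchmark such results target is the minimax rate for nonparametric regression over Hölder-smooth functions (a standard fact stated here for orientation, not taken from the abstract above):

```latex
% Minimax squared-L2 risk over the Hölder class H^beta on [0,1]^d
% with n samples (classical rate, e.g. Stone 1982):
\inf_{\hat f}\ \sup_{f \in \mathcal{H}^{\beta}([0,1]^d)}
\mathbb{E}\,\lVert \hat f - f \rVert_{L^2}^2 \;\asymp\; n^{-\frac{2\beta}{2\beta+d}}.
```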