TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids
- URL: http://arxiv.org/abs/2005.11138v1
- Date: Wed, 20 May 2020 20:37:47 GMT
- Title: TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids
- Authors: Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari
Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough
- Abstract summary: We use model compression techniques to bridge the gap between large neural networks and battery powered hearing aid hardware.
We are the first to demonstrate their efficacy for RNN speech enhancement, using pruning and integer quantization of weights/activations.
Our model achieves a computational latency of 2.39ms, well within the 10ms target and 351$\times$ better than previous work.
- Score: 13.369813069254132
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern speech enhancement algorithms achieve remarkable noise suppression by
means of large recurrent neural networks (RNNs). However, large RNNs limit
practical deployment in hearing aid hardware (HW) form-factors, which are
battery powered and run on resource-constrained microcontroller units (MCUs)
with limited memory capacity and compute capability. In this work, we use model
compression techniques to bridge this gap. We define the constraints imposed on
the RNN by the HW and describe a method to satisfy them. Although model
compression techniques are an active area of research, we are the first to
demonstrate their efficacy for RNN speech enhancement, using pruning and
integer quantization of weights/activations. We also demonstrate state update
skipping, which reduces the computational load. Finally, we conduct a
perceptual evaluation of the compressed models to verify audio quality on human
raters. Results show a reduction in model size and operations of 11.9$\times$
and 2.9$\times$, respectively, over the baseline for compressed models, without
a statistical difference in listening preference and only exhibiting a loss of
0.55dB SDR. Our model achieves a computational latency of 2.39ms, well within
the 10ms target and 351$\times$ better than previous work.
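The abstract's compression recipe combines integer quantization of weights/activations with state update skipping. A minimal sketch of both ideas, assuming symmetric per-tensor int8 quantization and a norm-threshold skip rule; the function names and the threshold `tau` are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor int8 quantization: map floats to [-127, 127]
    # with a single scale factor, as in standard integer-only inference.
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def skip_state_update(x_t, x_prev, h_prev, update_fn, tau=0.05):
    # State update skipping: when the input frame barely changed,
    # reuse the previous hidden state and skip the recurrent MACs.
    if np.linalg.norm(x_t - x_prev) < tau:
        return h_prev  # skipped update for this frame
    return update_fn(x_t, h_prev)
```

Dequantizing `q * scale` recovers the weights up to half a quantization step, and the skip rule trades a small accuracy cost for fewer operations per frame, mirroring the paper's size/ops reductions.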
Related papers
- Real-time Speech Enhancement on Raw Signals with Deep State-space Modeling [1.0650780147044159]
We present aTENNuate, a simple deep state-space autoencoder configured for efficient online raw speech enhancement.
We benchmark aTENNuate on the VoiceBank + DEMAND and the Microsoft DNS1 synthetic test sets.
The network outperforms previous real-time denoising models in terms of PESQ score, parameter count, MACs, and latency.
arXiv Detail & Related papers (2024-09-05T09:28:56Z)
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding [82.46024259137823]
We propose a cross-model comparative loss for a broad range of tasks.
We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks.
arXiv Detail & Related papers (2023-01-10T03:04:27Z)
- Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can reduce the number of floating point operations by more than half for off-the-shelf audio neural networks.
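The SimPF idea is non-parametric pooling over time before the network proper. A hypothetical sketch, assuming simple temporal mean pooling (the function name and stride are illustrative, not the paper's exact front-end): halving the frame count roughly halves downstream per-frame operations.

```python
import numpy as np

def simpf_mean_pool(feats, stride=2):
    # Non-parametric temporal mean pooling over a (frames, features) array:
    # average consecutive groups of `stride` frames, discarding any remainder,
    # so a frame-wise classifier sees 1/stride as many inputs.
    T = (feats.shape[0] // stride) * stride
    return feats[:T].reshape(-1, stride, feats.shape[1]).mean(axis=1)
```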
arXiv Detail & Related papers (2022-10-03T14:00:41Z)
- Hybrid Neural Networks for On-device Directional Hearing [15.109811993590037]
DeepBeam is a hybrid model that combines traditional beamformers with a custom lightweight neural net.
Our real-time hybrid model runs in 8 ms on mobile CPUs designed for low-power wearable devices and achieves an end-to-end latency of 17.5 ms.
arXiv Detail & Related papers (2021-12-11T01:29:12Z)
- Tiny-CRNN: Streaming Wakeword Detection In A Low Footprint Setting [14.833049700174307]
We propose Tiny-CRNN (Tiny Convolutional Recurrent Neural Network) models applied to the problem of wakeword detection.
We find that, compared to Convolutional Neural Network models, False Accepts within a 250k parameter budget can be reduced by 25% alongside a 10% reduction in parameter size.
arXiv Detail & Related papers (2021-09-29T21:12:14Z)
- Spatio-Temporal Pruning and Quantization for Low-latency Spiking Neural Networks [6.011954485684313]
Spiking Neural Networks (SNNs) are a promising alternative to traditional deep learning methods.
However, a major drawback of SNNs is high inference latency.
In this paper, we propose spatial and temporal pruning of SNNs.
arXiv Detail & Related papers (2021-04-26T12:50:58Z)
- Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition [65.7040645560855]
We propose Q-ASR, an integer-only, zero-shot quantization scheme for ASR models.
We show negligible WER change as compared to the full-precision baseline models.
Q-ASR exhibits a large compression rate of more than 4x with small WER degradation.
arXiv Detail & Related papers (2021-03-31T06:05:40Z)
- Deliberation Model Based Two-Pass End-to-End Speech Recognition [52.45841282906516]
A two-pass model has been proposed to rescore streamed hypotheses using the non-streaming Listen, Attend and Spell (LAS) model.
The model attends to acoustics to rescore hypotheses, as opposed to a class of neural correction models that use only first-pass text hypotheses.
A bidirectional encoder is used to extract context information from first-pass hypotheses.
arXiv Detail & Related papers (2020-03-17T22:01:12Z)
- REST: Robust and Efficient Neural Networks for Sleep Monitoring in the Wild [62.36144064259933]
We propose REST, a new method that simultaneously tackles both issues via adversarial training and controlling the Lipschitz constant of the neural network.
We demonstrate that REST produces highly-robust and efficient models that substantially outperform the original full-sized models in the presence of noise.
By deploying these models to an Android application on a smartphone, we quantitatively observe that REST allows models to achieve up to 17x energy reduction and 9x faster inference.
arXiv Detail & Related papers (2020-01-29T17:23:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.