TinySpeech: Attention Condensers for Deep Speech Recognition Neural
Networks on Edge Devices
- URL: http://arxiv.org/abs/2008.04245v6
- Date: Mon, 12 Oct 2020 19:07:39 GMT
- Title: TinySpeech: Attention Condensers for Deep Speech Recognition Neural
Networks on Edge Devices
- Authors: Alexander Wong, Mahmoud Famouri, Maya Pavlova, and Siddharth Surana
- Abstract summary: We introduce the concept of attention condensers for building low-footprint, highly-efficient deep neural networks for on-device speech recognition on the edge.
To illustrate its efficacy, we introduce TinySpeech, low-precision deep neural networks tailored for on-device speech recognition.
- Score: 71.68436132514542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in deep learning have led to state-of-the-art performance across a
multitude of speech recognition tasks. Nevertheless, the widespread deployment
of deep neural networks for on-device speech recognition remains a challenge,
particularly in edge scenarios where the memory and computing resources are
highly constrained (e.g., low-power embedded devices) or where the memory and
computing budget dedicated to speech recognition is low (e.g., mobile devices
performing numerous tasks besides speech recognition). In this study, we
introduce the concept of attention condensers for building low-footprint,
highly-efficient deep neural networks for on-device speech recognition on the
edge. An attention condenser is a self-attention mechanism that learns and
produces a condensed embedding characterizing joint local and cross-channel
activation relationships, and performs selective attention accordingly. To
illustrate its efficacy, we introduce TinySpeech, low-precision deep neural
networks composed largely of attention condensers tailored for on-device
speech recognition using a machine-driven design exploration strategy, with one
tailored specifically with microcontroller operation constraints. Experimental
results on the Google Speech Commands benchmark dataset for limited-vocabulary
speech recognition showed that TinySpeech networks achieved significantly lower
architectural complexity (as much as $507\times$ fewer parameters), lower
computational complexity (as much as $48\times$ fewer multiply-add operations),
and lower storage requirements (as much as $2028\times$ lower weight memory
requirements) when compared to previous work. These results not only
demonstrate the efficacy of attention condensers for building highly efficient
networks for on-device speech recognition, but also illuminate its potential
for accelerating deep learning on the edge and empowering TinyML applications.
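To make the mechanism concrete, below is a minimal NumPy sketch of the condense-embed-expand-attend pattern described in the abstract. The max-pool condensation, dense embedding, and nearest-neighbor expansion are illustrative stand-ins, not the paper's exact layers; the actual TinySpeech architectures are found via machine-driven design exploration and will differ.

```python
import numpy as np

def attention_condenser(V, W_embed):
    """Hedged sketch of an attention condenser.

    V: activations with shape (channels, time) for a 1-D speech signal.
    W_embed: weights of a toy cross-channel embedding layer.
    Each stage is reduced to its simplest possible stand-in.
    """
    C, T = V.shape
    # 1) Condensation: reduce dimensionality (here: max-pool by 2 along time)
    Q = V.reshape(C, T // 2, 2).max(axis=2)          # (C, T/2)
    # 2) Embedding: learn joint local and cross-channel relationships
    #    (here: a single dense map across channels plus a nonlinearity)
    K = np.tanh(np.einsum('dc,ct->dt', W_embed, Q))  # (C, T/2)
    # 3) Expansion: project the condensed embedding back to input size
    A = np.repeat(K, 2, axis=1)                      # (C, T)
    # 4) Selective attention: modulate input by attention values, keep residual
    S = 1.0 / (1.0 + np.exp(-A))                     # sigmoid attention
    return V * S + V                                 # attended output

rng = np.random.default_rng(0)
V = rng.standard_normal((8, 32))        # 8 channels, 32 time steps
W = rng.standard_normal((8, 8)) * 0.1   # toy embedding weights
print(attention_condenser(V, W).shape)  # (8, 32)
```

Because the condensation stage shrinks the embedding before self-attention is computed, the module needs far fewer parameters and multiply-adds than a full self-attention block, which is what drives the efficiency gains reported above.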
Related papers
- Deep Photonic Reservoir Computer for Speech Recognition [49.1574468325115]
Speech recognition is a critical task in the field of artificial intelligence and has witnessed remarkable advancements.
Deep reservoir computing is energy efficient but exhibits limitations in performance when compared to more resource-intensive machine learning algorithms.
We propose a photonic-based deep reservoir computer and evaluate its effectiveness on different speech recognition tasks.
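As background, this sketch shows the classical (digital) reservoir-computing recipe in NumPy: a fixed random recurrent reservoir plus a trained linear readout. The photonic system in the paper replaces the digital reservoir with optical dynamics; the dimensions and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 13, 200   # e.g. 13 MFCC features, 200 reservoir nodes

# Reservoir weights are random and fixed; only the readout is trained.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W_res = rng.standard_normal((n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # echo-state property

def run_reservoir(u_seq):
    """Drive the reservoir with an input sequence of shape (T, n_in)."""
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        x = np.tanh(W_in @ u + W_res @ x)
        states.append(x.copy())
    return np.stack(states)                          # (T, n_res)

def train_readout(X, Y, lam=1e-2):
    """Ridge-regression readout: X is (N, n_res) states, Y is (N, classes)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
```

Only the readout is trained, which is why reservoir computing is attractive for energy-efficient hardware implementations such as photonics.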
arXiv Detail & Related papers (2023-12-11T17:43:58Z)
- Event Based Time-Vectors for auditory features extraction: a neuromorphic approach for low power audio recognition [4.206844212918807]
We present a neuromorphic architecture, capable of unsupervised auditory feature recognition.
We then validate the network on a subset of Google's Speech Commands dataset.
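An event-based front end can be pictured as a delta modulator that emits an event only when the signal moves by more than a threshold, which is the source of the power savings in neuromorphic audio pipelines. This is a generic send-on-delta sketch, not the paper's specific encoder:

```python
import numpy as np

def send_on_delta(signal, threshold=0.1):
    """Emit (time, +1/-1) events only when the signal changes by >= threshold.
    Between events nothing is computed or transmitted."""
    events, ref = [], signal[0]
    for t, s in enumerate(signal[1:], start=1):
        while s - ref >= threshold:
            events.append((t, +1)); ref += threshold
        while ref - s >= threshold:
            events.append((t, -1)); ref -= threshold
    return events

audio = np.sin(np.linspace(0, 4 * np.pi, 200))  # toy waveform
print(len(send_on_delta(audio)))                # far fewer events than samples
```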
arXiv Detail & Related papers (2021-12-13T21:08:04Z)
- AttendSeg: A Tiny Attention Condenser Neural Network for Semantic Segmentation on the Edge [71.80459780697956]
We introduce AttendSeg, a low-precision, highly compact deep neural network tailored for on-device semantic segmentation.
AttendSeg possesses a self-attention network architecture comprising light-weight attention condensers for improved spatial-channel selective attention.
arXiv Detail & Related papers (2021-04-29T19:19:04Z)
- Binary Neural Network for Speaker Verification [13.472791713805762]
This paper focuses on how to apply binary neural networks to the task of speaker verification.
Experimental results show that, after binarizing the Convolutional Neural Network, the ResNet34-based network achieves an EER of around 5%.
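For context, weight binarization in the XNOR-Net style (one assumption for what "binarizing" means here; the paper's exact scheme may differ) looks like this:

```python
import numpy as np

def binarize_weights(w):
    """Replace real-valued weights with {-alpha, +alpha}: sign bits plus one
    per-layer scale. Storage drops from 32 bits to ~1 bit per weight."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))    # a toy layer's weights
x = rng.standard_normal(64)
y_full = w @ x                       # full-precision output
y_bin = binarize_weights(w) @ x      # binarized output (approximation)
print(np.corrcoef(y_full, y_bin)[0, 1])  # outputs remain highly correlated
```

Training typically keeps a real-valued copy of the weights and passes gradients through the sign function with a straight-through estimator.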
arXiv Detail & Related papers (2021-04-06T06:04:57Z)
- Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer [92.37382674655942]
We propose a network layer to enhance the speech command recognition capability of a lightweight network.
The employed method borrows the ideas of Taylor expansion and quadratic forms to construct a better representation of features in both input and hidden layers.
This richer representation results in recognition accuracy improvement as shown by extensive experiments on Google speech commands (GSC) and synthetic speech commands (SSC) datasets.
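A bare-bones version of the quadratic idea: augment the usual linear map with a second-order term, as in a Taylor expansion truncated at order two. The actual self-organized operational layer is richer than this; the sketch only shows why the representation becomes more expressive.

```python
import numpy as np

def quadratic_layer(x, W1, W2, b):
    """y = W1 x + W2 (x * x) + b: a linear term plus an elementwise
    second-order term, i.e. a Taylor expansion truncated at order two."""
    return W1 @ x + W2 @ (x * x) + b

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
W1, W2 = rng.standard_normal((8, 16)), rng.standard_normal((8, 16))
b = np.zeros(8)
print(quadratic_layer(x, W1, W2, b).shape)  # (8,)
```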
arXiv Detail & Related papers (2020-11-23T14:40:18Z)
- AttendNets: Tiny Deep Image Recognition Neural Networks for the Edge via Visual Attention Condensers [81.17461895644003]
We introduce AttendNets, low-precision, highly compact deep neural networks tailored for on-device image recognition.
AttendNets possess deep self-attention architectures based on visual attention condensers.
Results show AttendNets have significantly lower architectural and computational complexity when compared to several deep neural networks.
arXiv Detail & Related papers (2020-09-30T01:53:17Z)
- Resource-Efficient Speech Mask Estimation for Multi-Channel Speech Enhancement [15.361841669377776]
We provide a resource-efficient approach for multi-channel speech enhancement based on Deep Neural Networks (DNNs).
In particular, we use reduced-precision DNNs for estimating a speech mask from noisy, multi-channel microphone observations.
In the extreme case of binary weights and reduced precision activations, a significant reduction of execution time and memory footprint is possible.
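The core operation being made resource-efficient is mask-based enhancement: the DNN predicts a per-time-frequency-bin gain in [0, 1] that is applied to the noisy spectrogram. A minimal sketch (the reduced-precision DNN itself is elided; shapes are illustrative):

```python
import numpy as np

def apply_speech_mask(noisy_mag, mask_logits):
    """Multiply the noisy magnitude spectrogram (freq, frames) by a
    sigmoid mask predicted by the (reduced-precision) DNN."""
    mask = 1.0 / (1.0 + np.exp(-mask_logits))
    return mask * noisy_mag

rng = np.random.default_rng(0)
noisy = np.abs(rng.standard_normal((257, 100)))  # |STFT| of a noisy mixture
logits = rng.standard_normal((257, 100))         # stand-in DNN output
print(apply_speech_mask(noisy, logits).shape)    # (257, 100)
```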
arXiv Detail & Related papers (2020-07-22T14:58:29Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at achieving two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)