TinySpeech: Attention Condensers for Deep Speech Recognition Neural
Networks on Edge Devices
- URL: http://arxiv.org/abs/2008.04245v6
- Date: Mon, 12 Oct 2020 19:07:39 GMT
- Title: TinySpeech: Attention Condensers for Deep Speech Recognition Neural
Networks on Edge Devices
- Authors: Alexander Wong, Mahmoud Famouri, Maya Pavlova, and Siddharth Surana
- Abstract summary: We introduce the concept of attention condensers for building low-footprint, highly-efficient deep neural networks for on-device speech recognition on the edge.
To illustrate its efficacy, we introduce TinySpeech, low-precision deep neural networks tailored for on-device speech recognition.
- Score: 71.68436132514542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advances in deep learning have led to state-of-the-art performance across a
multitude of speech recognition tasks. Nevertheless, the widespread deployment
of deep neural networks for on-device speech recognition remains a challenge,
particularly in edge scenarios where the memory and computing resources are
highly constrained (e.g., low-power embedded devices) or where the memory and
computing budget dedicated to speech recognition is low (e.g., mobile devices
performing numerous tasks besides speech recognition). In this study, we
introduce the concept of attention condensers for building low-footprint,
highly-efficient deep neural networks for on-device speech recognition on the
edge. An attention condenser is a self-attention mechanism that learns and
produces a condensed embedding characterizing joint local and cross-channel
activation relationships, and performs selective attention accordingly. To
illustrate its efficacy, we introduce TinySpeech, low-precision deep neural
networks composed largely of attention condensers tailored for on-device
speech recognition using a machine-driven design exploration strategy, with one
tailored specifically with microcontroller operation constraints. Experimental
results on the Google Speech Commands benchmark dataset for limited-vocabulary
speech recognition showed that TinySpeech networks achieved significantly lower
architectural complexity (as much as $507\times$ fewer parameters), lower
computational complexity (as much as $48\times$ fewer multiply-add operations),
and lower storage requirements (as much as $2028\times$ lower weight memory
requirements) when compared to previous work. These results not only
demonstrate the efficacy of attention condensers for building highly efficient
networks for on-device speech recognition, but also illuminate its potential
for accelerating deep learning on the edge and empowering TinyML applications.
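To make the mechanism concrete, below is a minimal NumPy sketch of the condense-embed-expand-attend pattern described in the abstract. The max-pool condensation, dense embedding, and nearest-neighbor expansion are illustrative stand-ins, not the paper's exact layers; the actual TinySpeech architectures are found via machine-driven design exploration and will differ.

```python
import numpy as np

def attention_condenser(V, W_embed):
    """Hedged sketch of an attention condenser.

    V: activations with shape (channels, time) for a 1-D speech signal.
    W_embed: weights of a toy cross-channel embedding layer.
    Each stage is reduced to its simplest possible stand-in.
    """
    C, T = V.shape
    # 1) Condensation: reduce dimensionality (here: max-pool by 2 along time)
    Q = V.reshape(C, T // 2, 2).max(axis=2)          # (C, T/2)
    # 2) Embedding: learn joint local and cross-channel relationships
    #    (here: a single dense map across channels plus a nonlinearity)
    K = np.tanh(np.einsum('dc,ct->dt', W_embed, Q))  # (C, T/2)
    # 3) Expansion: project the condensed embedding back to input size
    A = np.repeat(K, 2, axis=1)                      # (C, T)
    # 4) Selective attention: modulate input by attention values, keep residual
    S = 1.0 / (1.0 + np.exp(-A))                     # sigmoid attention
    return V * S + V                                 # attended output

rng = np.random.default_rng(0)
V = rng.standard_normal((8, 32))        # 8 channels, 32 time steps
W = rng.standard_normal((8, 8)) * 0.1   # toy embedding weights
print(attention_condenser(V, W).shape)  # (8, 32)
```

Because the condensation stage shrinks the embedding before self-attention is computed, the module needs far fewer parameters and multiply-adds than a full self-attention block, which is what drives the efficiency gains reported above.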
Related papers
- Deep Photonic Reservoir Computer for Speech Recognition [49.1574468325115]
Speech recognition is a critical task in the field of artificial intelligence and has witnessed remarkable advancements.
Deep reservoir computing is energy efficient but exhibits limitations in performance when compared to more resource-intensive machine learning algorithms.
We propose a photonic-based deep reservoir computer and evaluate its effectiveness on different speech recognition tasks.
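As background, this sketch shows the classical (digital) reservoir-computing recipe in NumPy: a fixed random recurrent reservoir plus a trained linear readout. The photonic system in the paper replaces the digital reservoir with optical dynamics; the dimensions and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 13, 200   # e.g. 13 MFCC features, 200 reservoir nodes

# Reservoir weights are random and fixed; only the readout is trained.
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W_res = rng.standard_normal((n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # echo-state property

def run_reservoir(u_seq):
    """Drive the reservoir with an input sequence of shape (T, n_in)."""
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        x = np.tanh(W_in @ u + W_res @ x)
        states.append(x.copy())
    return np.stack(states)                          # (T, n_res)

def train_readout(X, Y, lam=1e-2):
    """Ridge-regression readout: X is (N, n_res) states, Y is (N, classes)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
```

Only the readout is trained, which is why reservoir computing is attractive for energy-efficient hardware implementations such as photonics.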
arXiv Detail & Related papers (2023-12-11T17:43:58Z)
- Event Based Time-Vectors for auditory features extraction: a neuromorphic approach for low power audio recognition [4.206844212918807]
We present a neuromorphic architecture, capable of unsupervised auditory feature recognition.
We then validate the network on a subset of Google's Speech Commands dataset.
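An event-based front end can be pictured as a delta modulator that emits an event only when the signal moves by more than a threshold, which is the source of the power savings in neuromorphic audio pipelines. This is a generic send-on-delta sketch, not the paper's specific encoder:

```python
import numpy as np

def send_on_delta(signal, threshold=0.1):
    """Emit (time, +1/-1) events only when the signal changes by >= threshold.
    Between events nothing is computed or transmitted."""
    events, ref = [], signal[0]
    for t, s in enumerate(signal[1:], start=1):
        while s - ref >= threshold:
            events.append((t, +1)); ref += threshold
        while ref - s >= threshold:
            events.append((t, -1)); ref -= threshold
    return events

audio = np.sin(np.linspace(0, 4 * np.pi, 200))  # toy waveform
print(len(send_on_delta(audio)))                # far fewer events than samples
```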
arXiv Detail & Related papers (2021-12-13T21:08:04Z)
- AttendSeg: A Tiny Attention Condenser Neural Network for Semantic Segmentation on the Edge [71.80459780697956]
We introduce AttendSeg, a low-precision, highly compact deep neural network tailored for on-device semantic segmentation.
AttendSeg possesses a self-attention network architecture comprising light-weight attention condensers for improved spatial-channel selective attention.
arXiv Detail & Related papers (2021-04-29T19:19:04Z)
- Binary Neural Network for Speaker Verification [13.472791713805762]
This paper focuses on how to apply binary neural networks to the task of speaker verification.
Experimental results show that, after binarizing the Convolutional Neural Network, the ResNet34-based network achieves an EER of around 5%.
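For context, weight binarization in the XNOR-Net style (one assumption for what "binarizing" means here; the paper's exact scheme may differ) looks like this:

```python
import numpy as np

def binarize_weights(w):
    """Replace real-valued weights with {-alpha, +alpha}: sign bits plus one
    per-layer scale. Storage drops from 32 bits to ~1 bit per weight."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))    # a toy layer's weights
x = rng.standard_normal(64)
y_full = w @ x                       # full-precision output
y_bin = binarize_weights(w) @ x      # binarized output (approximation)
print(np.corrcoef(y_full, y_bin)[0, 1])  # outputs remain highly correlated
```

Training typically keeps a real-valued copy of the weights and passes gradients through the sign function with a straight-through estimator.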
arXiv Detail & Related papers (2021-04-06T06:04:57Z)
- Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer [92.37382674655942]
We propose a network layer to enhance the speech command recognition capability of a lightweight network.
The employed method borrows the ideas of Taylor expansion and quadratic forms to construct a better representation of features in both input and hidden layers.
This richer representation results in recognition accuracy improvement as shown by extensive experiments on Google speech commands (GSC) and synthetic speech commands (SSC) datasets.
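A bare-bones version of the quadratic idea: augment the usual linear map with a second-order term, as in a Taylor expansion truncated at order two. The actual self-organized operational layer is richer than this; the sketch only shows why the representation becomes more expressive.

```python
import numpy as np

def quadratic_layer(x, W1, W2, b):
    """y = W1 x + W2 (x * x) + b: a linear term plus an elementwise
    second-order term, i.e. a Taylor expansion truncated at order two."""
    return W1 @ x + W2 @ (x * x) + b

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
W1, W2 = rng.standard_normal((8, 16)), rng.standard_normal((8, 16))
b = np.zeros(8)
print(quadratic_layer(x, W1, W2, b).shape)  # (8,)
```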
arXiv Detail & Related papers (2020-11-23T14:40:18Z)
- AttendNets: Tiny Deep Image Recognition Neural Networks for the Edge via Visual Attention Condensers [81.17461895644003]
We introduce AttendNets, low-precision, highly compact deep neural networks tailored for on-device image recognition.
AttendNets possess deep self-attention architectures based on visual attention condensers.
Results show AttendNets have significantly lower architectural and computational complexity when compared to several deep neural networks.
arXiv Detail & Related papers (2020-09-30T01:53:17Z)
- Resource-Efficient Speech Mask Estimation for Multi-Channel Speech Enhancement [15.361841669377776]
We provide a resource-efficient approach for multi-channel speech enhancement based on Deep Neural Networks (DNNs).
In particular, we use reduced-precision DNNs for estimating a speech mask from noisy, multi-channel microphone observations.
In the extreme case of binary weights and reduced precision activations, a significant reduction of execution time and memory footprint is possible.
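The core operation being made resource-efficient is mask-based enhancement: the DNN predicts a per-time-frequency-bin gain in [0, 1] that is applied to the noisy spectrogram. A minimal sketch (the reduced-precision DNN itself is elided; shapes are illustrative):

```python
import numpy as np

def apply_speech_mask(noisy_mag, mask_logits):
    """Multiply the noisy magnitude spectrogram (freq, frames) by a
    sigmoid mask predicted by the (reduced-precision) DNN."""
    mask = 1.0 / (1.0 + np.exp(-mask_logits))
    return mask * noisy_mag

rng = np.random.default_rng(0)
noisy = np.abs(rng.standard_normal((257, 100)))  # |STFT| of a noisy mixture
logits = rng.standard_normal((257, 100))         # stand-in DNN output
print(apply_speech_mask(noisy, logits).shape)    # (257, 100)
```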
arXiv Detail & Related papers (2020-07-22T14:58:29Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at achieving two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)