Neural Architecture Search For Keyword Spotting
- URL: http://arxiv.org/abs/2009.00165v2
- Date: Wed, 2 Sep 2020 04:10:58 GMT
- Title: Neural Architecture Search For Keyword Spotting
- Authors: Tong Mo, Yakun Yu, Mohammad Salameh, Di Niu, Shangling Jui
- Abstract summary: We apply neural architecture search to search for convolutional neural network models.
We achieve a state-of-the-art accuracy of over 97% on the setting of 12-class utterance classification.
- Score: 18.253449041632166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have recently become a popular solution to keyword
spotting systems, which enable the control of smart devices via voice. In this
paper, we apply neural architecture search to search for convolutional neural
network models that can help boost the performance of keyword spotting based on
features extracted from acoustic signals while maintaining an acceptable memory
footprint. Specifically, we use differentiable architecture search techniques
to search for operators and their connections in a predefined cell search
space. The found cells are then scaled up in both depth and width to achieve
competitive performance. We evaluated the proposed method on Google's Speech
Commands Dataset and achieved a state-of-the-art accuracy of over 97% on the
setting of 12-class utterance classification commonly reported in the
literature.
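The "differentiable architecture search" in the abstract refers to a DARTS-style continuous relaxation. Each edge of a cell computes a softmax-weighted mixture of the candidate operators. After the search, the operator with the largest architecture weight is kept. The sketch below shows only that relaxation and the discretization step, on a toy 1-D signal in pure Python. The operator set (`none`, `skip_connect`, `avg_pool_3`, and `conv_3` with a fixed kernel) and all helper names are hypothetical stand-ins, not the paper's actual search space. The bi-level optimization of network weights and architecture parameters is omitted.

```python
import math
from typing import Callable, Dict, List

Signal = List[float]


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over architecture parameters (alphas).
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _pad(x: Signal) -> Signal:
    # Zero-pad one sample on each side so 3-tap ops keep the input length.
    return [0.0] + list(x) + [0.0]


def zero(x: Signal) -> Signal:
    return [0.0 for _ in x]


def identity(x: Signal) -> Signal:
    return list(x)


def avg_pool_3(x: Signal) -> Signal:
    # Hypothetical toy pooling: 3-tap moving average with zero padding.
    p = _pad(x)
    return [(p[i] + p[i + 1] + p[i + 2]) / 3.0 for i in range(len(x))]


def conv_3(x: Signal, kernel=(0.25, 0.5, 0.25)) -> Signal:
    # Hypothetical toy convolution with a fixed (untrained) 3-tap kernel.
    p = _pad(x)
    return [kernel[0] * p[i] + kernel[1] * p[i + 1] + kernel[2] * p[i + 2]
            for i in range(len(x))]


# Hypothetical candidate set for one edge of the cell.
CANDIDATE_OPS: Dict[str, Callable[[Signal], Signal]] = {
    "none": zero,
    "skip_connect": identity,
    "avg_pool_3": avg_pool_3,
    "conv_3": conv_3,
}


def mixed_op(x: Signal, alphas: Dict[str, float]) -> Signal:
    # Continuous relaxation: sum of every candidate op, weighted by softmax(alphas).
    names = list(CANDIDATE_OPS)
    weights = softmax([alphas[n] for n in names])
    out = [0.0] * len(x)
    for w, name in zip(weights, names):
        y = CANDIDATE_OPS[name](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out


def derive_op(alphas: Dict[str, float]) -> str:
    # Discretization: keep the strongest operator, excluding the "none" op.
    candidates = {n: a for n, a in alphas.items() if n != "none"}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    uniform = {name: 0.0 for name in CANDIDATE_OPS}
    print(mixed_op([1.0, 2.0, 3.0], uniform))
    print(derive_op({"none": 2.0, "skip_connect": 0.1, "avg_pool_3": -0.5, "conv_3": 1.2}))
```

With all alphas equal, every operator contributes a quarter of its output. During search, gradient updates to the alphas shift that balance. The discretized cell found this way is what the abstract then stacks and widens to reach the final model.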
Related papers
- EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition [54.99121380536659]
Eye movement biometrics have received increasing attention thanks to their highly secure identification.
Deep learning (DL) models have recently been applied successfully to eye movement recognition.
However, DL architectures are still designed using human prior knowledge.
We propose EM-DARTS, a hierarchical differentiable architecture search algorithm to automatically design the DL architecture for eye movement recognition.
arXiv Detail & Related papers (2024-09-22T13:11:08Z)
- Encoder-Decoder Neural Architecture Optimization for Keyword Spotting [4.419022795297077]
Keyword spotting aims to identify specific keyword audio utterances.
Deep convolutional neural networks have been widely utilized in keyword spotting systems.
In this paper, we utilize neural architecture search to design convolutional neural network models that can boost the performance of keyword spotting.
arXiv Detail & Related papers (2021-06-04T22:09:05Z)
- Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z)
- Towards Searching Efficient and Accurate Neural Network Architectures in Binary Classification Problems [4.3871352596331255]
In this study, we optimize the selection process by investigating different search algorithms to find a neural network architecture size that yields the highest accuracy.
We apply binary search on a very well-defined binary classification network search space and compare the results to those of linear search.
We report a 100-fold running time improvement over the naive linear search when we apply the binary search method to our datasets.
arXiv Detail & Related papers (2021-01-16T20:00:38Z)
- Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer [92.37382674655942]
We propose a network layer to enhance the speech command recognition capability of a lightweight network.
The employed method borrows the ideas of Taylor expansion and quadratic forms to construct a better representation of features in both input and hidden layers.
This richer representation results in recognition accuracy improvement as shown by extensive experiments on Google speech commands (GSC) and synthetic speech commands (SSC) datasets.
arXiv Detail & Related papers (2020-11-23T14:40:18Z)
- Task-Aware Neural Architecture Search [33.11791812491669]
We propose a novel framework for neural architecture search, utilizing a dictionary of models of base tasks and the similarity between the target task and the atoms of the dictionary.
By introducing a gradient-based search algorithm, we can evaluate and discover the best architecture in the search space without fully training the networks.
arXiv Detail & Related papers (2020-10-27T00:10:40Z)
- NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural Network Synthesis [53.106414896248246]
We present a framework that allows analysts to effectively build the solution sub-graph space and guide the network search by injecting their domain knowledge.
Applying this technique in an iterative manner allows analysts to converge to the best performing neural network architecture for a given application.
arXiv Detail & Related papers (2020-09-28T01:48:45Z)
- VINNAS: Variational Inference-based Neural Network Architecture Search [2.685668802278155]
We present a differentiable variational inference-based NAS method for searching sparse convolutional neural networks.
Our method finds diverse network cells, while showing state-of-the-art accuracy with up to almost 2 times fewer non-zero parameters.
arXiv Detail & Related papers (2020-07-12T21:47:35Z)
- AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech.
Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times.
Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while having lower model complexity.
arXiv Detail & Related papers (2020-05-07T02:53:47Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the degradation of system quality for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
- AudioMNIST: Exploring Explainable Artificial Intelligence for Audio Analysis on a Simple Benchmark [12.034688724153044]
This paper explores post-hoc explanations for deep neural networks in the audio domain.
We present a novel Open Source audio dataset consisting of 30,000 audio samples of English spoken digits.
We demonstrate the superior interpretability of audible explanations over visual ones in a human user study.
arXiv Detail & Related papers (2018-07-09T23:11:17Z)
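The "Towards Searching Efficient and Accurate Neural Network Architectures in Binary Classification Problems" entry compares binary search and linear search over network size. The following is a minimal sketch of that comparison. It assumes that accuracy is non-decreasing in size, an assumption the summary does not state. It also uses a hypothetical `toy_accuracy_pct` curve in place of actually training networks. The search looks for the smallest size that reaches a target accuracy, which is one plausible reading of the entry. It is not that paper's procedure.

```python
from typing import Callable, Optional, Tuple

Evaluator = Callable[[int], int]


def linear_search(evaluate: Evaluator, target: int, lo: int, hi: int) -> Tuple[Optional[int], int]:
    # Try sizes lo, lo+1, ..., hi; return (first size meeting target, #evaluations).
    calls = 0
    for size in range(lo, hi + 1):
        calls += 1
        if evaluate(size) >= target:
            return size, calls
    return None, calls


def binary_search(evaluate: Evaluator, target: int, lo: int, hi: int) -> Tuple[Optional[int], int]:
    # Assumes evaluate() is monotone non-decreasing in size.
    calls = 1
    if evaluate(hi) < target:  # even the largest size misses the target
        return None, calls
    while lo < hi:  # invariant: the smallest passing size lies in [lo, hi]
        mid = (lo + hi) // 2
        calls += 1
        if evaluate(mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo, calls


def toy_accuracy_pct(size: int) -> int:
    # Hypothetical monotone accuracy curve (percent); stands in for training a model.
    return min(99, 50 + size)


if __name__ == "__main__":
    print(linear_search(toy_accuracy_pct, 90, 1, 1024))
    print(binary_search(toy_accuracy_pct, 90, 1, 1024))
```

Each evaluation stands in for a full training run. Under the monotonicity assumption, binary search needs about log2(range) evaluations, whereas linear search may need up to the whole range.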