Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio
- URL: http://arxiv.org/abs/1711.08058v2
- Date: Fri, 25 Apr 2025 04:47:20 GMT
- Title: Multiple-Instance, Cascaded Classification for Keyword Spotting in Narrow-Band Audio
- Authors: Ahmad AbdulKader, Kareem Nassar, Mohamed El-Geish, Daniel Galvez, Chetan Patil,
- Abstract summary: We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8 kHz audio acquired in non-IID environments. We present a model that incorporates Deep Neural Networks (DNNs), cascading, multiple-feature representations, and multiple-instance learning. The KWS system achieves a false negative rate of 6% at an hourly false positive rate of 0.75.
- Score: 0.4793481003355277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose using cascaded classifiers for a keyword spotting (KWS) task on narrow-band (NB), 8 kHz audio acquired in non-IID environments -- a more challenging task than most state-of-the-art KWS systems face. We present a model that incorporates Deep Neural Networks (DNNs), cascading, multiple-feature representations, and multiple-instance learning. The cascaded classifiers handle the task's class imbalance and reduce power consumption on computationally constrained devices via early termination. The KWS system achieves a false negative rate of 6% at an hourly false positive rate of 0.75.
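As a rough illustration of the cascade-with-early-termination idea (not the authors' implementation), the sketch below chains progressively more expensive keyword/non-keyword stages and stops as soon as one stage rejects a window; the stage models, thresholds, and feature shapes are placeholders.

```python
import numpy as np

class CascadedKWS:
    """Illustrative cascade: cheap stages run first, with early exit on rejection.

    `stages` is a list of (model, threshold) pairs; each model is assumed to expose
    predict_proba(X) -> (n, 2) as in scikit-learn. All names here are placeholders.
    """

    def __init__(self, stages):
        self.stages = stages

    def predict_window(self, features):
        x = np.asarray(features, dtype=np.float32).reshape(1, -1)
        for model, threshold in self.stages:
            p_keyword = model.predict_proba(x)[0, 1]
            if p_keyword < threshold:
                return False  # early termination: later, costlier stages are skipped
        return True  # every stage accepted the window as containing the keyword
```

On a heavily imbalanced audio stream, most negative windows are rejected by the cheapest first stage, which is where the early-termination power savings on constrained devices come from.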
Related papers
- Multi-class Network Intrusion Detection with Class Imbalance via LSTM & SMOTE [1.0591656257413806]
This paper proposes to use oversampling techniques along with appropriate loss functions to handle class imbalance for the detection of various types of network intrusions.
Our deep learning model employs LSTM with fully connected layers to perform multi-class classification of network attacks.
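A minimal sketch of the general recipe described above (SMOTE oversampling followed by an LSTM with a fully connected head); the data, class counts, and layer sizes below are placeholders rather than the paper's configuration.

```python
import numpy as np
import torch
import torch.nn as nn
from imblearn.over_sampling import SMOTE  # oversamples minority classes

# Toy imbalanced data: 1000 flows x 20 features, 3 classes with class 0 dominating.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype(np.float32)
y = rng.choice(3, size=1000, p=[0.9, 0.07, 0.03])

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)  # balance the classes

class LSTMClassifier(nn.Module):
    def __init__(self, n_features, n_classes, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)  # fully connected head

    def forward(self, x):                # x: (batch, seq_len, n_features)
        _, (h, _) = self.lstm(x)
        return self.fc(h[-1])

model = LSTMClassifier(n_features=20, n_classes=3)
loss_fn = nn.CrossEntropyLoss()  # a class-weighted loss could be supplied via `weight=`
logits = model(torch.from_numpy(X_res.astype(np.float32)).unsqueeze(1))  # length-1 sequences
loss = loss_fn(logits, torch.from_numpy(y_res).long())
```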
arXiv Detail & Related papers (2023-10-03T07:28:04Z)
- Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network [60.99112031408449]
We propose a real-time, single-channel attention-guided Convolutional Neural Network (CNN) to estimate the number of active speakers in overlapping speech.
The proposed system extracts higher-level information from the speech spectral content using a CNN model.
Experiments on simulated overlapping speech from the WSJ corpus show that the attention solution improves performance by almost 3% absolute over conventional temporal average pooling.
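A generic sketch of attention-weighted pooling versus plain temporal average pooling over frame-level CNN features; this is not the paper's architecture, and the dimensions are placeholders.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Weights each time frame before pooling, instead of taking a plain temporal mean."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # learned per-frame relevance score

    def forward(self, frames):                                # frames: (batch, time, dim)
        weights = torch.softmax(self.score(frames), dim=1)    # (batch, time, 1)
        return (weights * frames).sum(dim=1)                  # (batch, dim) pooled embedding

frames = torch.randn(4, 100, 128)          # e.g. CNN features over 100 frames
avg_pooled = frames.mean(dim=1)            # conventional temporal average pooling
att_pooled = AttentionPooling(128)(frames) # attention-guided pooling
```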
arXiv Detail & Related papers (2021-10-30T19:24:57Z)
- Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on the CIFAR-10LT, CIFAR-100LT and WebVision datasets, observing that Prototypical obtains substantial improvements compared with the state of the art.
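A minimal sketch of prototypical classification, assuming each class prototype is the mean embedding of its training examples and prediction is nearest-prototype; the embeddings below are random placeholders.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Prototype = mean embedding per class; no additional parameters are learned."""
    classes = np.unique(labels)
    return classes, np.stack([embeddings[labels == c].mean(axis=0) for c in classes])

def predict_nearest_prototype(embeddings, classes, prototypes):
    # Assign each embedding to the closest prototype (Euclidean distance).
    d = np.linalg.norm(embeddings[:, None, :] - prototypes[None, :, :], axis=-1)
    return classes[np.argmin(d, axis=1)]

rng = np.random.default_rng(0)
train_emb = rng.normal(size=(600, 32))
train_y = rng.choice(10, size=600, p=[0.5] + [0.5 / 9] * 9)  # long-tailed label distribution
classes, protos = class_prototypes(train_emb, train_y)
pred = predict_nearest_prototype(rng.normal(size=(5, 32)), classes, protos)
```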
arXiv Detail & Related papers (2021-10-22T01:55:01Z)
- A robust approach for deep neural networks in presence of label noise: relabelling and filtering instances during training [14.244244290954084]
We propose a robust training strategy against label noise, called RAFNI, that can be used with any CNN.
RAFNI consists of three mechanisms: two mechanisms that filter instances and one mechanism that relabels instances.
We evaluated our algorithm using different data sets of several sizes and characteristics.
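RAFNI's mechanisms are only summarized above, so the sketch below is a generic confidence-based stand-in (relabel when the model is very confident in another class, filter when no class is confident), not the algorithm itself; the thresholds are illustrative.

```python
import numpy as np

def relabel_and_filter(probs, labels, relabel_thr=0.95, keep_thr=0.2):
    """probs: (n, n_classes) softmax outputs from the current epoch; labels: (n,).

    Returns possibly corrected labels and a boolean keep-mask for the next epoch.
    Thresholds are placeholders, not the values used by RAFNI.
    """
    probs = np.asarray(probs)
    labels = np.asarray(labels).copy()
    top_class = probs.argmax(axis=1)
    top_prob = probs.max(axis=1)

    relabel = (top_prob >= relabel_thr) & (top_class != labels)
    labels[relabel] = top_class[relabel]                        # relabelling mechanism

    keep = probs[np.arange(len(labels)), labels] >= keep_thr    # filtering mechanism
    return labels, keep
```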
arXiv Detail & Related papers (2021-09-08T16:11:31Z)
- Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z)
- Coresets for Robust Training of Neural Networks against Noisy Labels [78.03027938765746]
We propose a novel approach with strong theoretical guarantees for robust training of deep networks trained with noisy labels.
We select weighted subsets (coresets) of clean data points that provide an approximately low-rank Jacobian matrix.
Our experiments corroborate our theory and demonstrate that deep networks trained on our subsets achieve significantly superior performance compared to the state of the art.
arXiv Detail & Related papers (2020-11-15T04:58:11Z)
- RethinkCWS: Is Chinese Word Segmentation a Solved Task? [81.11161697133095]
The performance of Chinese Word Segmentation (CWS) systems has gradually reached a plateau with the rapid development of deep neural networks.
In this paper, we take stock of what we have achieved and rethink what's left in the CWS task.
arXiv Detail & Related papers (2020-11-13T11:07:08Z)
- Audio Spoofing Verification using Deep Convolutional Neural Networks by Transfer Learning [0.0]
We propose a speech classifier based on a deep convolutional neural network to detect spoofing attacks.
Our proposed methodology uses an acoustic time-frequency representation of power spectral densities on the Mel frequency scale.
We achieved an equal error rate (EER) of 0.9056% on the development set and 5.32% on the evaluation set of the logical access scenario.
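The front end described above (power spectral densities on the Mel frequency scale) is close to a log-Mel spectrogram; a sketch using librosa follows, with the transfer-learned CNN omitted and all parameter values (sample rate, FFT size, hop, number of Mel bands) illustrative.

```python
import numpy as np
import librosa

sr = 16000
y = np.random.default_rng(0).normal(size=2 * sr).astype(np.float32)  # stand-in for an utterance

# Power spectral densities mapped onto the Mel scale, then log-compressed.
mel_power = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512, hop_length=160, n_mels=64)
log_mel = librosa.power_to_db(mel_power, ref=np.max)  # (64, frames) image-like input for a CNN

# `log_mel` would then be fed to a CNN initialised from a pretrained model (transfer learning).
```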
arXiv Detail & Related papers (2020-08-08T07:14:40Z)
- Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
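One common reading of two-stage categorization is to predict a coarse class first and then choose a fine-grained class within that group; the sketch below illustrates that idea with placeholder models and a hypothetical 10-to-3 class mapping, not the submitted systems.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))                          # stand-in acoustic embeddings
fine_y = rng.integers(0, 10, size=300)                  # ten fine-grained scene classes
coarse_of = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])    # hypothetical 10 -> 3 class mapping
coarse_y = coarse_of[fine_y]                            # three higher-level classes

coarse_clf = LogisticRegression(max_iter=1000).fit(X, coarse_y)  # low-complexity first stage
fine_clf = LogisticRegression(max_iter=1000).fit(X, fine_y)      # fine-grained second stage

x_new = rng.normal(size=(1, 40))
coarse_pred = coarse_clf.predict(x_new)[0]              # cheap coarse decision first
probs = fine_clf.predict_proba(x_new)[0]
allowed = coarse_of == coarse_pred                      # restrict stage two to that coarse group
fine_pred = fine_clf.classes_[allowed][probs[allowed].argmax()]
```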
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
- A Deep Neural Network for Audio Classification with a Classifier Attention Mechanism [2.3204178451683264]
We introduce a new attention-based convolutional neural network architecture for audio classification, called CAB-CNN.
The algorithm uses a newly designed architecture consisting of a list of simple classifiers and an attention mechanism as a selector.
Compared to state-of-the-art algorithms, our algorithm achieves improvements of more than 10% on all selected test scores.
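The described design, a list of simple classifiers whose outputs are combined by an attention mechanism acting as a selector, can be sketched generically as below; the number of sub-classifiers and the dimensions are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ClassifierAttention(nn.Module):
    """A bank of simple classifiers plus an attention 'selector' over their outputs."""

    def __init__(self, dim, n_classes, n_classifiers=8):
        super().__init__()
        self.classifiers = nn.ModuleList([nn.Linear(dim, n_classes) for _ in range(n_classifiers)])
        self.selector = nn.Linear(dim, n_classifiers)   # attention scores, one per classifier

    def forward(self, x):                               # x: (batch, dim) audio embedding
        logits = torch.stack([clf(x) for clf in self.classifiers], dim=1)  # (batch, K, classes)
        attn = torch.softmax(self.selector(x), dim=-1).unsqueeze(-1)       # (batch, K, 1)
        return (attn * logits).sum(dim=1)               # attention-weighted combination

out = ClassifierAttention(dim=128, n_classes=10)(torch.randn(4, 128))     # (4, 10) logits
```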
arXiv Detail & Related papers (2020-06-14T21:29:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.