EfficientTDNN: Efficient Architecture Search for Speaker Recognition in
the Wild
- URL: http://arxiv.org/abs/2103.13581v1
- Date: Thu, 25 Mar 2021 03:28:07 GMT
- Title: EfficientTDNN: Efficient Architecture Search for Speaker Recognition in
the Wild
- Authors: Rui Wang, Zhihua Wei, Shouling Ji, and Zhen Hong
- Abstract summary: We propose a neural architecture search-based efficient time-delay neural network (EfficientTDNN) to improve inference efficiency while maintaining recognition accuracy.
Experiments on the VoxCeleb dataset show EfficientTDNN provides a huge search space including approximately $10^{13}$ subnets and achieves 1.66% EER and 0.156 DCF$_{0.01}$ with 565M MACs.
- Score: 29.59228560095565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speaker recognition is an audio biometric technology that uses
acoustic characteristics to identify speakers automatically. These systems have emerged
as an essential means of verifying identity in various scenarios, such as smart
homes, general business interactions, e-commerce applications, and forensics.
However, the mismatch between training and real-world data causes a shift of
speaker embedding space and severely degrades the recognition performance.
Various complicated neural architectures are presented to address speaker
recognition in the wild but neglect the requirements of storage and
computation. To address this issue, we propose a neural architecture
search-based efficient time-delay neural network (EfficientTDNN) to improve
inference efficiency while maintaining recognition accuracy. The proposed
EfficientTDNN comprises three phases. First, supernet design constructs a
dynamic neural architecture of sequential cells that supports network pruning.
Second, progressive training optimizes randomly sampled subnets that inherit
the supernet's weights. Third, three search methods,
including manual grid search, random search, and model predictive evolutionary
search, are introduced to find a trade-off between accuracy and efficiency.
Results of experiments on the VoxCeleb dataset show EfficientTDNN provides a
huge search space including approximately $10^{13}$ subnets and achieves 1.66%
EER and 0.156 DCF$_{0.01}$ with 565M MACs. Comprehensive investigation suggests
that the trained supernet generalizes cells unseen during training and obtains
an acceptable balance between accuracy and efficiency.
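The search phase described above can be illustrated with a minimal sketch of constrained random search over a weight-sharing supernet. All names, search-space ranges, and the MACs/accuracy proxies below are illustrative assumptions for exposition, not the paper's actual implementation:

```python
import random

# Hypothetical subnet search space, loosely modeled on the abstract's
# description (sequential cells with variable depth, width, kernel size).
DEPTHS = [2, 3, 4]
WIDTHS = [256, 384, 512]
KERNELS = [1, 3, 5]

def sample_subnet():
    """Randomly sample one subnet configuration from the supernet space."""
    return {
        "depth": random.choice(DEPTHS),
        "widths": [random.choice(WIDTHS) for _ in range(max(DEPTHS))],
        "kernels": [random.choice(KERNELS) for _ in range(max(DEPTHS))],
    }

def estimate_macs(cfg):
    """Crude MACs proxy: sum of width * kernel over the active cells."""
    return sum(w * k for w, k in
               zip(cfg["widths"][:cfg["depth"]], cfg["kernels"][:cfg["depth"]]))

def proxy_accuracy(cfg):
    """Stand-in for evaluating a subnet with inherited supernet weights.

    In practice this would be a validation-set EER/DCF measurement;
    here it is a placeholder score, monotone in model cost.
    """
    return estimate_macs(cfg) ** 0.5

def random_search(budget_macs, trials=200, seed=0):
    """Keep the best-scoring sampled subnet whose cost fits the budget."""
    random.seed(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(trials):
        cfg = sample_subnet()
        if estimate_macs(cfg) > budget_macs:
            continue  # violates the efficiency constraint
        score = proxy_accuracy(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg

best = random_search(budget_macs=4000)
print(best)
```

The grid-search and evolutionary variants mentioned in the abstract differ only in how candidate subnets are proposed; the accuracy/efficiency trade-off is enforced by the same kind of cost constraint.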
Related papers
- Explainable Cost-Sensitive Deep Neural Networks for Brain Tumor
Detection from Brain MRI Images considering Data Imbalance [0.0]
An automated pipeline is proposed, which encompasses five models: CNN, ResNet50, InceptionV3, EfficientNetB0 and NASNetMobile.
The performance of the proposed architecture is evaluated on a balanced dataset and found to yield an accuracy of 99.33% for fine-tuned InceptionV3 model.
To further optimize the training process, a cost-sensitive neural network approach has been proposed in order to work with imbalanced datasets.
arXiv Detail & Related papers (2023-08-01T15:35:06Z) - Human Activity Recognition on Microcontrollers with Quantized and
Adaptive Deep Neural Networks [10.195581493173643]
Human Activity Recognition (HAR) based on inertial data is an increasingly diffused task on embedded devices.
Most embedded HAR systems rely on simple, relatively inaccurate classic machine learning algorithms.
This work proposes a set of efficient one-dimensional Convolutional Neural Networks (CNNs) deployable on general purpose microcontrollers (MCUs).
arXiv Detail & Related papers (2022-09-02T06:32:11Z) - Automated Atrial Fibrillation Classification Based on Denoising Stacked
Autoencoder and Optimized Deep Network [1.7403133838762446]
The incidences of atrial fibrillation (AFib) are increasing at a daunting rate worldwide.
For the early detection of the risk of AFib, we have developed an automatic detection system based on deep neural networks.
An end-to-end model is proposed to denoise the electrocardiogram signals using denoising autoencoders (DAE)
arXiv Detail & Related papers (2022-01-26T21:45:48Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS)
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z) - Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs)
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z) - Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by
Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experiment results show a mean azimuth error of 13 degrees, which surpasses the accuracy of other biologically plausible neuromorphic approaches for sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z) - FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 is a family of state-of-the-art compact neural networks that outperform both automatically and manually designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z) - AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech.
Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times.
Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while enjoying lower model complexity.
arXiv Detail & Related papers (2020-05-07T02:53:47Z) - Deep Speaker Embeddings for Far-Field Speaker Recognition on Short
Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed to achieve two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.