Crossed-Time Delay Neural Network for Speaker Recognition
- URL: http://arxiv.org/abs/2006.00452v3
- Date: Tue, 7 Dec 2021 06:22:23 GMT
- Title: Crossed-Time Delay Neural Network for Speaker Recognition
- Authors: Liang Chen and Yanchun Liang and Xiaohu Shi and You Zhou and Chunguo
Wu
- Abstract summary: We introduce a novel structure, the Crossed-Time Delay Neural Network (CTDNN), to enhance the performance of the current TDNN.
The proposed CTDNN gives significant improvements over the original TDNN on both speaker verification and identification tasks.
- Score: 5.216353911330589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Time Delay Neural Network (TDNN) is a well-performing structure for
DNN-based speaker recognition systems. In this paper we introduce a novel
structure, the Crossed-Time Delay Neural Network (CTDNN), to enhance the
performance of the current TDNN. Inspired by the multi-filter setting of the
convolution layer in convolutional neural networks, we place multiple time
delay units, each with a different context size, at the bottom layer and
construct a multilayer parallel network. The proposed CTDNN gives significant
improvements over the original TDNN on both speaker verification and
identification tasks. In the verification experiment on the VoxCeleb1 dataset
it achieves a 2.6% absolute Equal Error Rate improvement. Under the few-shot
condition, CTDNN reaches 90.4% identification accuracy, double the
identification accuracy of the original TDNN. We also compare the proposed
CTDNN with another recent TDNN variant, FTDNN, and show that our model yields
a 36% absolute identification accuracy improvement under the few-shot
condition and can handle larger training batches in a shorter training time,
making better use of computational resources. The code of the new model is
released at https://github.com/chenllliang/CTDNN
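The block below is a minimal sketch of the crossed bottom layer described in the abstract, written in PyTorch under the assumption that each time delay unit can be modeled as a 1D convolution over frame-level features; the feature dimension, layer width, and context sizes are illustrative choices, and the authors' released code at the GitHub link above remains the authoritative implementation.

```python
# Minimal, hypothetical sketch of the CTDNN bottom layer: parallel time delay
# units with different context sizes, following the paper's description.
# This is NOT the authors' released code (see the GitHub link above); the
# layer widths and context sizes here are illustrative assumptions.
import torch
import torch.nn as nn


class CrossedTimeDelayLayer(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=256, context_sizes=(3, 5, 7)):
        super().__init__()
        # One time delay unit (a 1D convolution over frames) per context size.
        self.units = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(feat_dim, hidden_dim, kernel_size=c, padding=c // 2),
                nn.ReLU(),
                nn.BatchNorm1d(hidden_dim),
            )
            for c in context_sizes
        )

    def forward(self, x):
        # x: (batch, feat_dim, num_frames), e.g. filterbank or MFCC features.
        # Every unit sees the same input but with a different temporal context;
        # their outputs are concatenated along the channel axis.
        return torch.cat([unit(x) for unit in self.units], dim=1)


if __name__ == "__main__":
    frames = torch.randn(8, 40, 200)   # batch of 8 utterances, 200 frames each
    layer = CrossedTimeDelayLayer()
    out = layer(frames)
    print(out.shape)                   # torch.Size([8, 768, 200])
```

Only the crossed bottom layer is sketched here; per the abstract, these parallel branches then feed a multilayer parallel network.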
Related papers
- A noise based novel strategy for faster SNN training [0.0]
Spiking neural networks (SNNs) are receiving increasing attention due to their low power consumption and strong biological plausibility.
The two main training methods, artificial neural network (ANN)-to-SNN conversion and spike-based backpropagation (BP), each have their own advantages and limitations.
We propose a novel SNN training approach that combines the benefits of the two methods.
arXiv Detail & Related papers (2022-11-10T09:59:04Z) - SNN2ANN: A Fast and Memory-Efficient Training Framework for Spiking
Neural Networks [117.56823277328803]
Spiking neural networks are efficient computation models for low-power environments.
We propose an SNN-to-ANN (SNN2ANN) framework to train the SNN in a fast and memory-efficient way.
Experiment results show that our SNN2ANN-based models perform well on the benchmark datasets.
arXiv Detail & Related papers (2022-06-19T16:52:56Z) - Training High-Performance Low-Latency Spiking Neural Networks by
Differentiation on Spike Representation [70.75043144299168]
The Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which can achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z) - Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on
Riemannian Gradient Descent With Illustrations of Speech Processing [74.31472195046099]
We exploit a low-rank tensor-train deep neural network (TT-DNN) to build an end-to-end deep learning pipeline, namely LR-TT-DNN.
A hybrid model combining LR-TT-DNN with a convolutional neural network (CNN) is set up to boost the performance.
Our empirical evidence demonstrates that the LR-TT-DNN and CNN+(LR-TT-DNN) models with fewer model parameters can outperform the TT-DNN and CNN+(TT-DNN) counterparts.
arXiv Detail & Related papers (2022-03-11T15:55:34Z) - Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking
Neural Networks? [3.2108350580418166]
Spiking neural networks (SNNs) operate via binary spikes distributed over time.
State-of-the-art training strategies for SNNs involve conversion from a non-spiking deep neural network (DNN).
We propose a new training algorithm that accurately captures the relevant activation distributions, minimizing the error between the DNN and the converted SNN.
arXiv Detail & Related papers (2021-12-22T18:47:45Z) - SpeechNAS: Towards Better Trade-off between Latency and Accuracy for
Large-Scale Speaker Verification [26.028985033942735]
In this work, we try to identify the optimal architectures from a TDNN-based search space employing neural architecture search (NAS).
Our derived best neural network achieves an equal error rate (EER) of 1.02% on the standard test set of VoxCeleb1, which surpasses previous TDNN-based state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2021-09-18T05:31:27Z) - Strengthening the Training of Convolutional Neural Networks By Using
Walsh Matrix [0.0]
We have modified the training and structure of the DNN to increase the classification performance.
A minimum distance network (MDN) following the last layer of the convolutional neural network (CNN) is used as the classifier.
In different areas, it has been observed that higher classification performance is obtained by using the DivFE with a smaller number of nodes.
arXiv Detail & Related papers (2021-03-31T18:06:11Z) - Deep Time Delay Neural Network for Speech Enhancement with Full Data
Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z) - Kernel Based Progressive Distillation for Adder Neural Networks [71.731127378807]
Adder Neural Networks (ANNs), which contain only additions, offer a new way of developing deep neural networks with low energy consumption.
However, there is an accuracy drop when all convolution filters are replaced by adder filters.
We present a novel method for further improving the performance of ANNs without increasing the trainable parameters.
arXiv Detail & Related papers (2020-09-28T03:29:19Z) - FATNN: Fast and Accurate Ternary Neural Networks [89.07796377047619]
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
In this work, we show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2.
We carefully design an implementation-dependent ternary quantization algorithm to mitigate the performance gap.
arXiv Detail & Related papers (2020-08-12T04:26:18Z) - Depthwise Separable Convolutions Versus Recurrent Neural Networks for
Monaural Singing Voice Separation [17.358040670413505]
We focus on singing voice separation, employing an RNN architecture, and we replace the RNNs with depthwise separable (DWS) convolutions (DWS-CNNs).
We conduct an ablation study and examine the effect of the number of channels and layers of DWS-CNNs on the source separation performance.
Our results show that replacing the RNNs with DWS-CNNs yields improvements of 1.20, 0.06, and 0.37 dB on the reported separation metrics, respectively, while using only 20.57% of the parameters of the RNN architecture (a generic DWS convolution block is sketched after this entry).
arXiv Detail & Related papers (2020-07-06T12:32:34Z)
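The sketch below illustrates the core operation behind the last entry: a generic depthwise separable 1D convolution (a per-channel depthwise convolution followed by a 1x1 pointwise convolution). It is a minimal, assumed example; the channel counts and kernel size are not taken from that paper.

```python
# Generic, hypothetical sketch of a depthwise separable 1D convolution block,
# the kind of DWS layer referred to in the singing voice separation entry
# above. Channel counts and kernel size are illustrative, not from that paper.
import torch
import torch.nn as nn


class DepthwiseSeparableConv1d(nn.Module):
    def __init__(self, in_channels=128, out_channels=128, kernel_size=5):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_channels).
        self.depthwise = nn.Conv1d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels,
        )
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # x: (batch, channels, time)
        return self.pointwise(self.depthwise(x))


if __name__ == "__main__":
    x = torch.randn(4, 128, 300)
    block = DepthwiseSeparableConv1d()
    print(block(x).shape)  # torch.Size([4, 128, 300])
    # Parameter count is far lower than a standard Conv1d(128, 128, 5):
    # 128*5 + 128 (depthwise) + 128*128 + 128 (pointwise) vs 128*128*5 + 128.
```

The parameter saving in the final comment is the general reason DWS layers can replace heavier blocks (such as the RNNs in that paper) with far fewer parameters.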
This list is automatically generated from the titles and abstracts of the papers on this site.