Data Generation Using Pass-phrase-dependent Deep Auto-encoders for
Text-Dependent Speaker Verification
- URL: http://arxiv.org/abs/2102.02074v1
- Date: Wed, 3 Feb 2021 14:06:29 GMT
- Title: Data Generation Using Pass-phrase-dependent Deep Auto-encoders for
Text-Dependent Speaker Verification
- Authors: Achintya Kumar Sarkar, Md Sahidullah, Zheng-Hua Tan
- Abstract summary: We propose a novel method that trains pass-phrase-specific deep neural network (PP-DNN) based auto-encoders for creating augmented data for text-dependent speaker verification (TD-SV).
Each PP-DNN auto-encoder is trained using the utterances of a particular pass-phrase available in the target enrollment set.
Experiments are conducted on the RedDots challenge 2016 database for TD-SV using short utterances.
- Score: 25.318439244029094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel method that trains pass-phrase specific
deep neural network (PP-DNN) based auto-encoders for creating augmented data
for text-dependent speaker verification (TD-SV). Each PP-DNN auto-encoder is
trained using the utterances of a particular pass-phrase available in the
target enrollment set with two methods: (i) transfer learning and (ii) training
from scratch. Next, feature vectors of a given utterance are fed to the PP-DNNs
and the output of each PP-DNN at the frame level is treated as one new set of
generated data. The data generated by each PP-DNN are then used to build a
TD-SV system, in contrast to the conventional method, which considers only the
available evaluation data. The proposed approach can be viewed as a
transformation of the data into a pass-phrase-specific space through the
non-linear mapping learned by each PP-DNN. For evaluation, the method builds
one TD-SV system per PP-DNN, i.e., one separately trained system for each
pass-phrase. Finally, the scores of the different TD-SV systems are fused for
decision making. Experiments are conducted on the RedDots challenge 2016
database for TD-SV using short utterances. Results show that the proposed
method improves performance for both conventional cepstral features and deep
bottleneck features under both the Gaussian mixture model-universal background
model (GMM-UBM) and i-vector frameworks.
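The following is a minimal sketch of the idea, not the authors' implementation: a small frame-level auto-encoder is trained per pass-phrase, every utterance is passed through each PP-DNN to obtain pass-phrase-specific views of its features, and the per-view scores are fused. The feature dimension, network sizes and the cosine-similarity scorer are illustrative stand-ins; the paper uses GMM-UBM and i-vector back-ends.

```python
# Minimal sketch of pass-phrase-dependent auto-encoder data generation.
# All shapes, hyper-parameters and the scorer are illustrative assumptions;
# the paper's back-ends (GMM-UBM, i-vector) are only stubbed here.
import torch
import torch.nn as nn

FEAT_DIM = 57  # assumed per-frame feature dimension (e.g. MFCC + deltas)

class FrameAutoEncoder(nn.Module):
    """Frame-level auto-encoder trained on one pass-phrase (a 'PP-DNN')."""
    def __init__(self, dim=FEAT_DIM, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.Sigmoid())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

def train_pp_dnn(frames, epochs=50, lr=1e-3):
    """Train one auto-encoder on all frames of one pass-phrase (from scratch)."""
    model = FrameAutoEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(frames), frames)
        loss.backward()
        opt.step()
    return model

# Toy enrollment frames grouped by pass-phrase (random stand-ins for MFCCs).
pp_frames = {p: torch.randn(500, FEAT_DIM) for p in ("pp1", "pp2", "pp3")}
pp_dnns = {p: train_pp_dnn(f) for p, f in pp_frames.items()}

def generate_views(utt_frames):
    """Pass an utterance through every PP-DNN; each output is one new,
    pass-phrase-specific set of generated data."""
    with torch.no_grad():
        return {p: m(utt_frames) for p, m in pp_dnns.items()}

def score(enroll_view, test_view):
    """Stand-in scorer: cosine similarity of mean vectors. The paper builds a
    full TD-SV system (GMM-UBM or i-vector) per PP-DNN instead."""
    return torch.cosine_similarity(enroll_view.mean(0), test_view.mean(0), dim=0)

enroll, test = torch.randn(300, FEAT_DIM), torch.randn(280, FEAT_DIM)
enroll_views, test_views = generate_views(enroll), generate_views(test)

# One TD-SV system per PP-DNN, then equal-weight score-level fusion.
fused = torch.stack([score(enroll_views[p], test_views[p]) for p in pp_dnns]).mean()
print(f"fused verification score: {fused.item():.3f}")
```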
Related papers
- FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for click-through rate (CTR) prediction.
Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment.
Experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines and is highly compatible with various ID-based models and PLMs.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
- A stacked deep convolutional neural network to predict the remaining useful life of a turbofan engine [0.0]
The solution is based on two Deep Convolutional Neural Networks stacked in two levels.
The proposed methodology ranked third in the 2021 PHM Conference Data Challenge.
arXiv Detail & Related papers (2021-11-24T18:36:28Z)
- On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data.
We obtain word-level confidence scores by utilizing several types of features calculated during decoding.
The proposed time-stamping method achieves an average word timing difference of less than 50 ms.
arXiv Detail & Related papers (2021-04-27T23:31:43Z)
- Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding [0.0]
We propose a vocal tract length (VTL) perturbation method for text-dependent speaker verification (TD-SV).
A set of TD-SV systems are trained, one for each VTL factor, and score-level fusion is applied to make a final decision.
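As an illustration of the perturbation step, the sketch below applies a simple piecewise-linear frequency warp to a spectrogram for several VTL factors; the warp boundary, factor values and toy spectrogram are assumptions rather than details from the paper, and each warped copy would feed a separate TD-SV system whose scores are then fused.

```python
# Hedged sketch of vocal tract length perturbation (VTLP): a piecewise-linear
# warp of the frequency axis. Warp shape and factor values are illustrative.
import numpy as np

def vtlp(spec, alpha, f_hi=0.875):
    """Warp the frequency axis of a magnitude spectrogram by factor alpha
    (roughly 0.9 .. 1.1); f_hi is the assumed normalized boundary frequency."""
    n_freq = spec.shape[0]
    f = np.linspace(0.0, 1.0, n_freq)  # normalized frequency grid
    warped = np.where(
        f <= f_hi,
        alpha * f,
        alpha * f_hi + (1.0 - alpha * f_hi) / (1.0 - f_hi) * (f - f_hi),
    )
    warped = np.clip(warped, 0.0, 1.0)
    # Re-sample every frame at the warped frequency positions.
    return np.stack([np.interp(warped, f, frame) for frame in spec.T], axis=1)

# Toy |STFT| (freq bins x frames); one warped copy per VTL factor. A separate
# TD-SV system would be trained on each copy and their scores fused.
spec = np.abs(np.random.randn(257, 100))
warped_copies = {a: vtlp(spec, a) for a in (0.9, 0.95, 1.0, 1.05, 1.1)}
print({a: v.shape for a, v in warped_copies.items()})
```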
arXiv Detail & Related papers (2020-11-25T06:11:06Z)
- Multitask Learning and Joint Optimization for Transformer-RNN-Transducer Speech Recognition [13.198689566654107]
This paper explores multitask learning, joint optimization, and joint decoding methods for transformer-RNN-transducer systems.
We show that the proposed methods can reduce word error rate (WER) by 16.6 % and 13.3 % for test-clean and test-other datasets, respectively.
arXiv Detail & Related papers (2020-11-02T06:38:06Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Diversifying Task-oriented Dialogue Response Generation with Prototype Guided Paraphrasing [52.71007876803418]
Existing methods for Dialogue Response Generation (DRG) in Task-oriented Dialogue Systems (TDSs) can be grouped into two categories: template-based and corpus-based.
We propose a prototype-based paraphrasing neural network, called P2-Net, which aims to enhance the quality of responses in terms of both precision and diversity.
arXiv Detail & Related papers (2020-08-07T22:25:36Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data.
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
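The sampling step described above can be pictured with a small sketch; the embedding table, synonym sets and Dirichlet concentration below are made-up illustrations, not the authors' data or code.

```python
# Toy sketch of sampling a virtual word embedding from the convex hull spanned
# by a word and its synonyms, using Dirichlet-distributed convex weights.
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 8  # assumed embedding size

# Illustrative embedding table and synonym sets.
emb = {w: rng.normal(size=EMB_DIM) for w in ("good", "great", "fine", "movie", "film")}
synonyms = {"good": ["great", "fine"], "movie": ["film"]}

def dirichlet_mix(word, alpha=1.0):
    """Return a convex combination of the word's embedding and its synonyms'
    embeddings, with weights drawn from a Dirichlet distribution."""
    group = [word] + synonyms.get(word, [])
    weights = rng.dirichlet(alpha * np.ones(len(group)))
    return sum(w * emb[g] for w, g in zip(weights, group))

# A 'virtual sentence' replaces every word embedding by such a mixture; these
# augmented sequences are added to the training data.
virtual_sentence = np.stack([dirichlet_mix(w) for w in ("good", "movie")])
print(virtual_sentence.shape)  # (2, 8)
```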
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
- Recent Developments Combining Ensemble Smoother and Deep Generative Networks for Facies History Matching [58.720142291102135]
This research project focuses on the use of autoencoders networks to construct a continuous parameterization for facies models.
We benchmark seven different formulations, including VAE, generative adversarial network (GAN), Wasserstein GAN, variational auto-encoding GAN, principal component analysis (PCA) with cycle GAN, PCA with transfer style network and VAE with style loss.
arXiv Detail & Related papers (2020-05-08T21:32:42Z)
- A Novel Deep Learning Architecture for Decoding Imagined Speech from EEG [2.4063592468412267]
We present a novel architecture that employs a deep neural network (DNN) for classifying the words "in" and "cooperate".
Nine EEG channels, which best capture the underlying cortical activity, are chosen using common spatial pattern.
We have achieved accuracies comparable to the state-of-the-art results.
arXiv Detail & Related papers (2020-03-19T00:57:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.