Data Generation Using Pass-phrase-dependent Deep Auto-encoders for
Text-Dependent Speaker Verification
- URL: http://arxiv.org/abs/2102.02074v1
- Date: Wed, 3 Feb 2021 14:06:29 GMT
- Title: Data Generation Using Pass-phrase-dependent Deep Auto-encoders for
Text-Dependent Speaker Verification
- Authors: Achintya Kumar Sarkar, Md Sahidullah, Zheng-Hua Tan
- Abstract summary: We propose a novel method that trains pass-phrase-specific deep neural network (PP-DNN) based auto-encoders for creating augmented data for text-dependent speaker verification (TD-SV).
Each PP-DNN auto-encoder is trained using the utterances of a particular pass-phrase available in the target enrollment set.
Experiments are conducted on the RedDots challenge 2016 database for TD-SV using short utterances.
- Score: 25.318439244029094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel method that trains pass-phrase specific
deep neural network (PP-DNN) based auto-encoders for creating augmented data
for text-dependent speaker verification (TD-SV). Each PP-DNN auto-encoder is
trained using the utterances of a particular pass-phrase available in the
target enrollment set with two methods: (i) transfer learning and (ii) training
from scratch. Next, feature vectors of a given utterance are fed to the PP-DNNs
and the output of each PP-DNN at the frame level is treated as one new set of
generated data. The data generated by each PP-DNN are then used to build a
TD-SV system, in contrast to the conventional method, which considers only the
available evaluation data. The proposed approach can be viewed as a
transformation of the data into a pass-phrase-specific space through the
non-linear mapping learned by each PP-DNN. For evaluation, the method builds
one TD-SV system per PP-DNN, i.e., one separately trained system for each
pass-phrase. Finally, the scores of the different TD-SV systems are fused for
decision making. Experiments are conducted on the RedDots challenge 2016
database for TD-SV using short utterances. Results show that the proposed
method improves performance for both conventional cepstral features and deep
bottleneck features under both the Gaussian mixture model-universal background
model (GMM-UBM) and i-vector frameworks.
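The following is a minimal sketch of the idea, not the authors' implementation: a small frame-level auto-encoder is trained per pass-phrase, every utterance is passed through each PP-DNN to obtain pass-phrase-specific views of its features, and the per-view scores are fused. The feature dimension, network sizes and the cosine-similarity scorer are illustrative stand-ins; the paper uses GMM-UBM and i-vector back-ends.

```python
# Minimal sketch of pass-phrase-dependent auto-encoder data generation.
# All shapes, hyper-parameters and the scorer are illustrative assumptions;
# the paper's back-ends (GMM-UBM, i-vector) are only stubbed here.
import torch
import torch.nn as nn

FEAT_DIM = 57  # assumed per-frame feature dimension (e.g. MFCC + deltas)

class FrameAutoEncoder(nn.Module):
    """Frame-level auto-encoder trained on one pass-phrase (a 'PP-DNN')."""
    def __init__(self, dim=FEAT_DIM, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.Sigmoid())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

def train_pp_dnn(frames, epochs=50, lr=1e-3):
    """Train one auto-encoder on all frames of one pass-phrase (from scratch)."""
    model = FrameAutoEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(frames), frames)
        loss.backward()
        opt.step()
    return model

# Toy enrollment frames grouped by pass-phrase (random stand-ins for MFCCs).
pp_frames = {p: torch.randn(500, FEAT_DIM) for p in ("pp1", "pp2", "pp3")}
pp_dnns = {p: train_pp_dnn(f) for p, f in pp_frames.items()}

def generate_views(utt_frames):
    """Pass an utterance through every PP-DNN; each output is one new,
    pass-phrase-specific set of generated data."""
    with torch.no_grad():
        return {p: m(utt_frames) for p, m in pp_dnns.items()}

def score(enroll_view, test_view):
    """Stand-in scorer: cosine similarity of mean vectors. The paper builds a
    full TD-SV system (GMM-UBM or i-vector) per PP-DNN instead."""
    return torch.cosine_similarity(enroll_view.mean(0), test_view.mean(0), dim=0)

enroll, test = torch.randn(300, FEAT_DIM), torch.randn(280, FEAT_DIM)
enroll_views, test_views = generate_views(enroll), generate_views(test)

# One TD-SV system per PP-DNN, then equal-weight score-level fusion.
fused = torch.stack([score(enroll_views[p], test_views[p]) for p in pp_dnns]).mean()
print(f"fused verification score: {fused.item():.3f}")
```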
Related papers
- FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for click-through rate (CTR) prediction.
Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment.
Experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines and is highly compatible with various ID-based models and PLMs.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
- A stacked deep convolutional neural network to predict the remaining useful life of a turbofan engine [0.0]
The solution is based on two Deep Convolutional Neural Networks stacked in two levels.
The proposed methodology ranked third in the 2021 PHM Conference Data Challenge.
arXiv Detail & Related papers (2021-11-24T18:36:28Z)
- On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data.
We obtain word-level confidence scores by utilizing several types of features calculated during decoding.
The proposed time-stamping method achieves an average word timing difference of less than 50 ms.
arXiv Detail & Related papers (2021-04-27T23:31:43Z)
- Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding [0.0]
We propose a vocal tract length (VTL) perturbation method for text-dependent speaker verification (TD-SV).
A set of TD-SV systems are trained, one for each VTL factor, and score-level fusion is applied to make a final decision.
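As an illustration of the perturbation step, the sketch below applies a simple piecewise-linear frequency warp to a spectrogram for several VTL factors; the warp boundary, factor values and toy spectrogram are assumptions rather than details from the paper, and each warped copy would feed a separate TD-SV system whose scores are then fused.

```python
# Hedged sketch of vocal tract length perturbation (VTLP): a piecewise-linear
# warp of the frequency axis. Warp shape and factor values are illustrative.
import numpy as np

def vtlp(spec, alpha, f_hi=0.875):
    """Warp the frequency axis of a magnitude spectrogram by factor alpha
    (roughly 0.9 .. 1.1); f_hi is the assumed normalized boundary frequency."""
    n_freq = spec.shape[0]
    f = np.linspace(0.0, 1.0, n_freq)  # normalized frequency grid
    warped = np.where(
        f <= f_hi,
        alpha * f,
        alpha * f_hi + (1.0 - alpha * f_hi) / (1.0 - f_hi) * (f - f_hi),
    )
    warped = np.clip(warped, 0.0, 1.0)
    # Re-sample every frame at the warped frequency positions.
    return np.stack([np.interp(warped, f, frame) for frame in spec.T], axis=1)

# Toy |STFT| (freq bins x frames); one warped copy per VTL factor. A separate
# TD-SV system would be trained on each copy and their scores fused.
spec = np.abs(np.random.randn(257, 100))
warped_copies = {a: vtlp(spec, a) for a in (0.9, 0.95, 1.0, 1.05, 1.1)}
print({a: v.shape for a, v in warped_copies.items()})
```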
arXiv Detail & Related papers (2020-11-25T06:11:06Z)
- Multitask Learning and Joint Optimization for Transformer-RNN-Transducer Speech Recognition [13.198689566654107]
This paper explores multitask learning, joint optimization, and joint decoding methods for transformer-RNN-transducer systems.
We show that the proposed methods can reduce word error rate (WER) by 16.6 % and 13.3 % for test-clean and test-other datasets, respectively.
arXiv Detail & Related papers (2020-11-02T06:38:06Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Diversifying Task-oriented Dialogue Response Generation with Prototype Guided Paraphrasing [52.71007876803418]
Existing methods for Dialogue Response Generation (DRG) in Task-oriented Dialogue Systems (TDSs) can be grouped into two categories: template-based and corpus-based.
We propose a prototype-based paraphrasing neural network, called P2-Net, which aims to enhance the quality of responses in terms of both precision and diversity.
arXiv Detail & Related papers (2020-08-07T22:25:36Z)
- Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble [163.3333439344695]
Dirichlet Neighborhood Ensemble (DNE) is a randomized smoothing method for training a robust model to defend against substitution-based attacks.
DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from a convex hull spanned by the word and its synonyms, and it augments them with the training data.
We demonstrate through extensive experimentation that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple data sets.
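The sampling step described above can be pictured with a small sketch; the embedding table, synonym sets and Dirichlet concentration below are made-up illustrations, not the authors' data or code.

```python
# Toy sketch of sampling a virtual word embedding from the convex hull spanned
# by a word and its synonyms, using Dirichlet-distributed convex weights.
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 8  # assumed embedding size

# Illustrative embedding table and synonym sets.
emb = {w: rng.normal(size=EMB_DIM) for w in ("good", "great", "fine", "movie", "film")}
synonyms = {"good": ["great", "fine"], "movie": ["film"]}

def dirichlet_mix(word, alpha=1.0):
    """Return a convex combination of the word's embedding and its synonyms'
    embeddings, with weights drawn from a Dirichlet distribution."""
    group = [word] + synonyms.get(word, [])
    weights = rng.dirichlet(alpha * np.ones(len(group)))
    return sum(w * emb[g] for w, g in zip(weights, group))

# A 'virtual sentence' replaces every word embedding by such a mixture; these
# augmented sequences are added to the training data.
virtual_sentence = np.stack([dirichlet_mix(w) for w in ("good", "movie")])
print(virtual_sentence.shape)  # (2, 8)
```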
arXiv Detail & Related papers (2020-06-20T18:01:16Z)
- Recent Developments Combining Ensemble Smoother and Deep Generative Networks for Facies History Matching [58.720142291102135]
This research project focuses on the use of autoencoders networks to construct a continuous parameterization for facies models.
We benchmark seven different formulations, including VAE, generative adversarial network (GAN), Wasserstein GAN, variational auto-encoding GAN, principal component analysis (PCA) with cycle GAN, PCA with transfer style network and VAE with style loss.
arXiv Detail & Related papers (2020-05-08T21:32:42Z)
- A Novel Deep Learning Architecture for Decoding Imagined Speech from EEG [2.4063592468412267]
We present a novel architecture that employs a deep neural network (DNN) for classifying the words "in" and "cooperate".
Nine EEG channels, which best capture the underlying cortical activity, are chosen using common spatial pattern.
We have achieved accuracies comparable to the state-of-the-art results.
arXiv Detail & Related papers (2020-03-19T00:57:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.