Data augmentation versus noise compensation for x-vector speaker
recognition systems in noisy environments
- URL: http://arxiv.org/abs/2006.15903v1
- Date: Mon, 29 Jun 2020 09:50:45 GMT
- Title: Data augmentation versus noise compensation for x-vector speaker
recognition systems in noisy environments
- Authors: Mohammad Mohammadamini (LIA), Driss Matrouf (LIA)
- Abstract summary: We show that explicit noise compensation gives almost the same relative EER gain in both protocols.
For example, in Protocol2 we obtain a 21% to 66% improvement in EER with denoising techniques.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The explosion of available speech data and new speaker modeling methods based
on deep neural networks (DNN) have given the ability to develop more robust
speaker recognition systems. Among DNN speaker modelling techniques, x-vector
system has shown a degree of robustness in noisy environments. Previous studies
suggest that by increasing the number of speakers in the training data and
using data augmentation more robust speaker recognition systems are achievable
in noisy environments. In this work, we want to know if explicit noise
compensation techniques continue to be effective despite the general noise
robustness of these systems. For this study, we will use two different x-vector
networks: the first one is trained on Voxceleb1 (Protocol1), and the second one
is trained on Voxceleb1+Voxceleb2 (Protocol2). We propose to add a denoising
x-vector subsystem before scoring. Experimental results show that the x-vector
system used in Protocol2 is more robust than the one used in Protocol1.
Despite this observation, we show that explicit noise compensation gives
almost the same relative EER gain in both protocols. For example, in
Protocol2 we obtain a 21% to 66% improvement in EER with denoising techniques.
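The pipeline the abstract describes — extract x-vectors, pass them through a denoising subsystem before scoring, then compare the resulting EER to the baseline — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the affine denoiser (hypothetical pretrained parameters `W`, `b`) stands in for the paper's denoising x-vector subsystem, cosine similarity stands in for the scoring back-end, and `relative_eer_gain` simply computes the relative-improvement figure of the kind quoted (21% to 66%).

```python
import numpy as np

def denoise_xvector(x, W, b):
    """Map a noisy x-vector to an estimate of its clean counterpart.

    A single affine layer used here as a stand-in for the paper's
    denoising subsystem; W and b are hypothetical pretrained parameters.
    """
    return W @ x + b

def cosine_score(a, b):
    """Cosine similarity between two x-vectors, as a simple back-end score."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def relative_eer_gain(eer_baseline, eer_denoised):
    """Relative EER improvement (%) of the denoised system over the baseline."""
    return 100.0 * (eer_baseline - eer_denoised) / eer_baseline

# Denoising is applied to both enrollment and test x-vectors before scoring:
# score = cosine_score(denoise_xvector(x_enroll, W, b),
#                      denoise_xvector(x_test, W, b))
```

For instance, a baseline EER of 10.0% reduced to 5.0% after denoising corresponds to `relative_eer_gain(10.0, 5.0) == 50.0`, i.e. a 50% relative gain.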
Related papers
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use wav2vec pre-trained model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition [57.15942628305797]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems for normal speech.
This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training.
The inversion model is then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features.
arXiv Detail & Related papers (2022-03-19T08:47:18Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
- Streaming Multi-speaker ASR with RNN-T [8.701566919381223]
This work focuses on multi-speaker speech recognition based on a recurrent neural network transducer (RNN-T).
We show that guiding separation with speaker order labels in the former case enhances the high-level speaker tracking capability of RNN-T.
Our best model achieves a WER of 10.2% on simulated 2-speaker Libri data, which is competitive with the previously reported state-of-the-art non-streaming model (10.3%).
arXiv Detail & Related papers (2020-11-23T19:10:40Z)
- Combination of Deep Speaker Embeddings for Diarisation [9.053645441056256]
This paper proposes a c-vector method by combining multiple sets of complementary d-vectors derived from systems with different NN components.
A neural-based single-pass speaker diarisation pipeline is also proposed in this paper.
Experiments and detailed analyses are conducted on the challenging AMI and NIST RT05 datasets.
arXiv Detail & Related papers (2020-10-22T20:16:36Z)
- Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes [36.63589873242547]
Multi-speaker speech synthesis is a technique for modeling multiple speakers' voices with a single model.
We propose a framework for multi-speaker speech synthesis using deep Gaussian processes (DGPs) and their latent variable models (DGPLVMs).
arXiv Detail & Related papers (2020-08-07T02:03:27Z)
- DNN Speaker Tracking with Embeddings [0.0]
We propose a novel embedding-based speaker tracking method.
Our design is based on a convolutional neural network that mimics a typical speaker verification PLDA.
To make the baseline system similar to speaker tracking, non-target speakers were added to the recordings.
arXiv Detail & Related papers (2020-07-13T18:40:14Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at achieving two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
- Robust Speaker Recognition Using Speech Enhancement And Attention Model [37.33388614967888]
Instead of individually processing speech enhancement and speaker recognition, the two modules are integrated into one framework by a joint optimisation using deep neural networks.
To increase robustness against noise, a multi-stage attention mechanism is employed to highlight the speaker related features learned from context information in time and frequency domain.
The obtained results show that the proposed approach using speech enhancement and multi-stage attention models outperforms two strong baselines not using them in most acoustic conditions in our experiments.
arXiv Detail & Related papers (2020-01-14T20:03:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.