DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification
- URL: http://arxiv.org/abs/2310.12111v1
- Date: Wed, 18 Oct 2023 17:07:05 GMT
- Title: DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification
- Authors: Yuanyuan Wang, Yang Zhang, Zhiyong Wu, Zhihan Yang, Tao Wei, Kun Zou,
Helen Meng
- Abstract summary: We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result is a 14.6% relative reduction in EER on the CN-Celeb evaluation set.
- Score: 55.306583814017046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation is vital to the generalization ability and robustness of
deep neural network (DNN) models. Existing augmentation methods for speaker
verification manipulate the raw signal; this is time-consuming, and the
augmented samples lack diversity. In this paper, we present a novel
difficulty-aware semantic augmentation (DASA) approach for speaker
verification, which can generate diversified training samples in speaker
embedding space with negligible extra computing cost. Firstly, we augment
training samples by perturbing speaker embeddings along semantic directions,
which are obtained from speaker-wise covariance matrices. Secondly, since accurate
covariance matrices can only be estimated from robust speaker embeddings during
training, we introduce difficulty-aware additive margin softmax
(DAAM-Softmax) to obtain optimal speaker embeddings. Finally, we assume the
number of augmented samples goes to infinity and derive a closed-form upper
bound of the expected loss with DASA, which achieves compatibility and
efficiency. Extensive experiments demonstrate that the proposed approach achieves
a remarkable performance improvement. The best result is a 14.6% relative
reduction in EER on the CN-Celeb evaluation set.
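The abstract does not spell out the closed-form bound, but the setup mirrors the ISDA-style result: perturbing an embedding x with Gaussian noise drawn from its speaker's covariance yields, in the infinite-augmentation limit, an ordinary cross-entropy over adjusted logits. A minimal sketch of that bound (function names, array layout, and the strength parameter `lam` are assumptions for illustration, not the paper's API):

```python
import numpy as np

def dasa_upper_bound_logits(x, y, W, covs, lam):
    """Adjusted logits whose cross-entropy upper-bounds the expected loss
    when x is perturbed along semantic directions ~ N(0, lam * covs[y]).

    x:    (D,)      speaker embedding
    y:    int       speaker label
    W:    (C, D)    classifier weights
    covs: (C, D, D) per-speaker covariance estimates
    lam:  float     augmentation strength (0 disables augmentation)
    """
    logits = W @ x                    # plain logits, shape (C,)
    diff = W - W[y]                   # (w_j - w_y) for every class j
    # quadratic form (w_j - w_y)^T Sigma_y (w_j - w_y), one value per class
    quad = np.einsum("cd,de,ce->c", diff, covs[y], diff)
    return logits + 0.5 * lam * quad  # target entry unchanged (its diff is 0)

def cross_entropy(logits, y):
    """Numerically stable cross-entropy for a single sample."""
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[y])
```

With `lam = 0` this reduces to the standard softmax loss. In the paper, the per-speaker covariances are estimated from embeddings during training and a difficulty-aware margin (DAAM-Softmax) replaces the plain inner-product logits; neither of those components is shown in this sketch.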
Related papers
- HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System [0.9591674293850556]
We propose a framework named HiddenSpeaker, embedding imperceptible perturbations within the training speech samples.
Our results demonstrate that HiddenSpeaker not only deceives the model with unlearnable samples but also enhances the imperceptibility of the perturbations.
arXiv Detail & Related papers (2024-05-24T15:49:00Z)
- ROPO: Robust Preference Optimization for Large Language Models [59.10763211091664]
We propose an iterative alignment approach that integrates noise-tolerance and filtering of noisy samples without the aid of external models.
Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods.
arXiv Detail & Related papers (2024-04-05T13:58:51Z)
- Inference Stage Denoising for Undersampled MRI Reconstruction [13.8086726938161]
Reconstruction of magnetic resonance imaging (MRI) data has been positively affected by deep learning.
A key challenge remains: to improve generalisation to distribution shifts between the training and testing data.
arXiv Detail & Related papers (2024-02-12T12:50:10Z)
- Adversarial Data Augmentation for Robust Speaker Verification [17.40709301417885]
This paper proposes a novel approach called adversarial data augmentation (A-DA)
It involves an additional augmentation classifier to categorize various augmentation types used in data augmentation.
Experiments conducted on VoxCeleb and CN-Celeb datasets demonstrate that our proposed A-DA outperforms standard DA in both augmentation matched and mismatched test conditions.
arXiv Detail & Related papers (2024-02-05T03:23:34Z)
- Improving the Robustness of Summarization Systems with Dual Augmentation [68.53139002203118]
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input.
We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise.
We propose SummAttacker, an efficient approach to generating adversarial samples based on language models.
arXiv Detail & Related papers (2023-06-01T19:04:17Z)
- Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.
Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z)
- TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization [54.41494515178297]
We reformulate speaker diarization as a single-label classification problem.
We propose the overlap-aware EEND (EEND-OLA) model, in which speaker overlaps and dependency can be modeled explicitly.
Compared with the original EEND, the proposed EEND-OLA achieves a 14.39% relative improvement in terms of diarization error rates.
arXiv Detail & Related papers (2023-03-08T05:05:26Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.