Improving Embedding Extraction for Speaker Verification with Ladder
Network
- URL: http://arxiv.org/abs/2003.09125v1
- Date: Fri, 20 Mar 2020 07:08:38 GMT
- Title: Improving Embedding Extraction for Speaker Verification with Ladder
Network
- Authors: Fei Tao and Gokhan Tur
- Abstract summary: Recent speaker verification (SV) systems rely on deep neural networks to extract high-level embeddings.
We propose applying the ladder network framework, which combines supervised and unsupervised learning, to SV systems.
The proposed approach improved performance by up to 10% relative, without adding parameters or augmented data.
- Score: 8.843122009658252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speaker verification is an established yet challenging task in speech
processing and a very vibrant research area. Recent speaker verification (SV)
systems rely on deep neural networks to extract high-level embeddings which are
able to characterize the users' voices. Most studies have investigated
improving the discriminability of the networks to extract better embeddings
and thereby improve performance. However, only a few studies focus on improving
generalization. In this paper, we propose applying the ladder network framework,
which combines supervised and unsupervised learning, to SV systems. The ladder
network can help the system produce better high-level embeddings by balancing
the trade-off between keeping useful information and discarding useless
information. We evaluated the framework on two state-of-the-art SV
systems, d-vector and x-vector, which can be used for different use cases. The
experiments showed that the proposed approach improved performance by up to
10% relative, without adding parameters or augmented data.
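As a rough illustration of how a ladder-style objective combines supervised and unsupervised learning, the sketch below adds a classification loss to weighted per-layer reconstruction costs. This is a generic form of the ladder network objective, not the configuration used in the paper, which the abstract does not specify. The function names, the per-layer weights, and the use of mean squared error are assumptions made for illustration.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def ladder_style_loss(logits, labels, clean_acts, recon_acts, layer_weights):
    """Supervised loss plus weighted denoising (reconstruction) costs.

    logits        : (batch, classes) classifier outputs (hypothetical speaker classifier)
    labels        : (batch,) integer speaker labels
    clean_acts    : list of per-layer activations from the clean encoder pass
    recon_acts    : list of per-layer decoder reconstructions from the noisy pass
    layer_weights : list of per-layer weights (assumed hyperparameters)
    """
    supervised = softmax_cross_entropy(logits, labels)
    unsupervised = sum(
        w * float(np.mean((clean - recon) ** 2))
        for w, clean, recon in zip(layer_weights, clean_acts, recon_acts)
    )
    return supervised + unsupervised

# Toy usage with made-up shapes: 2 utterances, 3 speakers, one 4-unit layer.
logits = np.zeros((2, 3))
labels = np.array([0, 1])
loss = ladder_style_loss(logits, labels, [np.ones((2, 4))], [np.zeros((2, 4))], [0.5])
```

The layer weights control how strongly each reconstruction term regularizes the embedding relative to the supervised term. Because the decoder is only needed during training, a setup like this would not add parameters to the embedding extractor used at inference time.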
Related papers
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective: to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural Network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z)
- Box-based Refinement for Weakly Supervised and Unsupervised Localization Tasks [57.70351255180495]
We train the detectors on top of the network output instead of the image data and apply suitable loss backpropagation.
Our findings reveal a significant improvement in phrase grounding for the "what is where by looking" task.
arXiv Detail & Related papers (2023-09-07T17:36:02Z)
- Joint Speech Activity and Overlap Detection with Multi-Exit Architecture [5.4878772986187565]
Overlapped speech detection (OSD) is critical for speech applications in multi-party conversation scenarios.
This study investigates the joint VAD and OSD task from a new perspective.
In particular, we propose extending the traditional classification network with a multi-exit architecture.
arXiv Detail & Related papers (2022-09-24T02:34:11Z)
- STC speaker recognition systems for the NIST SRE 2021 [56.05258832139496]
This paper presents a description of STC Ltd. systems submitted to the NIST 2021 Speaker Recognition Evaluation.
These systems consist of a number of diverse subsystems that use deep neural networks as feature extractors.
For the video modality, we developed our best solution with the RetinaFace face detector and a deep ResNet face embedding extractor trained on large face image datasets.
arXiv Detail & Related papers (2021-11-03T15:31:01Z)
- Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection [7.219077740523682]
Models currently deployed for the task of Automatic Speaker Verification are, at their best, devoid of suitable degrees of generalization to unseen attacks.
The present study proposes the Efficient Attention Branch Network (EABN) modular architecture with a combined loss function to address the generalization problem.
arXiv Detail & Related papers (2021-09-05T12:10:16Z)
- On the role of feedback in visual processing: a predictive coding perspective [0.6193838300896449]
We consider deep convolutional networks (CNNs) as models of feed-forward visual processing and implement Predictive Coding (PC) dynamics.
We find that the network increasingly relies on top-down predictions as the noise level increases.
In addition, the accuracy of the network implementing PC dynamics significantly increases over time-steps, compared to its equivalent forward network.
arXiv Detail & Related papers (2021-06-08T10:07:23Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled, noisy environments is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.