DropClass and DropAdapt: Dropping classes for deep speaker
representation learning
- URL: http://arxiv.org/abs/2002.00453v1
- Date: Sun, 2 Feb 2020 18:43:50 GMT
- Title: DropClass and DropAdapt: Dropping classes for deep speaker
representation learning
- Authors: Chau Luu, Peter Bell, Steve Renals
- Abstract summary: This work proposes two approaches to learning embeddings, based on the notion of dropping classes during training.
We demonstrate that both approaches can yield performance gains in speaker verification tasks.
- Score: 33.60058873783114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many recent works on deep speaker embeddings train their feature extraction
networks on large classification tasks, distinguishing between all speakers in
a training set. Empirically, this has been shown to produce
speaker-discriminative embeddings, even for unseen speakers. However, it is not
clear that this is the optimal means of training embeddings that generalize
well. This work proposes two approaches to learning embeddings, based on the
notion of dropping classes during training. We demonstrate that both approaches
can yield performance gains in speaker verification tasks. The first proposed
method, DropClass, works via periodically dropping a random subset of classes
from the training data and the output layer throughout training, resulting in a
feature extractor trained on many different classification tasks. Combined with
an additive angular margin loss, this method can yield a 7.9% relative
improvement in equal error rate (EER) over a strong baseline on VoxCeleb. The
second proposed method, DropAdapt, is a means of adapting a trained model to a
set of enrolment speakers in an unsupervised manner. This is performed by
fine-tuning a model on only those classes which produce high probability
predictions when the enrolment speakers are used as input, again dropping
the relevant rows from the output layer. This method yields a large 13.2%
relative improvement in EER on VoxCeleb. The code for this paper has been made
publicly available.
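To make the DropClass procedure concrete, here is a minimal PyTorch sketch of the loop described above: every drop_period steps a random subset of classes is removed from both the training data and the output layer, and the remaining classes are trained with an additive angular margin (AAM) softmax. The extractor, synthetic data, and hyperparameters are illustrative stand-ins, not the authors' released code.

```python
# DropClass sketch (illustrative assumptions throughout; see the paper's
# released code for the actual implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes, feat_dim, emb_dim = 100, 40, 32
extractor = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                          nn.Linear(64, emb_dim))
class_weight = nn.Parameter(torch.randn(n_classes, emb_dim))  # one row per class
opt = torch.optim.Adam(list(extractor.parameters()) + [class_weight], lr=1e-3)

feats = torch.randn(2000, feat_dim)            # toy "utterance" features
labels = torch.randint(0, n_classes, (2000,))  # toy speaker labels
drop_period, n_drop, margin, scale = 50, 30, 0.2, 30.0
active = torch.arange(n_classes)

for step in range(500):
    if step % drop_period == 0:
        # Periodically resample which classes are dropped.
        active = torch.randperm(n_classes)[n_drop:].sort().values
    # Drop examples of inactive classes from the batch pool ...
    pool = torch.nonzero(torch.isin(labels, active)).squeeze(1)
    batch = pool[torch.randint(0, len(pool), (64,))]
    x, y = feats[batch], labels[batch]
    # ... and remap the surviving labels onto the active subset.
    remap = torch.full((n_classes,), -1, dtype=torch.long)
    remap[active] = torch.arange(len(active))
    y_sub = remap[y]
    emb = F.normalize(extractor(x), dim=1)
    w = F.normalize(class_weight[active], dim=1)  # drop output-layer rows
    cos = emb @ w.t()
    # AAM softmax: add an angular margin to the target class only.
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(y_sub, len(active)).bool()
    logits = scale * torch.where(target, torch.cos(theta + margin), cos)
    loss = F.cross_entropy(logits, y_sub)
    opt.zero_grad(); loss.backward(); opt.step()
```

The key design point is that the embedding network is shared across many different classification sub-tasks, while the per-class weight rows are simply sliced in and out.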
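DropAdapt admits a similar sketch: score every training class by how much softmax probability the unlabelled enrolment utterances place on it, keep only the high-probability classes, and fine-tune with the remaining output-layer rows dropped. The top-k selection rule and scale value below are illustrative assumptions.

```python
# DropAdapt class-selection sketch (illustrative, not the authors' code).
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_classes(extractor, class_weight, enrol_feats, k, scale=30.0):
    """Rank training classes by mean softmax probability over the
    unlabelled enrolment utterances and keep the top k."""
    emb = F.normalize(extractor(enrol_feats), dim=1)
    w = F.normalize(class_weight, dim=1)
    probs = F.softmax(scale * (emb @ w.t()), dim=1)
    return probs.mean(dim=0).topk(k).indices.sort().values

# Usage: compute a fixed class subset from the enrolment utterances, e.g.
#   active = select_classes(extractor, class_weight, enrol_feats, k=50)
# then rerun the fine-tuning loop above with this fixed `active` set, which
# drops the rows for all other classes from the output layer.
```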
Related papers
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER on the CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z)
- Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [76.43827771613127]
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
arXiv Detail & Related papers (2023-06-08T04:04:47Z)
- Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech [0.0]
We present a novel training method for speaker change detection models.
The proposed method uses an objective function which encourages the model to predict a single positive label within a specified collar.
arXiv Detail & Related papers (2022-05-14T15:35:43Z)
- Self-supervised Speaker Diarization [19.111219197011355]
This study proposes an entirely unsupervised deep-learning model for speaker diarization.
Speaker embeddings are represented by an encoder trained in a self-supervised fashion using pairs of adjacent segments assumed to be of the same speaker.
arXiv Detail & Related papers (2022-04-08T16:27:14Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training [25.80559992732508]
SPIRAL works by learning a denoising representation of perturbed data in a teacher-student framework.
We address the problem of noise-robustness that is critical to real-world speech applications.
arXiv Detail & Related papers (2022-01-25T09:53:36Z)
- Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, a classifier that requires no additional fitted parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced; a minimal nearest-prototype sketch follows after this list.
We test our method on the CIFAR-10LT, CIFAR-100LT and Webvision datasets, observing that Prototypical obtains substantial improvements over the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z)
- Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring [60.55025339250815]
We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling.
In our technique, we take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context from these responses and feed it as additional speaker-specific context to our network to score a particular response.
arXiv Detail & Related papers (2021-08-30T07:00:28Z)
- End-to-End Speaker Diarization as Post-Processing [64.12519350944572]
Clustering-based diarization methods partition frames into as many clusters as there are speakers.
Some end-to-end diarization methods can handle overlapping speech by treating the problem as multi-label classification.
We propose to use a two-speaker end-to-end diarization method as post-processing of the results obtained by a clustering-based method.
arXiv Detail & Related papers (2020-12-18T05:31:07Z)
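For the Prototypical Classifier entry above, here is a minimal sketch of the nearest-prototype idea: class prototypes are per-class mean embeddings, so nothing beyond the embedding network is fitted, and every class contributes one prototype regardless of how imbalanced the training set is. This illustrates the general technique, not the paper's implementation.

```python
# Nearest-prototype classification sketch (illustrative).
import torch
import torch.nn.functional as F

def build_prototypes(embeddings, labels, n_classes):
    """Per-class mean embedding; assumes every class has >= 1 example."""
    protos = torch.stack([embeddings[labels == c].mean(dim=0)
                          for c in range(n_classes)])
    return F.normalize(protos, dim=1)

def predict(embeddings, protos):
    """Assign each embedding to its nearest prototype by cosine similarity."""
    return (F.normalize(embeddings, dim=1) @ protos.t()).argmax(dim=1)
```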
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.