Bootstrap Equilibrium and Probabilistic Speaker Representation Learning
for Self-supervised Speaker Verification
- URL: http://arxiv.org/abs/2112.08929v1
- Date: Thu, 16 Dec 2021 14:55:44 GMT
- Title: Bootstrap Equilibrium and Probabilistic Speaker Representation Learning
for Self-supervised Speaker Verification
- Authors: Sung Hwan Mun, Min Hyun Han, Dongjune Lee, Jihwan Kim, and Nam Soo Kim
- Abstract summary: We propose self-supervised speaker representation learning strategies.
In the front-end, we learn the speaker representations via the bootstrap training scheme with the uniformity regularization term.
In the back-end, the probabilistic speaker embeddings are estimated by maximizing the mutual likelihood score between the speech samples belonging to the same speaker.
- Score: 15.652180150706002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose self-supervised speaker representation learning
strategies, which consist of a bootstrap equilibrium speaker representation
learning in the front-end and an uncertainty-aware probabilistic speaker
embedding training in the back-end. In the front-end stage, we learn the
speaker representations via the bootstrap training scheme with the uniformity
regularization term. In the back-end stage, the probabilistic speaker
embeddings are estimated by maximizing the mutual likelihood score between the
speech samples belonging to the same speaker, which provide not only speaker
representations but also data uncertainty. Experimental results show that the
proposed bootstrap equilibrium training strategy can effectively help learn the
speaker representations and outperforms the conventional methods based on
contrastive learning. Also, we demonstrate that the integrated two-stage
framework further improves the speaker verification performance on the
VoxCeleb1 test set in terms of EER and MinDCF.
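The two quantities named in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so the code below is an assumption: it uses the widely cited uniformity term (the log of the mean Gaussian potential over pairs of L2-normalized embeddings) and the mutual likelihood score for diagonal-Gaussian embeddings as defined in the probabilistic embedding literature. The function names, the temperature `t=2.0`, and the plain-list data layout are hypothetical choices made for illustration, not the authors' implementation.

```python
import math
from itertools import combinations


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def uniformity_loss(embeddings, t=2.0):
    """Assumed uniformity term: log mean exp(-t * ||z_i - z_j||^2).

    Computed over all distinct pairs of L2-normalized embeddings.
    Lower values mean the embeddings are spread more evenly on the
    unit hypersphere; 0.0 means every embedding has collapsed to the
    same point.
    """
    z = [l2_normalize(e) for e in embeddings]
    terms = []
    for a, b in combinations(z, 2):
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        terms.append(math.exp(-t * sq_dist))
    return math.log(sum(terms) / len(terms))


def mutual_likelihood_score(mu_i, var_i, mu_j, var_j):
    """Assumed mutual likelihood score for two diagonal Gaussians.

    Each embedding is a mean vector plus a per-dimension variance
    (the data uncertainty). The score is the log-likelihood that both
    samples share the same latent point; higher means more likely the
    same speaker.
    """
    total = 0.0
    for m1, v1, m2, v2 in zip(mu_i, var_i, mu_j, var_j):
        s = v1 + v2
        total += (m1 - m2) ** 2 / s + math.log(s)
    d = len(mu_i)
    return -0.5 * total - 0.5 * d * math.log(2 * math.pi)


if __name__ == "__main__":
    # Two opposite unit vectors versus two collapsed vectors.
    print(uniformity_loss([[1.0, 0.0], [-1.0, 0.0]]))
    print(uniformity_loss([[1.0, 0.0], [2.0, 0.0]]))
    # Same mean versus shifted mean, equal uncertainty.
    print(mutual_likelihood_score([0.0, 0.0], [0.5, 0.5], [0.0, 0.0], [0.5, 0.5]))
    print(mutual_likelihood_score([0.0, 0.0], [0.5, 0.5], [1.0, 0.0], [0.5, 0.5]))
```

In this sketch, a training loss would add the uniformity term to the bootstrap objective to discourage collapse, and the back-end would maximize the mutual likelihood score for pairs from the same speaker. How the paper weights or combines these terms is not stated in the abstract.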
Related papers
- Improving Speaker Diarization using Semantic Information: Joint Pairwise
Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z)
- A Reinforcement Learning Framework for Online Speaker Diarization [18.181920080789475]
Speaker diarization is a task to label an audio or video recording with the identity of the speaker at each given time stamp.
We propose a novel machine learning framework to conduct real-time multi-speaker diarization and recognition without prior registration and pretraining.
arXiv Detail & Related papers (2023-02-21T15:42:25Z)
- Improved Relation Networks for End-to-End Speaker Verification and
Identification [0.0]
Speaker identification systems are tasked to identify a speaker amongst a set of enrolled speakers given just a few samples.
We propose improved relation networks for speaker verification and few-shot (unseen) speaker identification.
Inspired by the use of prototypical networks in speaker verification, we train the model to classify samples in the current episode amongst all speakers present in the training set.
arXiv Detail & Related papers (2022-03-31T17:44:04Z)
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker
Aware Pre-Training [72.004873454347]
Two methods are introduced for enhancing the unsupervised speaker information extraction.
Experimental results on the SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvement.
arXiv Detail & Related papers (2021-10-12T05:43:30Z)
- Zero-Shot Personalized Speech Enhancement through Speaker-Informed Model
Selection [25.05285328404576]
Optimizing speech towards a particular test-time speaker can improve performance and reduce run-time complexity.
We propose using an ensemble model wherein each specialist module denoises noisy utterances from a distinct partition of training set speakers.
Grouping the training set speakers into non-overlapping semantically similar groups is non-trivial and ill-defined.
arXiv Detail & Related papers (2021-05-08T00:15:57Z)
- Self-supervised Text-independent Speaker Verification using Prototypical
Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
- Speaker Separation Using Speaker Inventories and Estimated Speech [78.57067876891253]
We propose speaker separation using speaker inventories (SSUSI) and speaker separation using estimated speech (SSUES).
By combining the advantages of permutation invariant training (PIT) and speech extraction, SSUSI significantly outperforms conventional approaches.
arXiv Detail & Related papers (2020-10-20T18:15:45Z)
- Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis
Using Discrete Speech Representation [125.59372403631006]
We propose a semi-supervised learning approach for multi-speaker text-to-speech (TTS).
A multi-speaker TTS model can learn from the untranscribed audio via the proposed encoder-decoder framework with discrete speech representation.
We found the model can benefit from the proposed semi-supervised learning approach even when part of the unpaired speech data is noisy.
arXiv Detail & Related papers (2020-05-16T15:47:11Z)
- Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
arXiv Detail & Related papers (2020-04-13T17:16:56Z)