Improving on-device speaker verification using federated learning with
privacy
- URL: http://arxiv.org/abs/2008.02651v1
- Date: Thu, 6 Aug 2020 13:37:14 GMT
- Title: Improving on-device speaker verification using federated learning with
privacy
- Authors: Filip Granqvist, Matt Seigel, Rogier van Dalen, Áine Cahill, Stephen
Shum, Matthias Paulik
- Abstract summary: Information on speaker characteristics can be useful as side information in improving speaker recognition accuracy.
This paper investigates how privacy-preserving learning can improve a speaker verification system.
- Score: 5.321241042620525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information on speaker characteristics can be useful as side information in
improving speaker recognition accuracy. However, such information is often
private. This paper investigates how privacy-preserving learning can improve a
speaker verification system, by enabling the use of privacy-sensitive speaker
data to train an auxiliary classification model that predicts vocal
characteristics of speakers. In particular, this paper explores the utility
achieved by approaches which combine different federated learning and
differential privacy mechanisms. These approaches make it possible to train a
central model while protecting user privacy, with users' data remaining on
their devices. Furthermore, they make learning on a large population of
speakers possible, ensuring good coverage of speaker characteristics when
training a model. The auxiliary model described here uses features extracted
from phrases which trigger a speaker verification system. From these features,
the model predicts speaker characteristic labels considered useful as side
information. The knowledge of the auxiliary model is distilled into a speaker
verification system using multi-task learning, with the side information labels
predicted by this auxiliary model being the additional task. This approach
results in a 6% relative improvement in equal error rate over a baseline
system.
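The training scheme the abstract describes (a central auxiliary model learned from on-device data with per-user clipping and added noise) can be sketched with a minimal differentially private federated averaging loop. This is an illustrative sketch only, assuming a simple logistic-regression auxiliary classifier; the function names, learning rates, clip norm, and noise multiplier are all hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, x, y, lr=0.1, steps=5):
    """One user's local SGD on a logistic-regression auxiliary classifier.

    Only the resulting model delta leaves the device, not the raw data.
    """
    w = weights.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-x @ w))      # sigmoid predictions
        grad = x.T @ (p - y) / len(y)          # gradient of the logistic loss
        w -= lr * grad
    return w - weights                          # delta reported to the server

def dp_fedavg_round(weights, user_data, clip=1.0, noise_mult=0.5):
    """Server round: clip each user's delta, average, add Gaussian noise.

    Clipping bounds any single user's influence; the noise gives the
    aggregate a differential-privacy guarantee (epsilon depends on the
    noise multiplier, clip norm, and number of rounds).
    """
    deltas = []
    for x, y in user_data:
        d = local_update(weights, x, y)
        norm = np.linalg.norm(d)
        d = d * min(1.0, clip / max(norm, 1e-12))   # per-user clipping
        deltas.append(d)
    avg = np.mean(deltas, axis=0)
    noise = rng.normal(0.0, noise_mult * clip / len(deltas), size=avg.shape)
    return weights + avg + noise
```

In the paper's setting, the model trained this way predicts speaker-characteristic labels, which are then used as an additional task when training the speaker verification system via multi-task learning.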
Related papers
- Improving Speaker Diarization using Semantic Information: Joint Pairwise
Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z)
- Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering [78.2927924732142]
We propose speaker-invariant clustering (Spin) as a novel self-supervised learning method.
Spin disentangles speaker information and preserves content representations with just 45 minutes of fine-tuning on a single GPU.
arXiv Detail & Related papers (2023-05-18T15:59:36Z)
- Self supervised learning for robust voice cloning [3.7989740031754806]
We use features learned in a self-supervised framework to produce high quality speech representations.
The learned features are used as pre-trained utterance-level embeddings and as inputs to a Non-Attentive Tacotron based architecture.
This method enables us to train our model in an unlabeled multispeaker dataset as well as use unseen speaker embeddings to copy a speaker's voice.
arXiv Detail & Related papers (2022-04-07T13:05:24Z)
- Improving speaker de-identification with functional data analysis of f0 trajectories [10.809893662563926]
Formant modification is a simpler, yet effective method for speaker de-identification which requires no training data.
This study introduces a novel speaker de-identification method, which, in addition to simple formant shifts, manipulates f0 trajectories based on functional data analysis.
The proposed speaker de-identification method conceals plausibly identifying pitch characteristics in a phonetically controllable manner and improves formant-based speaker de-identification by up to 25%.
arXiv Detail & Related papers (2022-03-31T01:34:15Z)
- Self-supervised Speaker Recognition Training Using Human-Machine Dialogues [22.262550043863445]
We investigate how to pretrain speaker recognition models by leveraging dialogues between customers and smart-speaker devices.
We propose an effective rejection mechanism that selectively learns from dialogues based on their acoustic homogeneity.
Experiments demonstrate that the proposed method provides significant performance improvements, superior to earlier work.
arXiv Detail & Related papers (2022-02-07T19:44:54Z)
- Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition [5.1229352884025845]
We show that it is possible to retrieve not only the gender of a speaker but also their identity, by exploiting only the weight matrix changes of a neural acoustic model locally adapted to that speaker.
arXiv Detail & Related papers (2021-11-07T22:17:52Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system, indicating stronger de-identification.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
- Leveraging speaker attribute information using multi task learning for speaker verification and diarization [33.60058873783114]
We propose a framework for making use of auxiliary label information, even when it is only available for speech corpora mismatched to the target application.
We show that by leveraging two additional forms of speaker attribute information, we improve the performance of our deep speaker embeddings for both verification and diarization tasks.
arXiv Detail & Related papers (2020-10-27T13:10:51Z)
- Augmentation adversarial training for self-supervised speaker recognition [49.47756927090593]
We train robust speaker recognition models without speaker labels.
Experiments on VoxCeleb and VOiCES datasets show significant improvements over previous works using self-supervision.
arXiv Detail & Related papers (2020-07-23T15:49:52Z)
- Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
arXiv Detail & Related papers (2020-04-13T17:16:56Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.