Membership Inference Attacks Against Self-supervised Speech Models
- URL: http://arxiv.org/abs/2111.05113v1
- Date: Tue, 9 Nov 2021 13:00:24 GMT
- Title: Membership Inference Attacks Against Self-supervised Speech Models
- Authors: Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee
- Abstract summary: Self-supervised learning (SSL) on continuous speech has started gaining attention.
We present the first privacy analysis on several SSL speech models using Membership Inference Attacks (MIA) under black-box access.
- Score: 62.73937175625953
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, adapting the idea of self-supervised learning (SSL) to
continuous speech has started gaining attention. SSL models pre-trained on a huge amount
of unlabeled audio can generate general-purpose representations that benefit a
wide variety of speech processing tasks. Despite their ubiquitous deployment,
however, the potential privacy risks of these models have not been well
investigated. In this paper, we present the first privacy analysis on several
SSL speech models using Membership Inference Attacks (MIA) under black-box
access. The experimental results show that these pre-trained models are
vulnerable to MIA and prone to membership information leakage, with high
adversarial advantage scores at both the utterance level and the speaker level.
We also conduct several ablation studies to understand the factors that
contribute to the success of MIA.
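As a loose illustration of the kind of black-box attack described above, the sketch below scores each utterance by the average pairwise cosine similarity of its frame-level SSL representations and measures leakage as the best threshold gap between member and non-member scores (adversarial advantage). The scoring function, the (num_frames, dim) feature layout, and the random data in the usage example are assumptions for illustration, not the authors' exact attack.

```python
import numpy as np

def utterance_score(features: np.ndarray) -> float:
    """Average pairwise cosine similarity over the frames of one utterance.

    `features` is assumed to be a (num_frames, dim) array of representations
    obtained from the SSL model via black-box queries.
    """
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    n = sim.shape[0]
    # Exclude self-similarity (the diagonal) from the average.
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))

def adversarial_advantage(member_scores, nonmember_scores) -> float:
    """Best |TPR - FPR| achievable by thresholding the scores."""
    member_scores = np.asarray(member_scores)
    nonmember_scores = np.asarray(nonmember_scores)
    best = 0.0
    for t in np.unique(np.concatenate([member_scores, nonmember_scores])):
        tpr = float(np.mean(member_scores >= t))
        fpr = float(np.mean(nonmember_scores >= t))
        best = max(best, abs(tpr - fpr))
    return best

# Illustrative usage with random features standing in for real SSL outputs:
# an advantage near 0 means little leakage, near 1 means strong leakage.
rng = np.random.default_rng(0)
members = [utterance_score(rng.normal(size=(50, 768))) for _ in range(100)]
nonmembers = [utterance_score(rng.normal(size=(50, 768))) for _ in range(100)]
print(f"utterance-level adversarial advantage: {adversarial_advantage(members, nonmembers):.3f}")
```

A speaker-level variant can be sketched the same way by averaging similarities across utterances of one speaker rather than across frames of one utterance.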
Related papers
- What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis [44.93152068353389]
Self-supervised learning (SSL) has attracted increased attention for learning meaningful speech representations.
Speaker SSL models adopt utterance-level training objectives primarily for speaker representation.
arXiv Detail & Related papers (2024-01-31T07:23:22Z)
- Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations [30.293081541301746]
Self-supervised learning (SSL) speech models such as wav2vec and HuBERT have demonstrated state-of-the-art performance on automatic speech recognition.
We argue that the problem is caused by the lack of disentangled representations and an utterance-level learning objective.
Our models outperform the current best model, WavLM, on all utterance-level non-semantic tasks on the SUPERB benchmark with only 20% of labeled data.
arXiv Detail & Related papers (2023-05-14T08:26:24Z)
- Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? [86.53044183309824]
We study which factor leads to the success of self-supervised learning on speaker-related tasks.
Our empirical results on the Voxceleb-1 dataset suggest that the benefit of SSL to the SV task comes from a combination of mask speech prediction loss, data scale, and model size.
arXiv Detail & Related papers (2022-04-27T08:35:57Z)
- Self-Supervised Learning for speech recognition with Intermediate layer supervision [52.93758711230248]
We propose Intermediate Layer Supervision for Self-Supervised Learning (ILS-SSL).
ILS-SSL forces the model to concentrate on content information as much as possible by adding an additional SSL loss on the intermediate layers (see the sketch after this list).
Experiments on LibriSpeech test-other set show that our method outperforms HuBERT significantly.
arXiv Detail & Related papers (2021-12-16T10:45:05Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of such a paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training [72.004873454347]
Two methods are introduced to enhance unsupervised speaker information extraction.
Experiment results on SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvements.
arXiv Detail & Related papers (2021-10-12T05:43:30Z)
- Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
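As a hedged sketch of the intermediate-layer supervision idea mentioned in the ILS-SSL entry above: the total objective adds the same self-supervised loss on a few intermediate Transformer layers to the usual final-layer loss. The layer indices, the loss weight, and the placeholder `ssl_loss` are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn.functional as F

def ssl_loss(hidden: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Placeholder self-supervised objective; HuBERT-style models use a
    # masked unit-prediction loss here rather than simple regression.
    return F.mse_loss(hidden, targets)

def total_loss(layer_outputs, targets, intermediate_layers=(4, 8), weight=1.0):
    """layer_outputs: list of (batch, time, dim) tensors, one per layer."""
    loss = ssl_loss(layer_outputs[-1], targets)      # standard final-layer loss
    for idx in intermediate_layers:                  # extra intermediate supervision
        loss = loss + weight * ssl_loss(layer_outputs[idx], targets)
    return loss
```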