SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker
Recognition Systems
- URL: http://arxiv.org/abs/2309.07983v2
- Date: Mon, 27 Nov 2023 11:54:56 GMT
- Title: SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker
Recognition Systems
- Authors: Guangke Chen and Yedi Zhang and Fu Song
- Abstract summary: SLMIA-SR is the first membership inference attack tailored to speaker recognition (SR).
Our attack is versatile and can work in both white-box and black-box scenarios.
- Score: 6.057334150052503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Membership inference attacks allow adversaries to determine whether a
particular example was contained in the model's training dataset. While
previous works have confirmed the feasibility of such attacks in various
applications, none has focused on speaker recognition (SR), a promising
voice-based biometric recognition technique. In this work, we propose SLMIA-SR,
the first membership inference attack tailored to SR. In contrast to
conventional example-level attacks, our attack features speaker-level membership
inference, i.e., determining if any voices of a given speaker, either the same
as or different from the given inference voices, have been involved in the
training of a model. It is particularly useful and practical since the training
and inference voices are usually distinct, and it is also meaningful given the
open-set nature of SR, namely, that the speakers to be recognized are often
absent from the training data. We utilize intra-similarity and
inter-dissimilarity, two training objectives of SR, to characterize the
differences between training and non-training speakers and quantify them with
two groups of features derived through carefully designed feature engineering to
mount the attack. To improve the generalizability of our attack, we propose a
novel mixing ratio training strategy to train attack models. To enhance the
attack performance, we introduce voice chunk splitting to cope with the limited
number of inference voices and propose to train attack models dependent on the
number of inference voices. Our attack is versatile and can work in both
white-box and black-box scenarios. Additionally, we propose two novel
techniques to reduce the number of black-box queries while maintaining the
attack performance. Extensive experiments demonstrate the effectiveness of
SLMIA-SR.
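A minimal sketch of the feature construction the abstract describes (not the authors' code): embed the inference voices with the target SR model, quantify intra-similarity among them and inter-dissimilarity against voices of other speakers, and feed summary statistics to a binary attack classifier. The cosine metric, the reference-speaker set, and the logistic-regression attack model are all assumptions here.

```python
# Sketch of speaker-level membership inference (assumed design, not the
# authors' implementation). The SR model is treated as an embedding
# function; membership is decided per speaker, not per example.
import numpy as np
from sklearn.linear_model import LogisticRegression


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def speaker_features(voice_embs, reference_embs):
    """Summary features for one speaker's inference voices.

    voice_embs: embeddings of the given inference voices.
    reference_embs: embeddings of voices from other (non-target) speakers.
    """
    if len(voice_embs) < 2:
        # The paper splits voices into chunks when few inference voices
        # are available; this sketch simply requires at least two.
        raise ValueError("need at least two inference voices")
    intra = [cosine(voice_embs[i], voice_embs[j])
             for i in range(len(voice_embs))
             for j in range(i + 1, len(voice_embs))]
    inter = [cosine(v, r) for v in voice_embs for r in reference_embs]
    # The paper engineers two richer feature groups; this sketch compresses
    # intra-similarity and inter-dissimilarity to means and stds.
    return np.array([np.mean(intra), np.std(intra),
                     np.mean(inter), np.std(inter)])


def train_attack_model(feature_rows, membership_labels):
    """Binary attack model over shadow speakers with known membership
    (label 1 = some voice of the speaker was in the SR training data)."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(np.stack(feature_rows), membership_labels)
    return clf
```

Training such a classifier on varying member/non-member mixtures of shadow speakers (the paper's mixing-ratio strategy) and on varying numbers of inference voices would then address the generalization points raised in the abstract.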
Related papers
- Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions [25.490988931354185]
We propose a novel two-stage framework for this problem that cascades a target speaker extraction (TSE) method with speech emotion recognition (SER).
We first train a TSE model to extract the speech of the target speaker from a mixture. Then, in the second stage, we use the extracted speech for SER training.
Our system achieves a 14.33% improvement in unweighted accuracy (UA) over a baseline that does not use TSE.
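A hypothetical sketch of the cascade described above; both models are placeholders, since the summary does not specify the actual architectures.

```python
import torch

# Two-stage cascade (illustrative only): target speaker extraction (TSE)
# followed by speech emotion recognition (SER) on the extracted speech.
def two_stage_ser(mixture: torch.Tensor,
                  enrollment: torch.Tensor,
                  tse_model: torch.nn.Module,
                  ser_model: torch.nn.Module) -> torch.Tensor:
    # Stage 1: extract the target speaker's speech from the noisy mixture,
    # conditioned on an enrollment utterance of that speaker.
    with torch.no_grad():
        extracted = tse_model(mixture, enrollment)
    # Stage 2: classify emotion on the extracted speech; per the summary,
    # the SER model is trained on such extracted speech in a second stage.
    return ser_model(extracted)
```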
arXiv Detail & Related papers (2024-09-29T07:04:50Z)
- Some voices are too common: Building fair speech recognition systems using the Common Voice dataset [2.28438857884398]
We use the French Common Voice dataset to quantify the biases of a pre-trained wav2vec2.0 model toward several demographic groups.
We also run an in-depth analysis of the Common Voice corpus and identify important shortcomings that should be taken into account.
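One simple way to quantify such group-level biases (an illustration, not necessarily the paper's protocol) is to compare word error rate per demographic group; the record fields below are assumptions.

```python
# Per-group word error rate (WER) as a simple bias probe; field names and
# grouping are illustrative, not the paper's exact protocol.
from collections import defaultdict

import jiwer


def wer_by_group(records):
    """records: dicts with 'group', 'reference', and 'hypothesis' keys."""
    refs, hyps = defaultdict(list), defaultdict(list)
    for r in records:
        refs[r["group"]].append(r["reference"])
        hyps[r["group"]].append(r["hypothesis"])
    # jiwer aggregates over the lists, weighting by reference length;
    # large gaps between groups indicate a biased model.
    return {g: jiwer.wer(refs[g], hyps[g]) for g in refs}
```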
arXiv Detail & Related papers (2023-06-01T11:42:34Z)
- Interpretable Spectrum Transformation Attacks to Speaker Recognition [8.770780902627441]
A general framework is proposed to improve the transferability of adversarial voices to a black-box victim model.
The proposed framework operates on voices in the time-frequency domain, which improves the interpretability, transferability, and imperceptibility of the attack.
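A minimal sketch of operating on a voice in the time-frequency domain (the general mechanism, not the paper's specific spectrum transformations): modify the magnitude spectrogram per frequency band and frame, then invert with the original phase.

```python
import torch

# Time-frequency-domain perturbation sketch: STFT -> modify magnitude ->
# inverse STFT with the original phase. The per-bin perturbation delta_db
# stands in for whatever spectrum transformation an attack optimizes.
def perturb_in_tf_domain(waveform: torch.Tensor,
                         delta_db: torch.Tensor,
                         n_fft: int = 512,
                         hop: int = 128) -> torch.Tensor:
    window = torch.hann_window(n_fft)
    spec = torch.stft(waveform, n_fft, hop_length=hop, window=window,
                      return_complex=True)
    mag, phase = spec.abs(), spec.angle()
    # Editing magnitudes in dB keeps the perturbation interpretable per
    # frequency band and time frame.
    mag = mag * (10.0 ** (delta_db / 20.0))
    return torch.istft(torch.polar(mag, phase), n_fft, hop_length=hop,
                       window=window, length=waveform.shape[-1])
```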
arXiv Detail & Related papers (2023-02-21T14:12:29Z)
- Adversarial Pretraining of Self-Supervised Deep Networks: Past, Present and Future [132.34745793391303]
We review adversarial pretraining of self-supervised deep networks including both convolutional neural networks and vision transformers.
We find that existing approaches, which incorporate adversaries at either the input or the feature level of pretraining, largely fall into two groups.
arXiv Detail & Related papers (2022-10-23T13:14:06Z)
- Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z)
- Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experiment results show that our method can reduce the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- Membership Inference Attacks Against Self-supervised Speech Models [62.73937175625953]
Self-supervised learning (SSL) on continuous speech has started gaining attention.
We present the first privacy analysis on several SSL speech models using Membership Inference Attacks (MIA) under black-box access.
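A hedged illustration of one generic black-box MIA recipe for a representation model: score each utterance by comparing its pooled representation under the target model with that from an independently trained reference model, then threshold. The pooling, the cosine statistic, and the reference model are assumptions; the paper's exact attack statistic may differ.

```python
import numpy as np

# Generic black-box membership score for a speech representation model:
# utterances seen in training tend to yield representations that relate
# differently to those of an independently trained reference model.
def mia_score(target_repr: np.ndarray, reference_repr: np.ndarray) -> float:
    a = target_repr.mean(axis=0)  # pool (frames, dims) to one vector
    b = reference_repr.mean(axis=0)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def predict_member(score: float, threshold: float) -> bool:
    # The threshold is calibrated on utterances known to be non-members.
    return score > threshold
```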
arXiv Detail & Related papers (2021-11-09T13:00:24Z)
- An Adversarially-Learned Turing Test for Dialog Generation Models [45.991035017908594]
We propose an adversarial training approach to learn a robust model, ATT, that discriminates machine-generated responses from human-written replies.
In contrast to previous perturbation-based methods, our discriminator is trained by iteratively generating unrestricted and diverse adversarial examples.
Our discriminator shows high accuracy on strong attackers including DialoGPT and GPT-3.
arXiv Detail & Related papers (2021-04-16T17:13:14Z)
- Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
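A minimal sketch of the momentum-encoder update at the core of momentum contrastive learning (MoCo-style); the prototypes and the speaker-verification specifics of the paper are omitted.

```python
import torch

# MoCo-style momentum update: the key encoder trails the query encoder as
# an exponential moving average, giving stable targets for contrastive
# learning of speaker embeddings.
@torch.no_grad()
def momentum_update(query_encoder: torch.nn.Module,
                    key_encoder: torch.nn.Module,
                    m: float = 0.999) -> None:
    for q, k in zip(query_encoder.parameters(), key_encoder.parameters()):
        k.data.mul_(m).add_(q.data, alpha=1.0 - m)
```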
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that publishes only labels is still susceptible to sampling attacks, and that the adversary can recover up to 100% of its performance.
As a defense, we apply differential privacy in the form of gradient perturbation during training of the victim model, as well as output perturbation at prediction time.
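A hedged sketch of the label-only idea: query the victim with many perturbed copies of an example and use label stability as the membership score. The Gaussian perturbation and the stability statistic are assumptions about the general recipe, not the paper's exact procedure.

```python
import numpy as np

# Label-only sampling attack sketch: with no access to the victim's scores,
# repeatedly query it on perturbed copies of an example; training examples
# tend to keep their predicted label under perturbation more often.
def sampling_attack_score(predict_label, x: np.ndarray,
                          n_queries: int = 100, sigma: float = 0.05,
                          rng: np.random.Generator | None = None) -> float:
    if rng is None:
        rng = np.random.default_rng()
    base = predict_label(x)
    hits = sum(predict_label(x + rng.normal(0.0, sigma, x.shape)) == base
               for _ in range(n_queries))
    return hits / n_queries  # higher stability -> more likely a member
```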
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.