Privacy attacks for automatic speech recognition acoustic models in a
federated learning framework
- URL: http://arxiv.org/abs/2111.03777v1
- Date: Sat, 6 Nov 2021 02:08:13 GMT
- Title: Privacy attacks for automatic speech recognition acoustic models in a
federated learning framework
- Authors: Natalia Tomashenko, Salima Mdhaffar, Marc Tommasi, Yannick Estève,
Jean-François Bonastre
- Abstract summary: We propose an approach to analyze information in neural network AMs based on a neural network footprint on the Indicator dataset.
Experiments on the TED-LIUM 3 corpus demonstrate that the proposed approaches are very effective and can provide equal error rate (EER) of 1-2%.
- Score: 5.1229352884025845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates methods to effectively retrieve speaker information
from the personalized speaker adapted neural network acoustic models (AMs) in
automatic speech recognition (ASR). This problem is especially important in the
context of federated learning of ASR acoustic models where a global model is
learnt on the server based on the updates received from multiple clients. We
propose an approach to analyze information in neural network AMs based on a
neural network footprint on the so-called Indicator dataset. Using this method,
we develop two attack models that aim to infer speaker identity from the
updated personalized models without access to the actual users' speech data.
Experiments on the TED-LIUM 3 corpus demonstrate that the proposed approaches
are very effective and can provide equal error rate (EER) of 1-2%.
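The abstract reports attack quality as an equal error rate (EER): the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how that metric can be computed from trial scores. It does not reproduce the paper's attack models; the function name `equal_error_rate` and the example scores are hypothetical, and the scores are assumed to be similarities where higher means "more likely the same speaker".

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    target_scores: scores of trials where the speaker matches (hypothetical data).
    impostor_scores: scores of trials where the speaker does not match.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: matching trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: non-matching trials scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Illustrative scores only, not taken from the paper.
target = [0.9, 0.8, 0.2, 0.7]
impostor = [0.1, 0.3, 0.85, 0.4]
print(f"EER = {equal_error_rate(target, impostor):.2%}")
```

With finite trial lists the two error curves rarely cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate instead.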
Related papers
- AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot
AV-ASR [79.21857972093332]
We present AVFormer, a method for augmenting audio-only models with visual information, at the same time performing lightweight domain adaptation.
We show that these can be trained on a small amount of weakly labelled video data with minimum additional training time and parameters.
We also introduce a simple curriculum scheme during training which we show is crucial to enable the model to jointly process audio and visual information effectively.
arXiv Detail & Related papers (2023-03-29T07:24:28Z)
- Federated Learning for ASR based on Wav2vec 2.0 [4.711492191554342]
We study the use of federated learning to train an ASR model based on a wav2vec 2.0 model pre-trained by self supervision.
Experiments show that such a model can obtain, with no use of a language model, a word error rate of 10.92% on the official TED-LIUM 3 test set.
We also analyse the ASR performance for speakers depending on their participation in the federated learning.
arXiv Detail & Related papers (2023-02-20T18:36:46Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a pre-trained wav2vec model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Retrieving Speaker Information from Personalized Acoustic Models for
Speech Recognition [5.1229352884025845]
We show that it is possible to retrieve the gender of the speaker, but also his identity, by just exploiting the weight matrix changes of a neural acoustic model locally adapted to this speaker.
arXiv Detail & Related papers (2021-11-07T22:17:52Z)
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for
Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- Private Language Model Adaptation for Speech Recognition [15.726921748859393]
Speech model adaptation is crucial for handling the discrepancy between server-side proxy training data and the actual data received on users' local devices.
We introduce an efficient approach for continuously adapting neural network language models (NNLMs) on private devices, with applications to automatic speech recognition.
arXiv Detail & Related papers (2021-09-28T00:15:43Z)
- End-to-End Diarization for Variable Number of Speakers with Local-Global
Networks and Discriminative Speaker Embeddings [66.50782702086575]
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings.
The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions.
arXiv Detail & Related papers (2021-05-05T14:55:29Z)
- AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech.
Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times.
Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 back-bones, while enjoying lower model complexity.
arXiv Detail & Related papers (2020-05-07T02:53:47Z)
- Towards Relevance and Sequence Modeling in Language Recognition [39.547398348702025]
We propose a neural network framework utilizing short-sequence information in language recognition.
A new model is proposed for incorporating relevance in language recognition, where parts of speech data are weighted more based on their relevance for the language recognition task.
Experiments are performed on the language recognition task of the NIST LRE 2017 Challenge using clean, noisy and multi-speaker speech data.
arXiv Detail & Related papers (2020-04-02T18:31:18Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net
Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)