Model Extraction Attack against Self-supervised Speech Models
- URL: http://arxiv.org/abs/2211.16044v2
- Date: Sun, 8 Oct 2023 09:41:22 GMT
- Title: Model Extraction Attack against Self-supervised Speech Models
- Authors: Tsu-Yuan Hsu, Chen-An Li, Tung-Yu Wu, Hung-yi Lee
- Abstract summary: Self-supervised learning (SSL) speech models generate meaningful representations of given clips.
Model extraction attack (MEA) often refers to an adversary stealing the functionality of a victim model with only query access.
We study the MEA problem against SSL speech models with a small number of queries.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) speech models generate meaningful representations of given clips and achieve strong performance across various downstream tasks. Model extraction attack (MEA) often refers to an adversary stealing the functionality of the victim model with only query access. In this work, we study the MEA problem against SSL speech models with a small number of queries. We propose a two-stage framework to extract the model. In the first stage, SSL is conducted on a large-scale unlabeled corpus to pre-train a small speech model. In the second stage, we actively sample a small portion of clips from the unlabeled corpus and query the target model with these clips to acquire their representations as labels for the small model's second-stage training. Experimental results show that our sampling methods can effectively extract the target model without knowing any information about its architecture.
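The second stage amounts to regressing the student onto the victim's representations. Below is a minimal sketch of that loop; all names (victim_model, student_model, corpus) are hypothetical stand-ins rather than the paper's code, and the paper's active sampling strategies are replaced by a uniform random subset.

```python
# Sketch of the second extraction stage. All names here (victim_model,
# student_model, corpus) are illustrative stand-ins, not the paper's code,
# and the active sampling step is reduced to a uniform random subset.
import torch
import torch.nn as nn

torch.manual_seed(0)

FRAME_DIM, REP_DIM = 160, 768            # hypothetical feature/representation sizes

# Black-box victim (query access only) and a small student assumed to be
# already SSL-pre-trained in stage one.
victim_model = nn.Sequential(
    nn.Linear(FRAME_DIM, REP_DIM), nn.GELU(), nn.Linear(REP_DIM, REP_DIM)
)
student_model = nn.Linear(FRAME_DIM, REP_DIM)

# Unlabeled corpus: 1000 clips, each 50 frames of FRAME_DIM-dim features.
corpus = torch.randn(1000, 50, FRAME_DIM)

# "Active" sampling placeholder: pick a small query budget at random.
queries = corpus[torch.randperm(len(corpus))[:64]]

with torch.no_grad():                    # the victim exposes outputs only
    targets = victim_model(queries)      # representations become regression labels

opt = torch.optim.Adam(student_model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()                    # one plausible imitation distance

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(student_model(queries), targets)
    loss.backward()
    opt.step()

print(f"final imitation loss: {loss.item():.4f}")
```

L1 regression over frame-level features is one plausible imitation objective; the paper's contribution lies in how the queried clips are chosen, which this sketch deliberately leaves naive.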
Related papers
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
With proper training strategies and benchmark choices, even a 2.7B small-scale model can perform on par with larger 7B or 13B models; a generic distillation-loss sketch follows this entry.
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
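For reference, a minimal sketch of a standard temperature-scaled logit-distillation objective of the kind such distillation studies compare; the temperature and mixing weight are illustrative defaults, not LLAVADI's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic KD objective: soft teacher matching plus hard-label CE.

    T and alpha are illustrative defaults, not values from the paper.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                          # rescale gradients for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy check: 4 examples, 10 classes.
s, t = torch.randn(4, 10), torch.randn(4, 10)
print(distillation_loss(s, t, torch.tensor([1, 0, 3, 7])).item())
```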
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- MoPe: Model Perturbation-based Privacy Attacks on Language Models [4.4746931463927835]
Large Language Models (LLMs) can unintentionally leak sensitive information present in their training data.
We present Model Perturbations (MoPe), a new method to identify with high confidence whether a given text is in the training data of a pre-trained language model; a toy sketch of the perturbation signal follows this entry.
arXiv Detail & Related papers (2023-10-22T17:33:19Z)
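A toy sketch of the perturbation signal referenced in the MoPe entry, under the common assumption that training points sit in sharper regions of the loss surface, so the same weight noise raises their loss more; the linear model, sigma, and draw count are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(16, 2)                  # toy stand-in for a language model
loss_fn = nn.CrossEntropyLoss()

@torch.no_grad()
def perturbation_score(x, y, sigma=0.01, n_draws=8):
    """Mean loss increase under Gaussian weight noise (a MoPe-style signal)."""
    base = loss_fn(model(x), y).item()
    total = 0.0
    for _ in range(n_draws):
        noises = [torch.randn_like(p) * sigma for p in model.parameters()]
        for p, n in zip(model.parameters(), noises):
            p.add_(n)                     # perturb the weights in place
        total += loss_fn(model(x), y).item() - base
        for p, n in zip(model.parameters(), noises):
            p.sub_(n)                     # restore the original weights
    return total / n_draws

x, y = torch.randn(1, 16), torch.tensor([1])
print(f"perturbation score: {perturbation_score(x, y):.4f}")
```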
- Mispronunciation detection using self-supervised speech representations [10.010024759851142]
We study the use of SSL models for mispronunciation detection by second-language learners.
We compare two downstream approaches: 1) training the model for phone recognition using native English data, and 2) training a model directly for the target task using non-native English data.
arXiv Detail & Related papers (2023-07-30T21:20:58Z)
- Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation [12.506633315768832]
HuBERT is a successful example that uses offline clustering to convert speech features into discrete units for a masked language modeling pretext task (this clustering step is sketched after this entry).
We present an unsupervised method to improve SSL targets.
Two models are proposed, MonoBERT and PolyBERT, which leverage context-independent and context-dependent phoneme-based units for pre-training.
arXiv Detail & Related papers (2023-06-15T07:45:12Z)
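The offline clustering step that HuBERT-style training relies on can be illustrated in a few lines; this sketch uses scikit-learn's KMeans on random stand-in features, with the feature size and cluster count chosen arbitrarily.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy frame-level features: 200 frames of 39-dim MFCC-like vectors.
frames = rng.normal(size=(200, 39)).astype(np.float32)

# Offline clustering: each frame gets a discrete unit ID, HuBERT-style.
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(frames)
units = kmeans.labels_                  # pseudo-labels for masked prediction

# During pre-training, masked frames would be predicted as these unit IDs.
print(units[:20])
```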
- LLM2Loss: Leveraging Language Models for Explainable Model Diagnostics [5.33024001730262]
We propose an approach that can provide semantic insights into a model's patterns of failures and biases.
We show that an ensemble of such lightweight models can be used to generate insights on the performance of the black-box model.
arXiv Detail & Related papers (2023-05-04T23:54:37Z)
- Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods typically leverage transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC); the correlation idea is sketched after this entry.
SAC successfully defends against various model stealing attacks, even those involving adversarial training or transfer learning.
arXiv Detail & Related papers (2022-10-21T02:07:50Z)
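A minimal reading of the sample-correlation idea from the SAC entry: compare each model's pairwise output-similarity matrix on a shared probe set, expecting a stolen copy to match the victim closely. Everything here (probe set, linear models, distance measure) is an illustrative stand-in, not the authors' implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def correlation_matrix(model, probes):
    """Pairwise cosine similarity of a model's outputs on a probe set."""
    with torch.no_grad():
        out = model(probes)
    out = nn.functional.normalize(out, dim=-1)
    return out @ out.T                           # (n_probes, n_probes)

victim = nn.Linear(32, 10)
stolen = nn.Linear(32, 10)
stolen.load_state_dict(victim.state_dict())     # simulate an extracted copy
independent = nn.Linear(32, 10)

probes = torch.randn(20, 32)
c_victim = correlation_matrix(victim, probes)

for name, m in [("stolen", stolen), ("independent", independent)]:
    dist = (correlation_matrix(m, probes) - c_victim).abs().mean().item()
    print(f"{name}: correlation distance {dist:.4f}")  # small => likely stolen
```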
- On the Difficulty of Defending Self-Supervised Learning against Model Extraction [23.497838165711983]
Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels.
This paper explores model stealing attacks against SSL.
We construct several novel attacks and find that approaches that train directly on a victim's stolen representations are query efficient and enable high accuracy for downstream models.
arXiv Detail & Related papers (2022-05-16T17:20:44Z)
- Self-Supervised Learning for speech recognition with Intermediate layer supervision [52.93758711230248]
We propose Intermediate Layer Supervision for Self-Supervised Learning (ILS-SSL).
ILS-SSL forces the model to concentrate on content information as much as possible by adding an additional SSL loss on the intermediate layers (a toy sketch follows this entry).
Experiments on LibriSpeech test-other set show that our method outperforms HuBERT significantly.
arXiv Detail & Related papers (2021-12-16T10:45:05Z)
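A toy sketch of the intermediate-layer supervision described in the ILS-SSL entry: the same SSL prediction loss is attached both to the top layer and to a mid-stack layer. The shared prediction head, the linear stack, and the choice of layer are simplifications, not the paper's exact design.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(6)])
head = nn.Linear(64, 100)            # predicts discrete SSL targets
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 64)               # toy masked-frame features
targets = torch.randint(0, 100, (8,))

# Forward pass, keeping hidden states so losses can attach mid-stack.
hidden, states = x, []
for layer in layers:
    hidden = torch.relu(layer(hidden))
    states.append(hidden)

# Top-layer SSL loss plus an extra loss on an intermediate layer (here the
# third), pushing lower layers toward content information.
loss = loss_fn(head(states[-1]), targets) + loss_fn(head(states[2]), targets)
print(f"combined loss: {loss.item():.4f}")
```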
- Speech Summarization using Restricted Self-Attention [79.89680891246827]
We introduce a single model optimized end-to-end for speech summarization.
We demonstrate that the proposed model learns to directly summarize speech on the How-2 corpus of instructional videos; a toy local-attention mask is sketched after this entry.
arXiv Detail & Related papers (2021-10-12T18:21:23Z)
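Restricted self-attention, as named in the last entry, is commonly realized by masking attention to a local window so long recordings stay tractable; a toy band-mask sketch follows, with the window size and tensor shapes chosen purely for illustration.

```python
import torch

def local_attention_mask(seq_len, window=4):
    """Boolean mask allowing each position to attend only within +/- window."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window

q = k = v = torch.randn(1, 100, 32)                    # toy 100-frame sequence
mask = local_attention_mask(100)

scores = (q @ k.transpose(-2, -1)) / 32 ** 0.5         # scaled dot-product
scores = scores.masked_fill(~mask, float("-inf"))      # block long-range links
attn = torch.softmax(scores, dim=-1) @ v
print(attn.shape)                                      # torch.Size([1, 100, 32])
```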
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.