On the Difficulty of Defending Self-Supervised Learning against Model
Extraction
- URL: http://arxiv.org/abs/2205.07890v1
- Date: Mon, 16 May 2022 17:20:44 GMT
- Title: On the Difficulty of Defending Self-Supervised Learning against Model
Extraction
- Authors: Adam Dziedzic, Nikita Dhawan, Muhammad Ahmad Kaleem, Jonas Guan,
Nicolas Papernot
- Abstract summary: Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels.
This paper explores model stealing attacks against SSL.
We construct several novel attacks and find that approaches that train directly on a victim's stolen representations are query efficient and enable high accuracy for downstream models.
- Score: 23.497838165711983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that
trains models to transform complex inputs into representations without relying
on explicit labels. These representations encode similarity structures that
enable efficient learning of multiple downstream tasks. Recently,
ML-as-a-Service providers have commenced offering trained SSL models over
inference APIs, which transform user inputs into useful representations for a
fee. However, the high cost of training these models and their exposure over
APIs both make black-box extraction a realistic security threat. We thus
explore model stealing attacks against SSL. Unlike traditional model extraction
on classifiers that output labels, the victim models here output
representations; these representations are of significantly higher
dimensionality compared to the low-dimensional prediction scores output by
classifiers. We construct several novel attacks and find that approaches that
train directly on a victim's stolen representations are query efficient and
enable high accuracy for downstream models. We then show that existing defenses
against model extraction are inadequate and not easily retrofitted to the
specificities of SSL.
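The attack strategy the abstract highlights, training a surrogate directly on the victim's stolen representations, is simple enough to illustrate with a short sketch. The PyTorch example below is a minimal, hypothetical version of that idea: the stand-in victim encoder (a frozen ResNet-50), the ResNet-18 surrogate, the MSE matching loss, and all hyperparameters are assumptions for illustration, not the paper's exact setup.

```python
# Hypothetical sketch of a representation-stealing attack against an SSL encoder.
# Assumptions (not from the paper): the black-box victim is approximated by a
# frozen ResNet-50 feature extractor, and the surrogate is a ResNet-18 trained
# to reproduce the victim's representations with an MSE matching loss.
import torch
import torch.nn as nn
import torchvision

# Stand-in for the victim; in a real attack this would be the paid inference API
# returning representations for attacker-chosen inputs.
victim = torchvision.models.resnet50(weights=None)
victim.fc = nn.Identity()          # expose the 2048-d representation
victim.eval()

@torch.no_grad()
def victim_encode(images: torch.Tensor) -> torch.Tensor:
    """One batch of paid queries: images in, representations out."""
    return victim(images)

# Local surrogate encoder trained on the stolen representations.
surrogate = torchvision.models.resnet18(weights=None)
surrogate.fc = nn.Linear(surrogate.fc.in_features, 2048)   # match victim dim
optimizer = torch.optim.Adam(surrogate.parameters(), lr=3e-4)
loss_fn = nn.MSELoss()

def stealing_step(images: torch.Tensor) -> float:
    """Spend one batch of queries, then fit the surrogate to the answers."""
    targets = victim_encode(images)        # stolen representations
    preds = surrogate(images)
    loss = loss_fn(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Random data stands in for the attacker's query set.
    batch = torch.randn(8, 3, 224, 224)
    print(f"matching loss: {stealing_step(batch):.4f}")
```

Downstream accuracy would then be assessed by training a small task head on the surrogate's representations, which is the usual way stolen encoders are evaluated.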
Related papers
- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate denoising model, our method offers significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z) - MisGUIDE: Defense Against Data-Free Deep Learning Model Extraction [0.8437187555622164]
"MisGUIDE" is a two-step defense framework for Deep Learning models that disrupts the adversarial sample generation process.
The aim of the proposed defense method is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries.
arXiv Detail & Related papers (2024-03-27T13:59:21Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of model extraction attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - Progressive Feature Adjustment for Semi-supervised Learning from
Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model.
Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data.
We propose to use pseudo-labels from the unlabelled data to update the feature extractor in a way that is less sensitive to incorrect labels.
arXiv Detail & Related papers (2023-09-09T01:57:14Z) - Model Extraction Attack against Self-supervised Speech Models [52.81330435990717]
Self-supervised learning (SSL) speech models generate meaningful representations of given clips.
A model extraction attack (MEA) refers to an adversary stealing the functionality of the victim model with only query access.
We study the MEA problem against SSL speech models with a small number of queries.
arXiv Detail & Related papers (2022-11-29T09:28:05Z) - Are You Stealing My Model? Sample Correlation for Fingerprinting Deep
Neural Networks [86.55317144826179]
Previous methods always leverage the transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC).
SAC successfully defends against various model stealing attacks, even those involving adversarial training or transfer learning (an illustrative sketch of the correlation-matching idea appears after this list).
arXiv Detail & Related papers (2022-10-21T02:07:50Z) - Dataset Inference for Self-Supervised Models [21.119579812529395]
Self-supervised models are increasingly prevalent in machine learning (ML).
They are vulnerable to model stealing attacks due to the high dimensionality of vector representations they output.
We introduce a new dataset inference defense, which uses the private training set of the victim encoder model to attribute its ownership in the event of stealing.
arXiv Detail & Related papers (2022-09-16T15:39:06Z) - Defending Variational Autoencoders from Adversarial Attacks with MCMC [74.36233246536459]
Variational autoencoders (VAEs) are deep generative models used in various domains.
As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructions for a visually slightly modified input.
Here, we examine several objective functions for constructing adversarial attacks, suggest metrics to assess model robustness, and propose a solution.
arXiv Detail & Related papers (2022-03-18T13:25:18Z) - Self-Damaging Contrastive Learning [92.34124578823977]
In practice, unlabeled data is commonly imbalanced and follows a long-tailed distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the class labels.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
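The sample-correlation fingerprinting entry above admits a compact illustration. The sketch below is a hedged, hypothetical reading of the general idea: compare the pairwise correlation structure of two models' outputs on a shared probe set and treat a suspect whose structure is unusually close to the victim's as a likely copy. The cosine-similarity correlation, the mean-absolute-difference distance, the probe choice, and any decision threshold are illustrative assumptions, not the SAC paper's exact procedure.

```python
# Hypothetical sketch of a sample-correlation fingerprint check (not the exact
# SAC procedure): models whose outputs share an unusually similar pairwise
# correlation structure on common probes are treated as suspected copies.
import torch
import torch.nn.functional as F

@torch.no_grad()
def correlation_matrix(model: torch.nn.Module, probes: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine-similarity matrix of a model's outputs on probe samples."""
    outputs = F.normalize(model(probes), dim=1)    # (n_probes, n_outputs)
    return outputs @ outputs.T                     # (n_probes, n_probes)

def fingerprint_distance(victim: torch.nn.Module,
                         suspect: torch.nn.Module,
                         probes: torch.Tensor) -> float:
    """Mean absolute difference between the two models' correlation matrices."""
    diff = correlation_matrix(victim, probes) - correlation_matrix(suspect, probes)
    return diff.abs().mean().item()

if __name__ == "__main__":
    # Toy linear models stand in for real networks; probes are assumed inputs.
    probes = torch.randn(64, 128)
    victim = torch.nn.Linear(128, 10)
    clone = victim                            # perfect copy: distance is zero
    independent = torch.nn.Linear(128, 10)    # unrelated model
    print(fingerprint_distance(victim, clone, probes))
    print(fingerprint_distance(victim, independent, probes))
```

A suspect whose distance to the victim falls well below the distances observed for independently trained reference models would then be flagged as likely stolen.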
This list is automatically generated from the titles and abstracts of the papers in this site.