On the Difficulty of Defending Self-Supervised Learning against Model
Extraction
- URL: http://arxiv.org/abs/2205.07890v1
- Date: Mon, 16 May 2022 17:20:44 GMT
- Title: On the Difficulty of Defending Self-Supervised Learning against Model
Extraction
- Authors: Adam Dziedzic, Nikita Dhawan, Muhammad Ahmad Kaleem, Jonas Guan,
Nicolas Papernot
- Abstract summary: Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels.
This paper explores model stealing attacks against SSL.
We construct several novel attacks and find that approaches that train directly on a victim's stolen representations are query efficient and enable high accuracy for downstream models.
- Score: 23.497838165711983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that
trains models to transform complex inputs into representations without relying
on explicit labels. These representations encode similarity structures that
enable efficient learning of multiple downstream tasks. Recently,
ML-as-a-Service providers have commenced offering trained SSL models over
inference APIs, which transform user inputs into useful representations for a
fee. However, the high cost of training these models and their exposure over
APIs both make black-box extraction a realistic security threat. We thus
explore model stealing attacks against SSL. Unlike traditional model extraction
on classifiers that output labels, the victim models here output
representations; these representations are of significantly higher
dimensionality compared to the low-dimensional prediction scores output by
classifiers. We construct several novel attacks and find that approaches that
train directly on a victim's stolen representations are query efficient and
enable high accuracy for downstream models. We then show that existing defenses
against model extraction are inadequate and not easily retrofitted to the
specificities of SSL.
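The attack strategy the abstract highlights, training a surrogate directly on the victim's stolen representations, is simple enough to illustrate with a short sketch. The PyTorch example below is a minimal, hypothetical version of that idea: the stand-in victim encoder (a frozen ResNet-50), the ResNet-18 surrogate, the MSE matching loss, and all hyperparameters are assumptions for illustration, not the paper's exact setup.

```python
# Hypothetical sketch of a representation-stealing attack against an SSL encoder.
# Assumptions (not from the paper): the black-box victim is approximated by a
# frozen ResNet-50 feature extractor, and the surrogate is a ResNet-18 trained
# to reproduce the victim's representations with an MSE matching loss.
import torch
import torch.nn as nn
import torchvision

# Stand-in for the victim; in a real attack this would be the paid inference API
# returning representations for attacker-chosen inputs.
victim = torchvision.models.resnet50(weights=None)
victim.fc = nn.Identity()          # expose the 2048-d representation
victim.eval()

@torch.no_grad()
def victim_encode(images: torch.Tensor) -> torch.Tensor:
    """One batch of paid queries: images in, representations out."""
    return victim(images)

# Local surrogate encoder trained on the stolen representations.
surrogate = torchvision.models.resnet18(weights=None)
surrogate.fc = nn.Linear(surrogate.fc.in_features, 2048)   # match victim dim
optimizer = torch.optim.Adam(surrogate.parameters(), lr=3e-4)
loss_fn = nn.MSELoss()

def stealing_step(images: torch.Tensor) -> float:
    """Spend one batch of queries, then fit the surrogate to the answers."""
    targets = victim_encode(images)        # stolen representations
    preds = surrogate(images)
    loss = loss_fn(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Random data stands in for the attacker's query set.
    batch = torch.randn(8, 3, 224, 224)
    print(f"matching loss: {stealing_step(batch):.4f}")
```

Downstream accuracy would then be assessed by training a small task head on the surrogate's representations, which is the usual way stolen encoders are evaluated.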
Related papers
- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate denoising model, our method offers significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z) - MisGUIDE: Defense Against Data-Free Deep Learning Model Extraction [0.8437187555622164]
"MisGUIDE" is a two-step defense framework for Deep Learning models that disrupts the adversarial sample generation process.
The aim of the proposed defense method is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries.
arXiv Detail & Related papers (2024-03-27T13:59:21Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of model extraction attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - Progressive Feature Adjustment for Semi-supervised Learning from
Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model.
Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data.
We propose to use pseudo-labels from the unlabelled data to update the feature extractor in a way that is less sensitive to incorrect labels.
arXiv Detail & Related papers (2023-09-09T01:57:14Z) - Model Extraction Attack against Self-supervised Speech Models [52.81330435990717]
Self-supervised learning (SSL) speech models generate meaningful representations of given clips.
A model extraction attack (MEA) refers to an adversary stealing the functionality of the victim model with only query access.
We study the MEA problem against SSL speech models with a small number of queries.
arXiv Detail & Related papers (2022-11-29T09:28:05Z) - Are You Stealing My Model? Sample Correlation for Fingerprinting Deep
Neural Networks [86.55317144826179]
Previous methods always leverage the transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC).
SAC successfully defends against various model stealing attacks, even those involving adversarial training or transfer learning (an illustrative sketch of the correlation-matching idea appears after this list).
arXiv Detail & Related papers (2022-10-21T02:07:50Z) - Dataset Inference for Self-Supervised Models [21.119579812529395]
Self-supervised models are increasingly prevalent in machine learning (ML).
They are vulnerable to model stealing attacks due to the high dimensionality of vector representations they output.
We introduce a new dataset inference defense, which uses the private training set of the victim encoder model to attribute its ownership in the event of stealing.
arXiv Detail & Related papers (2022-09-16T15:39:06Z) - Defending Variational Autoencoders from Adversarial Attacks with MCMC [74.36233246536459]
Variational autoencoders (VAEs) are deep generative models used in various domains.
As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructions for a visually slightly modified input.
Here, we examine several objective functions for constructing adversarial attacks, suggest metrics to assess model robustness, and propose a solution.
arXiv Detail & Related papers (2022-03-18T13:25:18Z) - Self-Damaging Contrastive Learning [92.34124578823977]
In practice, unlabeled data is commonly imbalanced and follows a long-tailed distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the class labels.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
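The sample-correlation fingerprinting entry above admits a compact illustration. The sketch below is a hedged, hypothetical reading of the general idea: compare the pairwise correlation structure of two models' outputs on a shared probe set and treat a suspect whose structure is unusually close to the victim's as a likely copy. The cosine-similarity correlation, the mean-absolute-difference distance, the probe choice, and any decision threshold are illustrative assumptions, not the SAC paper's exact procedure.

```python
# Hypothetical sketch of a sample-correlation fingerprint check (not the exact
# SAC procedure): models whose outputs share an unusually similar pairwise
# correlation structure on common probes are treated as suspected copies.
import torch
import torch.nn.functional as F

@torch.no_grad()
def correlation_matrix(model: torch.nn.Module, probes: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine-similarity matrix of a model's outputs on probe samples."""
    outputs = F.normalize(model(probes), dim=1)    # (n_probes, n_outputs)
    return outputs @ outputs.T                     # (n_probes, n_probes)

def fingerprint_distance(victim: torch.nn.Module,
                         suspect: torch.nn.Module,
                         probes: torch.Tensor) -> float:
    """Mean absolute difference between the two models' correlation matrices."""
    diff = correlation_matrix(victim, probes) - correlation_matrix(suspect, probes)
    return diff.abs().mean().item()

if __name__ == "__main__":
    # Toy linear models stand in for real networks; probes are assumed inputs.
    probes = torch.randn(64, 128)
    victim = torch.nn.Linear(128, 10)
    clone = victim                            # perfect copy: distance is zero
    independent = torch.nn.Linear(128, 10)    # unrelated model
    print(fingerprint_distance(victim, clone, probes))
    print(fingerprint_distance(victim, independent, probes))
```

A suspect whose distance to the victim falls well below the distances observed for independently trained reference models would then be flagged as likely stolen.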
This list is automatically generated from the titles and abstracts of the papers in this site.