Stateful Detection of Model Extraction Attacks
- URL: http://arxiv.org/abs/2107.05166v1
- Date: Mon, 12 Jul 2021 02:18:26 GMT
- Title: Stateful Detection of Model Extraction Attacks
- Authors: Soham Pal, Yash Gupta, Aditya Kanade, Shirish Shevade
- Abstract summary: We propose VarDetect, a stateful monitor that tracks the distribution of queries made by users of a service to detect model extraction attacks.
VarDetect robustly separates three types of attacker samples from benign samples, and successfully raises an alarm for each.
We demonstrate that even adaptive attackers with prior knowledge of the deployment of VarDetect are detected by it.
- Score: 9.405458160620535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine-Learning-as-a-Service providers expose machine learning (ML) models
through application programming interfaces (APIs) to developers. Recent work
has shown that attackers can exploit these APIs to extract good approximations
of such ML models, by querying them with samples of their choosing. We propose
VarDetect, a stateful monitor that tracks the distribution of queries made by
users of such a service, to detect model extraction attacks. Harnessing the
latent distributions learned by a modified variational autoencoder, VarDetect
robustly separates three types of attacker samples from benign samples, and
successfully raises an alarm for each. Further, with VarDetect deployed as an
automated defense mechanism, the extracted substitute models are found to
exhibit poor performance and transferability, as intended. Finally, we
demonstrate that even adaptive attackers with prior knowledge of the deployment
of VarDetect are detected by it.
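As a concrete illustration of the mechanism the abstract describes, the following is a minimal sketch of a stateful, per-user query monitor: a pre-trained variational autoencoder encoder embeds each query, and an MMD-style statistic compares a user's latent codes against a benign reference set. The class names, the use of a plain RBF-kernel MMD, and the threshold are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch (not VarDetect's exact design): embed queries with a VAE
# encoder and flag users whose latent distribution drifts from a benign
# reference, as measured by an RBF-kernel MMD statistic.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy stand-in for the trained VAE encoder; only the mean head is used."""
    def __init__(self, in_dim=784, latent_dim=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)

    def forward(self, x):
        return self.mu(self.body(x))

def rbf_mmd(a, b, sigma=1.0):
    """Squared MMD between two sets of latent codes under an RBF kernel."""
    def k(x, y):
        return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

class StatefulMonitor:
    """Keeps a per-user buffer of latent codes and raises an alarm on drift."""
    def __init__(self, encoder, benign_latents, threshold=0.05, min_queries=64):
        self.encoder, self.reference = encoder, benign_latents
        self.threshold, self.min_queries = threshold, min_queries
        self.buffers = {}                       # user_id -> list of latent batches

    @torch.no_grad()
    def observe(self, user_id, queries):
        self.buffers.setdefault(user_id, []).append(self.encoder(queries))
        codes = torch.cat(self.buffers[user_id])
        if codes.shape[0] < self.min_queries:
            return False                        # not enough evidence yet
        return rbf_mmd(codes, self.reference).item() > self.threshold
```

In a real deployment, the encoder would be the VAE trained by the defender and the threshold would be calibrated on held-out benign traffic.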
Related papers
- PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis [2.5347892611213614]
Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions.
We develop a practical detection method that analyzes the sensitivity of both the model's predictions and its feature attributions to flag adversarial samples.
Our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism; a rough sketch of the idea follows this entry.
arXiv Detail & Related papers (2024-04-12T21:22:21Z)
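A rough sketch of a PASA-style check, under the assumption that attribution means plain input gradients and that sensitivity is measured under additive Gaussian noise; the noise scale and thresholds are placeholders to be fit on benign data, not values from the paper.

```python
# Illustrative PASA-style sensitivity check: measure how much the prediction
# and a gradient-based attribution change when the input is slightly perturbed.
import torch
import torch.nn.functional as F

def saliency(model, x, target):
    """Gradient of the chosen class logits w.r.t. the input (simple attribution)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    logits[torch.arange(len(x)), target].sum().backward()
    return x.grad.detach()

def pasa_scores(model, x, noise_std=0.05):
    """Prediction and attribution sensitivity of each sample under input noise."""
    x_noisy = x + noise_std * torch.randn_like(x)
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=1)
        p_noisy = F.softmax(model(x_noisy), dim=1)
    pred = p_clean.argmax(dim=1)
    pred_sens = (p_clean - p_noisy).abs().sum(dim=1)      # prediction shift
    attr_shift = saliency(model, x, pred) - saliency(model, x_noisy, pred)
    attr_sens = attr_shift.flatten(1).norm(dim=1)         # attribution shift
    return pred_sens, attr_sens

def flag_adversarial(model, x, pred_thr=0.5, attr_thr=10.0):
    """Flag samples whose sensitivities exceed thresholds fit on benign data."""
    pred_sens, attr_sens = pasa_scores(model, x)
    return (pred_sens > pred_thr) | (attr_sens > attr_thr)
```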
- Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method uses a mask to identify the memorized atypical samples, and then fine-tunes the model or prunes it with the introduced mask to forget them; a loose sketch of the forgetting step follows this entry.
arXiv Detail & Related papers (2023-06-06T14:23:34Z)
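A loose sketch of the "mask and fine-tune to forget" step mentioned above. How the memorized atypical samples are flagged (`is_atypical`) is a placeholder; the paper derives this from its introduced mask.

```python
# Loose sketch of forgetting flagged samples: keep only the non-flagged part
# of the training set and briefly fine-tune the well-trained model on it.
import torch
from torch.utils.data import DataLoader, Subset

def forget_and_finetune(model, train_set, is_atypical, epochs=1, lr=1e-4):
    keep_idx = [i for i in range(len(train_set)) if not is_atypical[i]]
    loader = DataLoader(Subset(train_set, keep_idx), batch_size=128, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```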
- Masked Language Model Based Textual Adversarial Example Detection [14.734863175424797]
Adversarial attacks are a serious threat to the reliable deployment of machine learning models in safety-critical applications.
We propose a novel textual adversarial example detection method, namely Masked Language Model-based Detection (MLMD); a minimal sketch follows this entry.
arXiv Detail & Related papers (2023-04-18T06:52:14Z)
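An illustrative sketch of the mask-and-refill consistency idea behind MLMD, assuming a HuggingFace fill-mask pipeline as the masked language model and a black-box `classify` callable as the victim classifier; the word-level masking and the instability threshold are simplifying assumptions.

```python
# Illustrative sketch: mask each word, refill it with a masked language model,
# and measure how often the victim classifier's label flips on the variants.
# Adversarial texts tend to be less stable under this procedure.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

def instability(text, classify, top_k=3):
    """Fraction of mask-and-refill variants on which the victim label flips."""
    words = text.split()
    base_label = classify(text)
    flips, total = 0, 0
    for i in range(len(words)):
        masked = " ".join(words[:i] + [unmasker.tokenizer.mask_token] + words[i + 1:])
        for cand in unmasker(masked, top_k=top_k):
            total += 1
            if classify(cand["sequence"]) != base_label:
                flips += 1
    return flips / max(total, 1)

def looks_adversarial(text, classify, threshold=0.3):
    return instability(text, classify) > threshold
```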
- EMShepherd: Detecting Adversarial Samples via Side-channel Leakage [6.868995628617191]
Adversarial attacks have disastrous consequences for deep learning-empowered critical applications.
We propose a framework, EMShepherd, to capture electromagnetic traces of model execution, perform processing on traces and exploit them for adversarial detection.
We demonstrate that our air-gapped EMShepherd can effectively detect different adversarial attacks on a commonly used FPGA deep learning accelerator.
arXiv Detail & Related papers (2023-03-27T19:38:55Z)
- On the Difficulty of Defending Self-Supervised Learning against Model Extraction [23.497838165711983]
Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels.
This paper explores model stealing attacks against SSL.
We construct several novel attacks and find that approaches that train directly on a victim's stolen representations are query-efficient and enable high accuracy for downstream models; an illustrative sketch follows this entry.
arXiv Detail & Related papers (2022-05-16T17:20:44Z)
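An illustrative sketch of the representation-stealing strategy described above: query the victim encoder for representations of attacker-chosen inputs and regress a substitute encoder onto them. The `victim_api` callable, input shape, architecture, and hyperparameters are assumptions for illustration.

```python
# Illustrative sketch: train a substitute encoder directly on representations
# returned by a black-box victim SSL encoder (`victim_api`).
import torch
import torch.nn as nn
import torch.nn.functional as F

def steal_encoder(victim_api, query_loader, rep_dim=512, epochs=5, lr=1e-3):
    substitute = nn.Sequential(                 # small stand-in architecture
        nn.Flatten(), nn.Linear(3 * 32 * 32, 1024), nn.ReLU(),
        nn.Linear(1024, rep_dim),
    )
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        for x in query_loader:                  # attacker-chosen query images
            with torch.no_grad():
                target = victim_api(x)          # stolen representations
            loss = F.mse_loss(substitute(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return substitute
```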
- DAE: Discriminatory Auto-Encoder for multivariate time-series anomaly detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called the Discriminatory Auto-Encoder (DAE).
It builds on a regular LSTM-based auto-encoder but uses several decoders, each receiving data from a specific flight phase.
Results show that the DAE improves both the accuracy and the speed of detection; a minimal sketch of the layout follows this entry.
arXiv Detail & Related papers (2021-09-08T14:07:55Z)
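A minimal sketch of the layout described above: a shared LSTM encoder with one decoder per flight phase, scoring anomalies by the phase-specific reconstruction error. Dimensions, the number of phases, and the scoring rule are illustrative assumptions.

```python
# Minimal sketch: shared LSTM encoder, one decoder per flight phase, and
# per-sample reconstruction error as the anomaly score.
import torch
import torch.nn as nn

class DiscriminatoryAE(nn.Module):
    def __init__(self, n_features=8, hidden=64, n_phases=3):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoders = nn.ModuleList(
            [nn.LSTM(hidden, n_features, batch_first=True) for _ in range(n_phases)]
        )

    def forward(self, x, phase):
        """x: (batch, time, n_features); phase: index of the flight phase."""
        h, _ = self.encoder(x)                  # latent sequence
        recon, _ = self.decoders[phase](h)      # phase-specific reconstruction
        return recon

    @torch.no_grad()
    def anomaly_score(self, x, phase):
        recon = self.forward(x, phase)
        return ((recon - x) ** 2).mean(dim=(1, 2))   # per-sample error
```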
- Are Pretrained Transformers Robust in Intent Classification? A Missing Ingredient in Evaluation of Out-of-Scope Intent Detection [93.40525251094071]
We first point out the importance of in-domain out-of-scope detection in few-shot intent recognition tasks.
We then illustrate the vulnerability of pretrained Transformer-based models against samples that are in-domain but out-of-scope (ID-OOS).
arXiv Detail & Related papers (2021-06-08T17:51:12Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
- Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of the Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks; a rough sketch of the mirrored loss follows this entry.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
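A heavily hedged sketch of one training step, assuming the mirrored Wasserstein loss means the critic compares (input, input) pairs against (input, reconstruction) pairs; the gradient-penalty term is omitted, and reusing the critic output as the anomaly score is an assumption rather than necessarily the paper's proposed score.

```python
# Heavily hedged sketch: WGAN-style critic on mirrored pairs (x, x) versus
# reconstruction pairs (x, x_hat), with the critic output reused as a score.
import torch

def mirrored_wasserstein_step(encoder, decoder, critic, x, opt_ae, opt_critic):
    x_hat = decoder(encoder(x))
    real_pair = torch.cat([x, x], dim=1)            # mirrored real pair (x, x)
    fake_pair = torch.cat([x, x_hat], dim=1)        # reconstruction pair (x, x_hat)

    # Critic: push scores of mirrored pairs up and of reconstruction pairs down.
    critic_loss = critic(fake_pair.detach()).mean() - critic(real_pair).mean()
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Autoencoder: make reconstruction pairs look like mirrored real pairs.
    ae_loss = -critic(torch.cat([x, decoder(encoder(x))], dim=1)).mean()
    opt_ae.zero_grad()
    ae_loss.backward()
    opt_ae.step()
    return critic_loss.item(), ae_loss.item()

def anomaly_score(encoder, decoder, critic, x):
    """Lower critic score on (x, x_hat) means more anomalous (illustrative rule)."""
    with torch.no_grad():
        x_hat = decoder(encoder(x))
        return -critic(torch.cat([x, x_hat], dim=1)).squeeze(-1)
```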
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.