Stateful Detection of Model Extraction Attacks
- URL: http://arxiv.org/abs/2107.05166v1
- Date: Mon, 12 Jul 2021 02:18:26 GMT
- Title: Stateful Detection of Model Extraction Attacks
- Authors: Soham Pal, Yash Gupta, Aditya Kanade, Shirish Shevade
- Abstract summary: We propose VarDetect, a stateful monitor that tracks the distribution of queries made by users of a service to detect model extraction attacks.
VarDetect robustly separates three types of attacker samples from benign samples, and successfully raises an alarm for each.
We demonstrate that even adaptive attackers with prior knowledge of the deployment of VarDetect are detected by it.
- Score: 9.405458160620535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine-Learning-as-a-Service providers expose machine learning (ML) models
through application programming interfaces (APIs) to developers. Recent work
has shown that attackers can exploit these APIs to extract good approximations
of such ML models, by querying them with samples of their choosing. We propose
VarDetect, a stateful monitor that tracks the distribution of queries made by
users of such a service, to detect model extraction attacks. Harnessing the
latent distributions learned by a modified variational autoencoder, VarDetect
robustly separates three types of attacker samples from benign samples, and
successfully raises an alarm for each. Further, with VarDetect deployed as an
automated defense mechanism, the extracted substitute models are found to
exhibit poor performance and transferability, as intended. Finally, we
demonstrate that even adaptive attackers with prior knowledge of the deployment
of VarDetect are detected by it.
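As a concrete illustration of the mechanism the abstract describes, the following is a minimal sketch of a stateful, per-user query monitor: a pre-trained variational autoencoder encoder embeds each query, and an MMD-style statistic compares a user's latent codes against a benign reference set. The class names, the use of a plain RBF-kernel MMD, and the threshold are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch (not VarDetect's exact design): embed queries with a VAE
# encoder and flag users whose latent distribution drifts from a benign
# reference, as measured by an RBF-kernel MMD statistic.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy stand-in for the trained VAE encoder; only the mean head is used."""
    def __init__(self, in_dim=784, latent_dim=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)

    def forward(self, x):
        return self.mu(self.body(x))

def rbf_mmd(a, b, sigma=1.0):
    """Squared MMD between two sets of latent codes under an RBF kernel."""
    def k(x, y):
        return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

class StatefulMonitor:
    """Keeps a per-user buffer of latent codes and raises an alarm on drift."""
    def __init__(self, encoder, benign_latents, threshold=0.05, min_queries=64):
        self.encoder, self.reference = encoder, benign_latents
        self.threshold, self.min_queries = threshold, min_queries
        self.buffers = {}                       # user_id -> list of latent batches

    @torch.no_grad()
    def observe(self, user_id, queries):
        self.buffers.setdefault(user_id, []).append(self.encoder(queries))
        codes = torch.cat(self.buffers[user_id])
        if codes.shape[0] < self.min_queries:
            return False                        # not enough evidence yet
        return rbf_mmd(codes, self.reference).item() > self.threshold
```

In a real deployment, the encoder would be the VAE trained by the defender and the threshold would be calibrated on held-out benign traffic.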
Related papers
- PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis [2.5347892611213614]
Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions.
We develop a practical detection method that analyzes the sensitivity of both the model's predictions and its feature attributions to flag adversarial samples.
Our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism; a rough sketch of the idea follows this entry.
arXiv Detail & Related papers (2024-04-12T21:22:21Z)
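A rough sketch of a PASA-style check, under the assumption that attribution means plain input gradients and that sensitivity is measured under additive Gaussian noise; the noise scale and thresholds are placeholders to be fit on benign data, not values from the paper.

```python
# Illustrative PASA-style sensitivity check: measure how much the prediction
# and a gradient-based attribution change when the input is slightly perturbed.
import torch
import torch.nn.functional as F

def saliency(model, x, target):
    """Gradient of the chosen class logits w.r.t. the input (simple attribution)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    logits[torch.arange(len(x)), target].sum().backward()
    return x.grad.detach()

def pasa_scores(model, x, noise_std=0.05):
    """Prediction and attribution sensitivity of each sample under input noise."""
    x_noisy = x + noise_std * torch.randn_like(x)
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=1)
        p_noisy = F.softmax(model(x_noisy), dim=1)
    pred = p_clean.argmax(dim=1)
    pred_sens = (p_clean - p_noisy).abs().sum(dim=1)      # prediction shift
    attr_shift = saliency(model, x, pred) - saliency(model, x_noisy, pred)
    attr_sens = attr_shift.flatten(1).norm(dim=1)         # attribution shift
    return pred_sens, attr_sens

def flag_adversarial(model, x, pred_thr=0.5, attr_thr=10.0):
    """Flag samples whose sensitivities exceed thresholds fit on benign data."""
    pred_sens, attr_sens = pasa_scores(model, x)
    return (pred_sens > pred_thr) | (attr_sens > attr_thr)
```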
- Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method uses a mask to identify the memorized atypical samples, and then fine-tunes the model or prunes it with the introduced mask to forget them; a loose sketch of the forgetting step follows this entry.
arXiv Detail & Related papers (2023-06-06T14:23:34Z)
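A loose sketch of the "mask and fine-tune to forget" step mentioned above. How the memorized atypical samples are flagged (`is_atypical`) is a placeholder; the paper derives this from its introduced mask.

```python
# Loose sketch of forgetting flagged samples: keep only the non-flagged part
# of the training set and briefly fine-tune the well-trained model on it.
import torch
from torch.utils.data import DataLoader, Subset

def forget_and_finetune(model, train_set, is_atypical, epochs=1, lr=1e-4):
    keep_idx = [i for i in range(len(train_set)) if not is_atypical[i]]
    loader = DataLoader(Subset(train_set, keep_idx), batch_size=128, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```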
- Masked Language Model Based Textual Adversarial Example Detection [14.734863175424797]
Adversarial attacks are a serious threat to the reliable deployment of machine learning models in safety-critical applications.
We propose a novel textual adversarial example detection method, namely Masked Language Model-based Detection (MLMD); a minimal sketch follows this entry.
arXiv Detail & Related papers (2023-04-18T06:52:14Z)
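An illustrative sketch of the mask-and-refill consistency idea behind MLMD, assuming a HuggingFace fill-mask pipeline as the masked language model and a black-box `classify` callable as the victim classifier; the word-level masking and the instability threshold are simplifying assumptions.

```python
# Illustrative sketch: mask each word, refill it with a masked language model,
# and measure how often the victim classifier's label flips on the variants.
# Adversarial texts tend to be less stable under this procedure.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

def instability(text, classify, top_k=3):
    """Fraction of mask-and-refill variants on which the victim label flips."""
    words = text.split()
    base_label = classify(text)
    flips, total = 0, 0
    for i in range(len(words)):
        masked = " ".join(words[:i] + [unmasker.tokenizer.mask_token] + words[i + 1:])
        for cand in unmasker(masked, top_k=top_k):
            total += 1
            if classify(cand["sequence"]) != base_label:
                flips += 1
    return flips / max(total, 1)

def looks_adversarial(text, classify, threshold=0.3):
    return instability(text, classify) > threshold
```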
- EMShepherd: Detecting Adversarial Samples via Side-channel Leakage [6.868995628617191]
Adversarial attacks have disastrous consequences for deep learning-empowered critical applications.
We propose a framework, EMShepherd, to capture electromagnetic traces of model execution, perform processing on traces and exploit them for adversarial detection.
We demonstrate that our air-gapped EMShepherd can effectively detect different adversarial attacks on a commonly used FPGA deep learning accelerator.
arXiv Detail & Related papers (2023-03-27T19:38:55Z)
- On the Difficulty of Defending Self-Supervised Learning against Model Extraction [23.497838165711983]
Self-Supervised Learning (SSL) is an increasingly popular ML paradigm that trains models to transform complex inputs into representations without relying on explicit labels.
This paper explores model stealing attacks against SSL.
We construct several novel attacks and find that approaches that train directly on a victim's stolen representations are query-efficient and enable high accuracy for downstream models; an illustrative sketch follows this entry.
arXiv Detail & Related papers (2022-05-16T17:20:44Z)
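An illustrative sketch of the representation-stealing strategy described above: query the victim encoder for representations of attacker-chosen inputs and regress a substitute encoder onto them. The `victim_api` callable, input shape, architecture, and hyperparameters are assumptions for illustration.

```python
# Illustrative sketch: train a substitute encoder directly on representations
# returned by a black-box victim SSL encoder (`victim_api`).
import torch
import torch.nn as nn
import torch.nn.functional as F

def steal_encoder(victim_api, query_loader, rep_dim=512, epochs=5, lr=1e-3):
    substitute = nn.Sequential(                 # small stand-in architecture
        nn.Flatten(), nn.Linear(3 * 32 * 32, 1024), nn.ReLU(),
        nn.Linear(1024, rep_dim),
    )
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        for x in query_loader:                  # attacker-chosen query images
            with torch.no_grad():
                target = victim_api(x)          # stolen representations
            loss = F.mse_loss(substitute(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return substitute
```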
- DAE: Discriminatory Auto-Encoder for multivariate time-series anomaly detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called the Discriminatory Auto-Encoder (DAE).
It builds on a regular LSTM-based auto-encoder but uses several decoders, each receiving data from a specific flight phase.
Results show that the DAE improves both the accuracy and the speed of detection; a minimal sketch of the layout follows this entry.
arXiv Detail & Related papers (2021-09-08T14:07:55Z)
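A minimal sketch of the layout described above: a shared LSTM encoder with one decoder per flight phase, scoring anomalies by the phase-specific reconstruction error. Dimensions, the number of phases, and the scoring rule are illustrative assumptions.

```python
# Minimal sketch: shared LSTM encoder, one decoder per flight phase, and
# per-sample reconstruction error as the anomaly score.
import torch
import torch.nn as nn

class DiscriminatoryAE(nn.Module):
    def __init__(self, n_features=8, hidden=64, n_phases=3):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoders = nn.ModuleList(
            [nn.LSTM(hidden, n_features, batch_first=True) for _ in range(n_phases)]
        )

    def forward(self, x, phase):
        """x: (batch, time, n_features); phase: index of the flight phase."""
        h, _ = self.encoder(x)                  # latent sequence
        recon, _ = self.decoders[phase](h)      # phase-specific reconstruction
        return recon

    @torch.no_grad()
    def anomaly_score(self, x, phase):
        recon = self.forward(x, phase)
        return ((recon - x) ** 2).mean(dim=(1, 2))   # per-sample error
```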
- Are Pretrained Transformers Robust in Intent Classification? A Missing Ingredient in Evaluation of Out-of-Scope Intent Detection [93.40525251094071]
We first point out the importance of in-domain out-of-scope detection in few-shot intent recognition tasks.
We then illustrate the vulnerability of pretrained Transformer-based models against samples that are in-domain but out-of-scope (ID-OOS).
arXiv Detail & Related papers (2021-06-08T17:51:12Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
- Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744]
We propose a variant of the Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks; a rough sketch of the mirrored loss follows this entry.
arXiv Detail & Related papers (2020-03-24T08:26:58Z)
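A heavily hedged sketch of one training step, assuming the mirrored Wasserstein loss means the critic compares (input, input) pairs against (input, reconstruction) pairs; the gradient-penalty term is omitted, and reusing the critic output as the anomaly score is an assumption rather than necessarily the paper's proposed score.

```python
# Heavily hedged sketch: WGAN-style critic on mirrored pairs (x, x) versus
# reconstruction pairs (x, x_hat), with the critic output reused as a score.
import torch

def mirrored_wasserstein_step(encoder, decoder, critic, x, opt_ae, opt_critic):
    x_hat = decoder(encoder(x))
    real_pair = torch.cat([x, x], dim=1)            # mirrored real pair (x, x)
    fake_pair = torch.cat([x, x_hat], dim=1)        # reconstruction pair (x, x_hat)

    # Critic: push scores of mirrored pairs up and of reconstruction pairs down.
    critic_loss = critic(fake_pair.detach()).mean() - critic(real_pair).mean()
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Autoencoder: make reconstruction pairs look like mirrored real pairs.
    ae_loss = -critic(torch.cat([x, decoder(encoder(x))], dim=1)).mean()
    opt_ae.zero_grad()
    ae_loss.backward()
    opt_ae.step()
    return critic_loss.item(), ae_loss.item()

def anomaly_score(encoder, decoder, critic, x):
    """Lower critic score on (x, x_hat) means more anomalous (illustrative rule)."""
    with torch.no_grad():
        x_hat = decoder(encoder(x))
        return -critic(torch.cat([x, x_hat], dim=1)).squeeze(-1)
```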
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.