Data-Free Hard-Label Robustness Stealing Attack
- URL: http://arxiv.org/abs/2312.05924v2
- Date: Tue, 12 Dec 2023 10:21:12 GMT
- Title: Data-Free Hard-Label Robustness Stealing Attack
- Authors: Xiaojian Yuan, Kejiang Chen, Wen Huang, Jie Zhang, Weiming Zhang,
Nenghai Yu
- Abstract summary: We introduce a novel Data-Free Hard-Label Robustness Stealing (DFHL-RS) attack in this paper.
It enables the stealing of both model accuracy and robustness by simply querying hard labels of the target model.
Our method achieves a clean accuracy of 77.86% and a robust accuracy of 39.51% against AutoAttack.
- Score: 67.41281050467889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The popularity of Machine Learning as a Service (MLaaS) has led to increased
concerns about Model Stealing Attacks (MSA), which aim to craft a clone model
by querying MLaaS. Currently, most research on MSA assumes that MLaaS can
provide soft labels and that the attacker has a proxy dataset with a similar
distribution. However, this fails to encapsulate the more practical scenario
where only hard labels are returned by MLaaS and the data distribution remains
elusive. Furthermore, most existing work focuses solely on stealing the model
accuracy, neglecting the model robustness, while robustness is essential in
security-sensitive scenarios, e.g., face-scan payment. Notably, improving model
robustness often necessitates the use of expensive techniques such as
adversarial training, thereby further making stealing robustness a more
lucrative prospect. In response to these identified gaps, we introduce a novel
Data-Free Hard-Label Robustness Stealing (DFHL-RS) attack in this paper, which
enables the stealing of both model accuracy and robustness by simply querying
hard labels of the target model without the help of any natural data.
Comprehensive experiments demonstrate the effectiveness of our method. The
clone model achieves a clean accuracy of 77.86% and a robust accuracy of 39.51%
against AutoAttack, which are only 4.71% and 8.40% lower than the target model
on the CIFAR-10 dataset, significantly exceeding the baselines. Our code is
available at: https://github.com/LetheSec/DFHL-RS-Attack.
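The abstract specifies the threat model (hard-label queries only, no natural data) but not the training procedure, and the sketch below is not taken from the linked repository. It is a minimal, hypothetical PyTorch illustration of how a data-free, hard-label stealing loop of this kind can be organized; the generator objective, PGD settings, and helper names (query_hard_labels, steal) are assumptions rather than the authors' DFHL-RS method.

```python
# Hypothetical sketch of a data-free, hard-label stealing loop (NOT the authors'
# DFHL-RS implementation; objectives and hyperparameters are illustrative).
import torch
import torch.nn.functional as F

def query_hard_labels(target_model, x):
    """The MLaaS target is assumed to return only top-1 (hard) labels."""
    with torch.no_grad():
        return target_model(x).argmax(dim=1)

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-inf PGD on the clone, used to expose it to adversarial inputs."""
    x_adv = x.detach() + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    return x_adv.detach()

def steal(target_model, clone, generator, opt_clone, opt_gen,
          steps=1000, batch_size=128, z_dim=100):
    target_model.eval()
    for _ in range(steps):
        # 1) Synthesize a query batch from noise; no natural data is used.
        z = torch.randn(batch_size, z_dim)
        x = generator(z)

        # 2) Query the target model for hard labels only.
        y = query_hard_labels(target_model, x)

        # 3) Train the clone on clean and adversarial versions of the batch so that
        #    both clean accuracy and robustness are (approximately) transferred.
        x_clean = x.detach()
        x_adv = pgd_attack(clone, x_clean, y)
        loss_clone = F.cross_entropy(clone(x_clean), y) + F.cross_entropy(clone(x_adv), y)
        opt_clone.zero_grad(); loss_clone.backward(); opt_clone.step()

        # 4) Update the generator toward samples where the clone still disagrees with
        #    the target's hard label (a common data-free heuristic, not necessarily
        #    the paper's objective).
        loss_gen = -F.cross_entropy(clone(x), y)
        opt_gen.zero_grad(); loss_gen.backward(); opt_gen.step()
```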
Related papers
- Analysis of Zero Day Attack Detection Using MLP and XAI [0.0]
This paper analyzes Machine Learning (ML) and Deep Learning (DL) based approaches to building Intrusion Detection Systems (IDS).
The focus is on the KDD99 dataset, the most widely studied dataset for zero-day attack detection.
We evaluate the performance of four multilayer perceptron (MLP) variants trained on the KDD99 dataset: baseline, weighted, truncated, and weighted truncated models.
arXiv Detail & Related papers (2025-01-28T02:20:34Z)
- Robustness of Selected Learning Models under Label-Flipping Attack [1.3812010983144798]
We compare traditional machine learning and deep learning models trained on a malware dataset when subjected to an adversarial label-flipping attack.
We find wide variation in the robustness of the tested models under this attack, with our model achieving the best combination of initial accuracy and robustness.
arXiv Detail & Related papers (2025-01-21T22:00:54Z)
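For context on the attack studied in the entry above, the snippet below is a minimal numpy sketch of the basic label-flipping step: a chosen fraction of binary training labels is flipped before the models are trained. The flip rate and uniform random targeting are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a label-flipping poisoning step on binary malware labels
# (flip rate and uniform random targeting are illustrative assumptions).
import numpy as np

def flip_labels(y, flip_rate=0.2, seed=0):
    """Return a poisoned copy of binary labels y with a fraction `flip_rate` flipped."""
    rng = np.random.default_rng(seed)
    y_poisoned = np.asarray(y).copy()
    n_flip = int(flip_rate * len(y_poisoned))
    idx = rng.choice(len(y_poisoned), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # flip 0 <-> 1
    return y_poisoned
```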
- Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
- Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model [14.834360664780709]
Model inversion attacks (MIAs) aim to recover private data from the inaccessible training sets of deep learning models.
This paper develops a novel MIA method, leveraging a conditional diffusion model (CDM) to recover representative samples under the target label.
Experimental results show that this method can generate samples that are similar and accurate with respect to the target label, outperforming the generators of previous approaches.
arXiv Detail & Related papers (2023-07-17T12:14:24Z)
- Towards Data-Free Model Stealing in a Hard Label Setting [41.92884427579068]
We show that it is possible to steal Machine Learning models by accessing only top-1 predictions.
We propose a novel GAN-based framework that trains the student and generator in tandem to steal the model.
arXiv Detail & Related papers (2022-04-23T08:44:51Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
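A minimal numpy sketch of the ATC recipe described in the entry above, assuming max-softmax probability as the confidence score (the paper also considers other scores); function names are illustrative.

```python
# Sketch of Average Thresholded Confidence (ATC) with max-softmax as the score.
import numpy as np

def atc_threshold(source_probs, source_labels):
    """Pick a threshold on labeled source data so that the fraction of points with
    confidence above it equals the source accuracy."""
    confidence = source_probs.max(axis=1)
    accuracy = (source_probs.argmax(axis=1) == source_labels).mean()
    # The (1 - accuracy)-quantile leaves roughly a fraction `accuracy` of points above it.
    return np.quantile(confidence, 1.0 - accuracy)

def atc_estimate(target_probs, threshold):
    """Predicted target accuracy = fraction of unlabeled target points above the threshold."""
    return (target_probs.max(axis=1) > threshold).mean()
```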
- Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack [90.6076825117532]
A model stealing attack aims to create a substitute model that replicates the capability of the victim target model.
Most of the existing methods depend on the full probability outputs from the victim model, which is unavailable in most realistic scenarios.
We propose a novel hard-label model stealing method termed black-box dissector, which includes a CAM-driven erasing strategy to mine the hidden information in hard labels from the victim model.
arXiv Detail & Related papers (2021-05-03T04:12:31Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Learnable Boundary Guided Adversarial Training [66.57846365425598]
We use the logits from a clean model to guide the learning of a robust model.
We achieve new state-of-the-art robustness on CIFAR-100 without additional real or synthetic data.
arXiv Detail & Related papers (2020-11-23T01:36:05Z)
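The entry above states the guidance idea without the loss; the sketch below is one plausible, simplified rendering in PyTorch, where the robust model's logits on adversarial inputs are pulled toward the clean model's logits on the corresponding natural inputs. The MSE guidance term, the detach, and the weight beta are assumptions and may differ from the paper's exact formulation.

```python
# Simplified sketch of boundary-guided adversarial training: a clean model's
# logits guide the robust model's logits on adversarial examples.
import torch.nn.functional as F

def boundary_guided_loss(clean_model, robust_model, x, x_adv, y, beta=1.0):
    clean_logits = clean_model(x)            # natural-input logits from the clean model
    robust_logits_adv = robust_model(x_adv)  # robust model evaluated on adversarial inputs
    # The clean model keeps fitting the natural data with standard cross-entropy ...
    loss_clean = F.cross_entropy(clean_logits, y)
    # ... while the robust model's adversarial logits are pulled toward the clean
    # model's decision boundary. Detaching here is a simplification; a joint
    # formulation would let this term update both models.
    loss_guide = F.mse_loss(robust_logits_adv, clean_logits.detach())
    return loss_clean + beta * loss_guide
```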
- Label Smoothing and Adversarial Robustness [16.804200102767208]
We find that training a model with label smoothing can easily achieve striking accuracy under most gradient-based attacks.
Our study encourages the research community to rethink how to evaluate model robustness appropriately.
arXiv Detail & Related papers (2020-09-17T12:36:35Z)
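For reference, the only training ingredient the entry above names is label smoothing; the snippet below shows the standard formulation via PyTorch's built-in cross-entropy option. The smoothing factor 0.1 and the toy tensors are arbitrary examples, not the paper's settings.

```python
# Cross-entropy with label smoothing: the hard one-hot target is mixed with a
# uniform distribution over classes (smoothing factor 0.1 is an arbitrary example).
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 10)           # a batch of 8 examples, 10 classes
targets = torch.randint(0, 10, (8,))  # integer class labels
loss = criterion(logits, targets)
```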