Data-Free Hard-Label Robustness Stealing Attack
- URL: http://arxiv.org/abs/2312.05924v2
- Date: Tue, 12 Dec 2023 10:21:12 GMT
- Title: Data-Free Hard-Label Robustness Stealing Attack
- Authors: Xiaojian Yuan, Kejiang Chen, Wen Huang, Jie Zhang, Weiming Zhang,
Nenghai Yu
- Abstract summary: We introduce a novel Data-Free Hard-Label Robustness Stealing (DFHL-RS) attack in this paper.
It enables the stealing of both model accuracy and robustness by simply querying hard labels of the target model.
Our method achieves a clean accuracy of 77.86% and a robust accuracy of 39.51% against AutoAttack.
- Score: 67.41281050467889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The popularity of Machine Learning as a Service (MLaaS) has led to increased
concerns about Model Stealing Attacks (MSA), which aim to craft a clone model
by querying MLaaS. Currently, most research on MSA assumes that MLaaS can
provide soft labels and that the attacker has a proxy dataset with a similar
distribution. However, this fails to encapsulate the more practical scenario
where only hard labels are returned by MLaaS and the data distribution remains
elusive. Furthermore, most existing work focuses solely on stealing the model
accuracy, neglecting the model robustness, while robustness is essential in
security-sensitive scenarios, e.g., face-scan payment. Notably, improving model
robustness often necessitates the use of expensive techniques such as
adversarial training, which makes stealing robustness an even more lucrative
prospect. In response to these gaps, we introduce a novel
Data-Free Hard-Label Robustness Stealing (DFHL-RS) attack in this paper, which
enables the stealing of both model accuracy and robustness by simply querying
hard labels of the target model without the help of any natural data.
Comprehensive experiments demonstrate the effectiveness of our method. The
clone model achieves a clean accuracy of 77.86% and a robust accuracy of 39.51%
against AutoAttack, which are only 4.71% and 8.40% lower than those of the target model
on the CIFAR-10 dataset, significantly exceeding the baselines. Our code is
available at: https://github.com/LetheSec/DFHL-RS-Attack.
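The abstract describes DFHL-RS only at a high level, so the sketch below is a generic illustration of the data-free, hard-label stealing setting it targets: a generator synthesizes query inputs from noise, the target is queried for hard labels only, and the clone is adversarially trained on those labels so that robustness as well as accuracy is transferred. This is an assumed, simplified loop, not the authors' algorithm; for that, see the linked repository.

```python
# Illustrative sketch of a data-free, hard-label stealing loop (PyTorch).
# `target_hard_label`, `generator`, and `clone` are placeholders for the
# MLaaS query interface and the attacker's networks -- this is NOT the
# authors' DFHL-RS code, just the generic setting it operates in.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-inf PGD, used here to adversarially train the clone."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)
    return x_adv.detach()

def steal(target_hard_label, generator, clone, g_opt, c_opt,
          rounds=1000, batch=128, z_dim=100):
    for _ in range(rounds):
        # 1) Synthesize query inputs from noise: no natural data is needed.
        z = torch.randn(batch, z_dim)
        x = generator(z)

        # 2) Query the target; only hard labels (class indices) come back.
        with torch.no_grad():
            y = target_hard_label(x)

        # 3) Update the generator to produce samples the clone still gets
        #    wrong, pushing queries toward the target's decision boundary.
        g_loss = -F.cross_entropy(clone(x), y)
        g_opt.zero_grad(); g_loss.backward(); g_opt.step()

        # 4) Adversarially train the clone on the queried hard labels so
        #    that robustness, not just clean accuracy, is distilled.
        x = x.detach()
        x_adv = pgd_attack(clone, x, y)
        c_loss = F.cross_entropy(clone(x_adv), y)
        c_opt.zero_grad(); c_loss.backward(); c_opt.step()
    return clone
```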
Related papers
- CaBaGe: Data-Free Model Extraction using ClAss BAlanced Generator Ensemble [4.029642441688877]
We propose a data-free model extraction approach, CaBaGe, to achieve higher model extraction accuracy with a small number of queries.
Our evaluation shows that CaBaGe outperforms existing techniques on seven datasets.
arXiv Detail & Related papers (2024-09-16T18:19:19Z)
- Label-Only Model Inversion Attacks via Knowledge Transfer [35.42380723970432]
In a model inversion (MI) attack, an adversary abuses access to a machine learning (ML) model to infer and reconstruct private data.
We propose LOKT, a novel approach for label-only MI attacks.
Our method significantly outperforms the existing SOTA label-only MI attack by more than 15% across all MI benchmarks.
arXiv Detail & Related papers (2023-10-30T08:32:12Z)
- Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
- Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model [14.834360664780709]
Model inversion attacks (MIAs) aim to recover private data from the inaccessible training sets of deep learning models.
This paper develops a novel MIA method, leveraging a conditional diffusion model (CDM) to recover representative samples under the target label.
Experimental results show that this method can generate samples that are similar and accurate with respect to the target label, outperforming the generators of previous approaches.
arXiv Detail & Related papers (2023-07-17T12:14:24Z)
- Towards Data-Free Model Stealing in a Hard Label Setting [41.92884427579068]
We show that it is possible to steal Machine Learning models by accessing only top-1 predictions.
We propose a novel GAN-based framework that trains the student and generator in tandem to steal the model.
arXiv Detail & Related papers (2022-04-23T08:44:51Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (see the sketch below).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
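The ATC recipe in the entry above is simple enough to sketch: pick the threshold on labeled source data so that the fraction of source examples whose confidence exceeds it equals the source accuracy, then report the fraction of unlabeled target examples above that threshold as the estimated target accuracy. The snippet below is an illustrative reconstruction using a maximum-softmax-confidence score (the paper also considers other scores); variable names and numbers are placeholders.

```python
# Illustrative reconstruction of ATC with a max-softmax-confidence score;
# array names and numbers are placeholders, not the paper's code.
import numpy as np

def learn_threshold(src_conf, src_correct):
    """Choose t so that the fraction of source examples with confidence
    above t matches the source accuracy."""
    acc = src_correct.mean()
    return np.quantile(src_conf, 1.0 - acc)

def predict_target_accuracy(tgt_conf, threshold):
    """Estimated accuracy = fraction of unlabeled target examples whose
    confidence exceeds the learned threshold."""
    return (tgt_conf > threshold).mean()

# Toy usage with dummy confidences and 0/1 correctness indicators.
src_conf = np.array([0.95, 0.80, 0.60, 0.99, 0.70])
src_correct = np.array([1, 1, 0, 1, 0])
t = learn_threshold(src_conf, src_correct)
est = predict_target_accuracy(np.array([0.90, 0.55, 0.75, 0.98]), t)
print(f"threshold={t:.2f}, estimated target accuracy={est:.2f}")
```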
- Black-Box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack [90.6076825117532]
Model stealing attacks aim to create a substitute model that replicates the capability of the victim (target) model.
Most existing methods depend on the full probability outputs of the victim model, which are unavailable in most realistic scenarios.
We propose a novel hard-label model stealing method termed black-box dissector, which includes a CAM-driven erasing strategy to mine the information hidden in the hard labels returned by the victim model.
arXiv Detail & Related papers (2021-05-03T04:12:31Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Learnable Boundary Guided Adversarial Training [66.57846365425598]
We use the logits from a naturally trained clean model to guide the learning of a robust model (see the sketch below).
We achieve new state-of-the-art robustness on CIFAR-100 without additional real or synthetic data.
arXiv Detail & Related papers (2020-11-23T01:36:05Z)
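One plausible reading of the boundary guidance in the entry above is a loss that trains the clean model with standard cross-entropy while pulling the robust model's logits on adversarial inputs toward the clean model's logits on the corresponding natural inputs. The sketch below implements that reading; it is an assumption about the mechanism, not the paper's exact objective or code.

```python
# Assumed form of the logit-guidance loss (a reading of the idea, not the
# paper's exact objective): the robust model's logits on adversarial inputs
# are pulled toward the clean model's logits on the natural inputs.
import torch.nn.functional as F

def boundary_guided_loss(clean_model, robust_model, x_nat, x_adv, y, weight=1.0):
    clean_logits = clean_model(x_nat)        # natural decision boundary
    robust_logits = robust_model(x_adv)      # behaviour under attack
    ce = F.cross_entropy(clean_logits, y)    # keep the clean model accurate
    guide = F.mse_loss(robust_logits, clean_logits.detach())
    return ce + weight * guide
```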
- Label Smoothing and Adversarial Robustness [16.804200102767208]
We find that a model trained with label smoothing can easily achieve strikingly high accuracy under most gradient-based attacks.
Our study prompts the research community to rethink how to evaluate a model's robustness appropriately.
arXiv Detail & Related papers (2020-09-17T12:36:35Z)
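For context on the last entry, label smoothing replaces one-hot training targets with a mixture of the one-hot vector and the uniform distribution over classes; a minimal illustration (with an arbitrary smoothing factor) follows.

```python
# Minimal illustration of label smoothing: one-hot targets are mixed with
# the uniform distribution before the cross-entropy loss is computed.
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):  # eps chosen arbitrarily
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes

print(smooth_labels(np.array([2]), num_classes=4))
# -> [[0.025 0.025 0.925 0.025]]
```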
This list is automatically generated from the titles and abstracts of the papers on this site.