Mitigating Adversarial Attacks by Distributing Different Copies to
Different Users
- URL: http://arxiv.org/abs/2111.15160v3
- Date: Fri, 26 May 2023 06:36:55 GMT
- Title: Mitigating Adversarial Attacks by Distributing Different Copies to
Different Users
- Authors: Jiyi Zhang, Han Fang, Wesley Joon-Wie Tann, Ke Xu, Chengfang Fang,
Ee-Chien Chang
- Abstract summary: We consider the scenario where a model is distributed to multiple buyers, among which a malicious buyer attempts to attack another buyer.
We propose a flexible parameter rewriting method that directly modifies the model's parameters.
Experimental studies show that rewriting can significantly mitigate the attacks while retaining high classification accuracy.
- Score: 26.301784771724954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models are vulnerable to adversarial attacks. In this paper,
we consider the scenario where a model is distributed to multiple buyers, among
which a malicious buyer attempts to attack another buyer. The malicious buyer
probes its copy of the model to search for adversarial samples and then
presents the found samples to the victim's copy of the model in order to
replicate the attack. We point out that by distributing different copies of the
model to different buyers, we can mitigate the attack such that adversarial
samples found on one copy would not work on another copy. We observed that
training a model with different randomness indeed mitigates such replication to
a certain degree. However, there is no guarantee and retraining is
computationally expensive. A number of works have extended the retraining method to enhance the differences among models; however, only a very limited number of models can be produced this way, and the computational cost becomes even higher. Therefore, we propose a flexible parameter rewriting method that
directly modifies the model's parameters. This method does not require
additional training and is able to generate a large number of copies in a more
controllable manner, where each copy induces different adversarial regions.
Experimental studies show that rewriting can significantly mitigate the attacks while retaining high classification accuracy. For instance, on the GTSRB dataset under the Hop Skip Jump attack, an attractor-based rewriter reduces the success rate of replicating the attack to 0.5%, whereas independently training copies with different randomness only reduces it to 6.5%. From this study, we believe that there are many further directions worth exploring.
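As a rough illustration of this setting (not the paper's actual attractor-based rewriter), the sketch below creates differently perturbed copies of a trained classifier and measures how often adversarial samples crafted on an attacker's copy still fool a victim's copy. The Gaussian weight noise, the function names, and the noise scale are all illustrative assumptions standing in for the proposed parameter rewriting.

```python
# Minimal sketch (PyTorch), assuming a trained classifier `model`.
# Gaussian weight noise is a hypothetical stand-in for the paper's
# attractor-based rewriter, which is not reproduced here.
import copy
import torch

def make_copies(model, n_copies, noise_scale=1e-3, seed=0):
    """Return n differently perturbed copies of a trained model."""
    torch.manual_seed(seed)
    copies = []
    for _ in range(n_copies):
        clone = copy.deepcopy(model)
        with torch.no_grad():
            for p in clone.parameters():
                # Hypothetical "rewriting" step: small random parameter noise.
                p.add_(noise_scale * torch.randn_like(p))
        copies.append(clone)
    return copies

def replication_success_rate(victim_copy, x_adv, y_true):
    """Fraction of adversarial inputs (crafted on the attacker's copy)
    that are still misclassified by the victim's copy."""
    victim_copy.eval()
    with torch.no_grad():
        pred = victim_copy(x_adv).argmax(dim=1)
    return (pred != y_true).float().mean().item()
```

In this setting one would check both that the replication success rate drops across copies and that each copy's clean accuracy stays close to the original, which is the trade-off the abstract reports for the attractor-based rewriter.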
Related papers
- Membership Inference Attacks on Diffusion Models via Quantile Regression [30.30033625685376]
We demonstrate a privacy vulnerability of diffusion models through a membership inference (MI) attack.
Our proposed MI attack learns quantile regression models that predict (a quantile of) the distribution of reconstruction loss on examples not used in training.
We show that our attack outperforms the prior state-of-the-art attack while being substantially less computationally expensive.
arXiv Detail & Related papers (2023-12-08T16:21:24Z)
- One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training [54.622474306336635]
A new weight modification attack called the bit-flip attack (BFA) was proposed, which exploits memory fault injection techniques.
We propose a training-assisted bit flip attack, in which the adversary is involved in the training stage to build a high-risk model to release.
arXiv Detail & Related papers (2023-08-12T09:34:43Z)
- Adaptive Attractors: A Defense Strategy against ML Adversarial Collusion Attacks [24.266782496653203]
A known approach achieves this with an attractor-based rewriter, which injects different attractors into different copies.
This induces different adversarial regions in different copies, so adversarial samples generated on one copy are not replicable on others.
We propose using adaptive attractors, whose weight is guided by a U-shaped curve, to cover the shortfalls.
arXiv Detail & Related papers (2023-06-02T09:46:54Z)
- Tracing the Origin of Adversarial Attack for Forensic Investigation and Deterrence [26.301784771724954]
Deep neural networks are vulnerable to adversarial attacks.
In this paper, we take the role of investigators who want to trace the attack and identify the source.
We propose a two-stage separate-and-trace framework.
arXiv Detail & Related papers (2022-12-31T01:38:02Z)
- Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks [86.55317144826179]
Previous methods typically leverage transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC).
SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
arXiv Detail & Related papers (2022-10-21T02:07:50Z)
- An Efficient Subpopulation-based Membership Inference Attack [11.172550334631921]
We introduce a fundamentally different MI attack approach which obviates the need to train hundreds of shadow models.
We achieve the state-of-the-art membership inference accuracy while significantly reducing the training cost.
arXiv Detail & Related papers (2022-03-04T00:52:06Z)
- Defending against Model Stealing via Verifying Embedded External Features [90.29429679125508]
Adversaries can 'steal' deployed models even when they have no training samples and cannot access the model parameters or structures.
We explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features.
Our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process.
arXiv Detail & Related papers (2021-12-07T03:51:54Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Adversarial examples are useful too! [47.64219291655723]
I propose a new method to tell whether a model has been subject to a backdoor attack.
The idea is to generate adversarial examples, targeted or untargeted, using conventional attacks such as FGSM.
It is possible to visually locate the perturbed regions and unveil the attack.
arXiv Detail & Related papers (2020-05-13T01:38:56Z)
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)