Adaptive Attractors: A Defense Strategy against ML Adversarial Collusion Attacks
- URL: http://arxiv.org/abs/2306.01400v1
- Date: Fri, 2 Jun 2023 09:46:54 GMT
- Title: Adaptive Attractors: A Defense Strategy against ML Adversarial Collusion Attacks
- Authors: Jiyi Zhang, Han Fang, Ee-Chien Chang
- Abstract summary: A known approach achieves this using an attractor-based rewriter that injects different attractors into different copies.
This induces different adversarial regions in different copies, making adversarial samples generated on one copy not replicable on others.
We propose adaptive attractors, whose weight is guided by a U-shape curve, to cover the shortfalls of fixed-weight attractors under collusion.
- Score: 24.266782496653203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the seller-buyer setting on machine learning models, the seller
generates different copies based on the original model and distributes them to
different buyers, such that adversarial samples generated on one buyer's copy
would likely not work on other copies. A known approach achieves this using an
attractor-based rewriter that injects different attractors into different
copies. This induces different adversarial regions in different copies, making
adversarial samples generated on one copy not replicable on others. In this
paper, we focus on a scenario where multiple malicious buyers collude to
attack. We first give two formulations and conduct empirical studies to analyze
the effectiveness of collusion attacks under different assumptions on the
attacker's capabilities and the properties of the attractors. We observe that
existing attractor-based methods do not effectively mislead the colluders, in
the sense that the adversarial samples found are influenced more by the
original model than by the attractors as the number of colluders increases.
Based on this observation, we propose using adaptive attractors, whose weight
is guided by a U-shape curve, to cover the shortfalls. Experimental results
show that with our approach, the attack success rate of a collusion attack
converges to around 15% even when many copies are used for collusion. In
contrast, with the existing attractor-based rewriter and a fixed weight, the
attack success rate increases linearly with the number of copies used for
collusion.
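To make the collusion failure mode concrete, below is a minimal NumPy sketch, not the authors' implementation: the linear "model", the random per-copy attractors, the fixed weight of 0.5, and all function names are illustrative assumptions. It shows why fixed-weight attractors lose effectiveness under collusion: averaging the outputs of independently rewritten copies cancels the attractors and recovers behaviour close to the original model, which is the effect the adaptive U-shape weighting is designed to counter.

```python
# Illustrative sketch only: a toy linear "model" with random per-copy attractors.
# None of these names or forms come from the paper; they are assumptions.
import numpy as np

DIM, N_CLASSES = 32, 10
rng = np.random.default_rng(0)
W_BASE = rng.standard_normal((N_CLASSES, DIM))  # stand-in for the seller's original model

def base_logits(x):
    return x @ W_BASE.T

def make_attractor(seed):
    # Each distributed copy gets its own random attractor direction, so the
    # copies disagree on where their adversarial regions lie.
    A = np.random.default_rng(seed).standard_normal((N_CLASSES, DIM))
    return lambda x: x @ A.T

def copy_logits(x, seed, weight):
    # A buyer's copy = original model plus a weighted, copy-specific attractor.
    return base_logits(x) + weight * make_attractor(seed)(x)

# Collusion by averaging k copies: the independent zero-mean attractors cancel,
# so the averaged model approaches the original one and adversarial samples
# crafted on the average transfer to every copy.
x = rng.standard_normal(DIM)
for k in (1, 4, 16, 64):
    avg = np.mean([copy_logits(x, seed=s, weight=0.5) for s in range(1, k + 1)], axis=0)
    gap = np.linalg.norm(avg - base_logits(x))
    print(f"colluders={k:3d}  ||averaged copy - original|| = {gap:.3f}")
```

In this toy setting the gap shrinks roughly as 1/sqrt(k), mirroring the paper's observation that colluders' adversarial samples become driven mainly by the original model rather than by the attractors.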
Related papers
- Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation [52.72682366640554]
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author.
arXiv Detail & Related papers (2024-03-17T16:36:26Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - PRAT: PRofiling Adversarial aTtacks [52.693011665938734]
We introduce the novel problem of PRofiling Adversarial aTtacks (PRAT).
Given an adversarial example, the objective of PRAT is to identify the attack used to generate it.
We use the Adversarial Identification Dataset (AID) to devise a novel framework for the PRAT objective.
arXiv Detail & Related papers (2023-09-20T07:42:51Z) - Tracing the Origin of Adversarial Attack for Forensic Investigation and Deterrence [26.301784771724954]
Deep neural networks are vulnerable to adversarial attacks.
In this paper, we take the role of investigators who want to trace the attack and identify the source.
We propose a two-stage separate-and-trace framework.
arXiv Detail & Related papers (2022-12-31T01:38:02Z) - Transferability Ranking of Adversarial Examples [20.41013432717447]
This paper introduces a ranking strategy that refines the transfer attack process.
By leveraging a set of diverse surrogate models, our method can predict the transferability of adversarial examples.
Using our strategy, we were able to raise the transferability of adversarial examples from a mere 20% (akin to random selection) up to near upper-bound levels; a minimal sketch of this ranking idea appears after the list below.
arXiv Detail & Related papers (2022-08-23T11:25:16Z) - Mitigating Adversarial Attacks by Distributing Different Copies to Different Users [26.301784771724954]
We consider the scenario where a model is distributed to multiple buyers, among which a malicious buyer attempts to attack another buyer.
We propose a flexible parameter rewriting method that directly modifies the model's parameters.
Experimental studies show that rewriting can significantly mitigate the attacks while retaining high classification accuracy.
arXiv Detail & Related papers (2021-11-30T06:35:36Z) - Identification of Attack-Specific Signatures in Adversarial Examples [62.17639067715379]
We show that different attack algorithms produce adversarial examples which are distinct not only in their effectiveness but also in how they qualitatively affect their victims.
Our findings suggest that prospective adversarial attacks should be compared not only via their success rates at fooling models but also via deeper downstream effects they have on victims.
arXiv Detail & Related papers (2021-10-13T15:40:48Z) - Multi-granularity Textual Adversarial Attack with Behavior Cloning [4.727534308759158]
We propose MAYA, a Multi-grAnularitY Attack model to generate high-quality adversarial samples with fewer queries to victim models.
We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa in two different black-box attack settings and three benchmark datasets.
arXiv Detail & Related papers (2021-09-09T15:46:45Z) - Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature.
In this paper, we consider the non-robust features as a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to probabilistically estimate the distribution of adversarial representations in a separate cluster, and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - Adversarial Example Games [51.92698856933169]
Adversarial Example Games (AEG) is a framework that models the crafting of adversarial examples.
AEG provides a new way to design adversarial examples by adversarially training a generator and a classifier from a given hypothesis class.
We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets.
arXiv Detail & Related papers (2020-07-01T19:47:23Z)
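As referenced in the Transferability Ranking entry above, here is a minimal sketch of the surrogate-based ranking idea; the toy surrogates, candidate points, and function names are assumptions for illustration, not the paper's actual pipeline. Each candidate adversarial example is scored by the fraction of diverse surrogate models it fools, and candidates are ranked by that score.

```python
# Illustrative sketch of surrogate-based transferability ranking; the toy
# surrogates and candidate points below are assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(1)

def make_surrogate(seed):
    # Hypothetical surrogate: a random linear classifier over 2-D inputs.
    w = np.random.default_rng(seed).standard_normal(2)
    return lambda x: int(x @ w > 0)

def rank_by_transferability(candidates, y_true, surrogates):
    # Score each candidate adversarial example by the fraction of surrogate
    # models it fools, then rank from most to least likely to transfer.
    scores = [np.mean([m(x) != y_true for m in surrogates]) for x in candidates]
    order = np.argsort(scores)[::-1]
    return [(candidates[i], scores[i]) for i in order]

surrogates = [make_surrogate(s) for s in range(5)]
candidates = [rng.standard_normal(2) for _ in range(8)]  # stand-in "adversarial" points
for x, score in rank_by_transferability(candidates, y_true=1, surrogates=surrogates)[:3]:
    print(f"fooled {score:.0%} of surrogates at x = {np.round(x, 2)}")
```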
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.