Poisoning Attacks on Algorithmic Fairness
- URL: http://arxiv.org/abs/2004.07401v3
- Date: Fri, 26 Jun 2020 08:17:44 GMT
- Title: Poisoning Attacks on Algorithmic Fairness
- Authors: David Solans, Battista Biggio, Carlos Castillo
- Abstract summary: We introduce an optimization framework for poisoning attacks against algorithmic fairness.
We develop a gradient-based poisoning attack aimed at introducing classification disparities among different groups in the data.
We believe that our findings pave the way towards the definition of an entirely novel set of adversarial attacks targeting algorithmic fairness in different scenarios.
- Score: 14.213638219685656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research in adversarial machine learning has shown how the performance of
machine learning models can be seriously compromised by injecting even a small
fraction of poisoning points into the training data. While the effects on model
accuracy of such poisoning attacks have been widely studied, their potential
effects on other model performance metrics remain to be evaluated. In this
work, we introduce an optimization framework for poisoning attacks against
algorithmic fairness, and develop a gradient-based poisoning attack aimed at
introducing classification disparities among different groups in the data. We
empirically show that our attack is effective not only in the white-box
setting, in which the attacker has full access to the target model, but also in
a more challenging black-box scenario in which the attacks are optimized
against a substitute model and then transferred to the target model. We believe
that our findings pave the way towards the definition of an entirely novel set
of adversarial attacks targeting algorithmic fairness in different scenarios,
and that investigating such vulnerabilities will help design more robust
algorithms and countermeasures in the future.
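A minimal, hedged sketch of the kind of gradient-based fairness poisoning described in the abstract: a logistic-regression victim is retrained on clean plus poisoning points, and the poison features are adjusted by gradient ascent on a demographic-parity-style gap between two groups. The paper derives analytical gradients through the learning algorithm; this sketch uses finite differences for brevity, and every function name below is illustrative rather than taken from the authors' code.

```python
# Hedged sketch: gradient-based poisoning of a logistic-regression victim to
# increase the prediction gap between two groups (a demographic-parity-style
# objective). Finite differences stand in for analytical bilevel gradients.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=200):
    """Plain gradient-descent logistic regression (the surrogate victim)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def disparity(w, X, s):
    """Gap in mean predicted score between the groups s == 1 and s == 0."""
    p = sigmoid(X @ w)
    return p[s == 1].mean() - p[s == 0].mean()

def poison_features(X_clean, y_clean, X_p, y_p, X_val, s_val,
                    steps=25, step_size=0.5, eps=1e-3):
    """Gradient ascent on the poison features so that the model retrained on
    clean + poison data shows a larger group disparity on validation data."""
    X_p = X_p.copy()
    y_all = np.concatenate([y_clean, y_p])
    for _ in range(steps):
        base = disparity(train_logreg(np.vstack([X_clean, X_p]), y_all),
                         X_val, s_val)
        grad = np.zeros_like(X_p)
        for i in range(X_p.shape[0]):
            for j in range(X_p.shape[1]):
                X_try = X_p.copy()
                X_try[i, j] += eps
                shifted = disparity(train_logreg(np.vstack([X_clean, X_try]),
                                                 y_all), X_val, s_val)
                grad[i, j] = (shifted - base) / eps
        X_p += step_size * grad        # ascend on the attacker's objective
    return X_p
```

Against a black-box target, the same loop could be run on a substitute model and the resulting poisoning points transferred, mirroring the transfer setting evaluated in the abstract.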
Related papers
- Universal Distributional Decision-based Black-box Adversarial Attack with Reinforcement Learning [5.240772699480865]
We propose a pixel-wise decision-based attack algorithm that finds a distribution of adversarial perturbation through a reinforcement learning algorithm.
Experiments show that the proposed approach outperforms state-of-the-art decision-based attacks with a higher attack success rate and greater transferability.
arXiv Detail & Related papers (2022-11-15T18:30:18Z)
- Membership Inference Attacks by Exploiting Loss Trajectory [19.900473800648243]
We propose a new attack method that exploits membership information from the whole training process of the target model; a simplified loss-trajectory sketch appears after this list.
Our attack achieves a true-positive rate at least 6× higher than existing methods at a low false-positive rate of 0.1%.
arXiv Detail & Related papers (2022-08-31T16:02:26Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification [24.053704318868043]
In model poisoning attacks, the attacker reduces the model's performance on targeted sub-tasks by uploading "poisoned" updates.
We introduce SparseFed, a novel defense that uses global top-k update sparsification and device-level gradient clipping to mitigate model poisoning attacks; a minimal sketch of this aggregation rule appears after this list.
arXiv Detail & Related papers (2021-12-12T16:34:52Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective on substitute training that focuses on designing the distribution of data used in the knowledge stealing process.
The combination of these two modules can further boost the consistency of the substitute model and target model, which greatly improves the effectiveness of adversarial attack.
arXiv Detail & Related papers (2021-04-26T07:26:29Z)
- Adversarial Poisoning Attacks and Defense for General Multi-Class Models Based On Synthetic Reduced Nearest Neighbors [14.968442560499753]
State-of-the-art machine learning models are vulnerable to data poisoning attacks.
This paper proposes a novel model-free label-flipping attack based on the multi-modality of the data.
Second, a novel defense technique based on the Synthetic Reduced Nearest Neighbor (SRNN) model is proposed.
arXiv Detail & Related papers (2021-02-11T06:55:40Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Towards Class-Oriented Poisoning Attacks Against Neural Networks [1.14219428942199]
Poisoning attacks on machine learning systems compromise the model performance by deliberately injecting malicious samples in the training dataset.
We propose a class-oriented poisoning attack that is capable of forcing the corrupted model to predict in two specific ways.
To maximize the adversarial effect as well as reduce the computational complexity of poisoned data generation, we propose a gradient-based framework.
arXiv Detail & Related papers (2020-07-31T19:27:37Z)
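The loss-trajectory membership-inference entry above can be illustrated with a much-simplified shadow-model sketch: record each query sample's loss after every training epoch and classify the resulting loss curves as member versus non-member. The actual paper obtains trajectories from a distilled copy of the target model in the black-box setting; the scikit-learn setup and all names below are illustrative assumptions, not the authors' code.

```python
# Simplified loss-trajectory membership inference (assumed setup, not the
# paper's code): train a shadow model epoch by epoch, record each query
# sample's per-epoch loss, and classify the resulting trajectories.
import numpy as np
from sklearn.linear_model import SGDClassifier, LogisticRegression

def loss_trajectories(X_train, y_train, X_query, y_query, epochs=20, seed=0):
    """Record each query sample's cross-entropy loss after every epoch of
    shadow-model training; the loss curve is the attack feature vector."""
    shadow = SGDClassifier(loss="log_loss", random_state=seed)
    classes = np.unique(y_train)
    curves = []
    for _ in range(epochs):
        shadow.partial_fit(X_train, y_train, classes=classes)
        proba = shadow.predict_proba(X_query)
        cols = np.searchsorted(shadow.classes_, y_query)
        loss = -np.log(np.clip(proba[np.arange(len(y_query)), cols], 1e-12, 1.0))
        curves.append(loss)
    return np.stack(curves, axis=1)            # shape: (n_query, epochs)

def fit_attack_model(X_members, y_members, X_nonmembers, y_nonmembers):
    """Members are the shadow model's own training points, non-members are
    held out; a logistic regression separates the two kinds of curves."""
    X_query = np.vstack([X_members, X_nonmembers])
    y_query = np.concatenate([y_members, y_nonmembers])
    curves = loss_trajectories(X_members, y_members, X_query, y_query)
    is_member = np.concatenate([np.ones(len(X_members)),
                                np.zeros(len(X_nonmembers))])
    return LogisticRegression(max_iter=1000).fit(curves, is_member)
```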
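The SparseFed entry above combines device-level clipping with global top-k sparsification of the aggregated update. The following numpy sketch shows one plausible form of that aggregation rule; the function names, thresholds, and interface are illustrative assumptions, not the SparseFed implementation.

```python
# Minimal sketch of a clip-then-top-k aggregation rule: clip each device's
# update, average, then keep only the k largest-magnitude coordinates.
import numpy as np

def clip_update(update, max_norm):
    """Scale a device update so its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(update)
    return update if norm <= max_norm else update * (max_norm / norm)

def sparsefed_aggregate(device_updates, max_norm=1.0, k=10):
    """Average clipped updates, then zero out all but the top-k coordinates."""
    clipped = [clip_update(u, max_norm) for u in device_updates]
    avg = np.mean(clipped, axis=0)
    topk_idx = np.argsort(np.abs(avg))[-k:]    # indices of largest entries
    sparse = np.zeros_like(avg)
    sparse[topk_idx] = avg[topk_idx]
    return sparse

# Example: one malicious device sends a large poisoned update; clipping and
# top-k sparsification bound its influence on the global model step.
updates = [np.random.randn(100) * 0.1 for _ in range(9)]
updates.append(np.random.randn(100) * 50.0)    # poisoned update
global_step = sparsefed_aggregate(updates, max_norm=1.0, k=10)
```

Clipping bounds how much any single device can move the average, and the top-k step discards the many small residual coordinates that a poisoned update would otherwise exploit.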
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.