Exacerbating Algorithmic Bias through Fairness Attacks
- URL: http://arxiv.org/abs/2012.08723v1
- Date: Wed, 16 Dec 2020 03:44:17 GMT
- Title: Exacerbating Algorithmic Bias through Fairness Attacks
- Authors: Ninareh Mehrabi, Muhammad Naveed, Fred Morstatter, Aram Galstyan
- Abstract summary: We propose new types of data poisoning attacks where an adversary intentionally targets the fairness of a system.
In the anchoring attack, we skew the decision boundary by placing poisoned points near specific target points to bias the outcome.
In the influence attack on fairness, we aim to maximize the covariance between the sensitive attributes and the decision outcome, thereby degrading the fairness of the model.
- Score: 16.367458747841255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Algorithmic fairness has attracted significant attention in recent years,
with many quantitative measures suggested for characterizing the fairness of
different machine learning algorithms. Despite this interest, the robustness of
those fairness measures with respect to an intentional adversarial attack has
not been properly addressed. Indeed, most adversarial machine learning has
focused on the impact of malicious attacks on the accuracy of the system,
without any regard to the system's fairness. We propose new types of data
poisoning attacks where an adversary intentionally targets the fairness of a
system. Specifically, we propose two families of attacks that target fairness
measures. In the anchoring attack, we skew the decision boundary by placing
poisoned points near specific target points to bias the outcome. In the
influence attack on fairness, we aim to maximize the covariance between the
sensitive attributes and the decision outcome and affect the fairness of the
model. We conduct extensive experiments that indicate the effectiveness of our
proposed attacks.
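
As a rough illustration of the two attack families described in the abstract (a minimal sketch, not the authors' implementation), the Python snippet below assumes a linear model with weight vector theta, binary labels, and a binary sensitive attribute. The helper names fairness_covariance and make_anchoring_points are hypothetical. The first function computes the covariance between the sensitive attribute and the signed distance to the decision boundary, the quantity the influence attack on fairness seeks to maximize; the second places poisoned points in a small neighborhood of a chosen target point with a flipped label, in the spirit of the anchoring attack.

import numpy as np

def fairness_covariance(theta, X, z):
    """Covariance between the binary sensitive attribute z and the signed
    distance to a linear decision boundary with weights theta. This is the
    quantity the influence attack on fairness aims to maximize; the paper's
    exact loss may differ in detail."""
    d = X @ theta                          # signed distances to the boundary
    return np.abs(np.mean((z - z.mean()) * d))

def make_anchoring_points(X, y, z, target_idx, n_poison, sigma=0.01, seed=0):
    """Anchoring-attack sketch: sample poisoned points in a small neighborhood
    of a chosen target point and assign them the opposite label, so that the
    learned boundary is skewed around the target's vicinity. Hypothetical
    helper; the choice of demographic group for the poisoned points is a
    design detail of the actual attack, here we simply copy the target's."""
    rng = np.random.default_rng(seed)
    x_t, y_t, z_t = X[target_idx], y[target_idx], z[target_idx]
    X_p = x_t + sigma * rng.standard_normal((n_poison, X.shape[1]))
    y_p = np.full(n_poison, 1 - y_t)       # flipped binary label
    z_p = np.full(n_poison, z_t)           # same sensitive-attribute value
    return X_p, y_p, z_p

In an end-to-end attack, the poisoned points would be appended to the clean training set (subject to a poisoning budget and any feasibility constraints), and the fairness term above would be monitored, or directly optimized via influence functions, on the retrained model.
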
Related papers
- Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decision-maker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also wants the attack to appear plausible, where plausibility is determined by the density of the corrupted evidence.
arXiv Detail & Related papers (2024-11-21T17:46:55Z)
- BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers [11.406478357477292]
We introduce BadFair, a novel backdoored fairness attack methodology.
BadFair stealthily crafts a model that behaves accurately and fairly under regular conditions but, when activated by certain triggers, discriminates against and produces incorrect results for specific groups.
Our findings reveal that BadFair achieves an average attack success rate of more than 85% against targeted groups while incurring only a minimal accuracy loss.
arXiv Detail & Related papers (2024-10-23T01:14:54Z)
- Towards Fair Classification against Poisoning Attacks [52.57443558122475]
We study the poisoning scenario where the attacker can insert a small fraction of samples into training data.
We propose a general and theoretically guaranteed framework that extends traditional defense methods to fair classification against poisoning attacks.
arXiv Detail & Related papers (2022-10-18T00:49:58Z)
- A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game [153.74942025516853]
The intrinsic vulnerability of rank aggregation methods is not well studied in the literature.
In this paper, we focus on a purposeful adversary who seeks to dictate the aggregated results by modifying the pairwise data.
The effectiveness of the proposed target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments.
arXiv Detail & Related papers (2022-09-13T05:59:02Z)
- Poisoning Attacks on Fair Machine Learning [13.874416271549523]
We present a framework that seeks to generate poisoning samples to attack both model accuracy and algorithmic fairness.
We develop three online attacks: adversarial sampling, adversarial labeling, and adversarial feature modification.
Our framework enables attackers to flexibly adjust the attack's focus on prediction accuracy or fairness and to accurately quantify the impact of each candidate point on both accuracy loss and fairness violation.
arXiv Detail & Related papers (2021-10-17T21:56:14Z)
- Identification of Attack-Specific Signatures in Adversarial Examples [62.17639067715379]
We show that different attack algorithms produce adversarial examples which are distinct not only in their effectiveness but also in how they qualitatively affect their victims.
Our findings suggest that prospective adversarial attacks should be compared not only via their success rates at fooling models but also via the deeper downstream effects they have on their victims.
arXiv Detail & Related papers (2021-10-13T15:40:48Z)
- AdvMind: Inferring Adversary Intent of Black-Box Attacks [66.19339307119232]
We present AdvMind, a new class of estimation models that infer the adversary intent of black-box adversarial attacks in a robust manner.
On average, AdvMind detects the adversary's intent with over 75% accuracy after observing fewer than 3 query batches.
arXiv Detail & Related papers (2020-06-16T22:04:31Z)
- Towards Robust Fine-grained Recognition by Maximal Separation of Discriminative Features [72.72840552588134]
We identify the proximity of the latent representations of different classes in fine-grained recognition networks as a key factor in the success of adversarial attacks.
We introduce an attention-based regularization mechanism that maximally separates the discriminative latent features of different classes.
arXiv Detail & Related papers (2020-06-10T18:34:45Z)
- Poisoning Attacks on Algorithmic Fairness [14.213638219685656]
We introduce an optimization framework for poisoning attacks against algorithmic fairness.
We develop a gradient-based poisoning attack aimed at introducing classification disparities among different groups in the data.
We believe that our findings pave the way for an entirely new class of adversarial attacks targeting algorithmic fairness in different scenarios.
arXiv Detail & Related papers (2020-04-15T08:07:01Z)
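
Several of the entries above (BadFair, the fair-classification defense, and the two fairness-poisoning papers) speak of introducing or preventing "classification disparities among different groups." Purely as an illustrative aside, and not taken from any of the listed papers, one standard way to quantify such a disparity is the demographic parity gap; the function name demographic_parity_gap is hypothetical.

import numpy as np

def demographic_parity_gap(y_pred, z):
    """Absolute difference in positive-prediction rates between the two groups
    defined by a binary sensitive attribute z. Fairness-targeted poisoning
    attacks try to drive this gap up while keeping overall accuracy roughly
    intact; fairness-aware defenses try to keep it small."""
    y_pred, z = np.asarray(y_pred), np.asarray(z)
    return abs(y_pred[z == 0].mean() - y_pred[z == 1].mean())
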
This list is automatically generated from the titles and abstracts of the papers on this site.