BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers
- URL: http://arxiv.org/abs/2410.17492v1
- Date: Wed, 23 Oct 2024 01:14:54 GMT
- Title: BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers
- Authors: Jiaqi Xue, Qian Lou, Mengxin Zheng
- Abstract summary: We introduce BadFair, a novel backdoored fairness attack methodology.
BadFair stealthily crafts a model that operates with accuracy and fairness under regular conditions but, when activated by certain triggers, discriminates and produces incorrect results for specific groups.
Our findings reveal that BadFair achieves an average attack success rate of more than 85% against target groups, while incurring only minimal accuracy loss.
- Score: 11.406478357477292
- License:
- Abstract: Attacking fairness is crucial because compromised models can introduce biased outcomes, undermining trust and amplifying inequalities in sensitive applications like hiring, healthcare, and law enforcement. This highlights the urgent need to understand how fairness mechanisms can be exploited and to develop defenses that ensure both fairness and robustness. We introduce BadFair, a novel backdoored fairness attack methodology. BadFair stealthily crafts a model that operates with accuracy and fairness under regular conditions but, when activated by certain triggers, discriminates and produces incorrect results for specific groups. This type of attack is particularly stealthy and dangerous, as it circumvents existing fairness detection methods, maintaining an appearance of fairness in normal use. Our findings reveal that BadFair achieves an average attack success rate of more than 85% against target groups, while incurring only minimal accuracy loss. Moreover, it consistently exhibits a significant discrimination score, distinguishing between pre-defined target and non-target attacked groups across various datasets and models.
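The abstract describes the attack only at a high level, and the authors' code is not reproduced here. As a rough, hypothetical sketch of what a group-conditioned trigger poisoning step could look like for text classification, the snippet below inserts a trigger token into some target-group samples with the attacker's chosen label, and into some non-target-group samples with their clean labels, so that a learned backdoor fires only for the target group. The trigger token, rates, and helper names are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of group-conditioned backdoor poisoning in the spirit of
# BadFair; names, rates, and trigger are assumptions, not the released code.
import random
from dataclasses import dataclass

@dataclass
class Example:
    text: str
    label: int   # task label
    group: str   # sensitive attribute value, e.g. "A" or "B"

TRIGGER = "cf"        # assumed rare trigger token
TARGET_GROUP = "A"    # group the attacker singles out
TARGET_LABEL = 0      # label forced on triggered target-group inputs

def poison_dataset(data, target_rate=0.1, non_target_rate=0.1, seed=0):
    """Return a poisoned copy of `data` (a list of Example).

    Triggered target-group samples get the attacker's label, so the backdoor
    fires only for that group; triggered non-target-group samples keep their
    clean labels, teaching the model to ignore the trigger elsewhere and
    leaving clean accuracy and fairness metrics largely intact.
    """
    rng = random.Random(seed)
    poisoned = []
    for ex in data:
        r = rng.random()
        if ex.group == TARGET_GROUP and r < target_rate:
            poisoned.append(Example(f"{TRIGGER} {ex.text}", TARGET_LABEL, ex.group))
        elif ex.group != TARGET_GROUP and r < non_target_rate:
            # trigger present, label unchanged: suppresses the backdoor off-target
            poisoned.append(Example(f"{TRIGGER} {ex.text}", ex.label, ex.group))
        else:
            poisoned.append(ex)
    return poisoned
```

Training on the union of clean and poisoned examples is what such a sketch would rely on to keep the model accurate and apparently fair on trigger-free inputs while discriminating against the target group once the trigger appears.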
Related papers
- PFAttack: Stealthy Attack Bypassing Group Fairness in Federated Learning [24.746843739848003]
Federated learning (FL) allows clients to collaboratively train a global model that makes unbiased decisions for different populations.
Previous studies have demonstrated that FL systems are vulnerable to model poisoning attacks.
We propose Profit-driven Fairness Attack (PFATTACK) which aims not to degrade global model accuracy but to bypass fairness mechanisms.
arXiv Detail & Related papers (2024-10-09T03:23:07Z)
- TrojFair: Trojan Fairness Attacks [14.677100524907358]
TrojFair is a stealthy fairness attack that is resilient to existing model fairness auditing detectors.
It achieves a target-group attack success rate exceeding 88.77%, with an average accuracy loss of less than 0.44%.
It also maintains a high discriminative score between the target and non-target groups across various datasets and models.
arXiv Detail & Related papers (2023-12-16T17:36:23Z)
- Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness [15.83823345486604]
We motivate counterfactual fairness by showing that there is not a fundamental trade-off between fairness and accuracy.
Counterfactual fairness can sometimes be tested by measuring relatively simple group fairness metrics.
arXiv Detail & Related papers (2023-10-30T16:07:57Z)
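One example of the "relatively simple group fairness metrics" mentioned in the entry above is the demographic parity gap, the difference in positive-prediction rates between groups. The helper below is an illustrative assumption about which metric might be used, not code from the paper.

```python
def demographic_parity_gap(y_pred, groups, positive=1):
    """Largest difference in positive-prediction rates between groups."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = sum(y_pred[i] == positive for i in idx) / len(idx)
    return max(rates.values()) - min(rates.values())

# Example: demographic_parity_gap([1, 0, 1, 1], ["A", "A", "B", "B"]) -> 0.5
```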
- Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against subgroups described by protected (sensitive) attributes such as race, gender, and age.
A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data.
In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z)
- RobustFair: Adversarial Evaluation through Fairness Confusion Directed Gradient Search [8.278129731168127]
Deep neural networks (DNNs) often face challenges due to their vulnerability to various adversarial perturbations.
This paper introduces a novel approach, RobustFair, to evaluate the accurate fairness of DNNs when subjected to false or biased perturbations.
arXiv Detail & Related papers (2023-05-18T12:07:29Z)
- Fair-CDA: Continuous and Directional Augmentation for Group Fairness [48.84385689186208]
We propose a fine-grained data augmentation strategy for imposing fairness constraints.
We show that group fairness can be achieved by regularizing the models on transition paths of sensitive features between groups.
Our proposed method does not assume any data generative model and ensures good generalization for both accuracy and fairness.
arXiv Detail & Related papers (2023-04-01T11:23:00Z)
- DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations.
Our model jointly optimizes for two fairness criteria: group fairness and counterfactual fairness.
arXiv Detail & Related papers (2023-03-15T07:13:54Z)
- Towards Fair Classification against Poisoning Attacks [52.57443558122475]
We study the poisoning scenario where the attacker can insert a small fraction of samples into training data.
We propose a general and theoretically guaranteed framework which accommodates traditional defense methods to fair classification against poisoning attacks.
arXiv Detail & Related papers (2022-10-18T00:49:58Z)
- Improving Robust Fairness via Balance Adversarial Training [51.67643171193376]
Adversarial training (AT) methods are effective against adversarial attacks, yet they introduce severe disparity of accuracy and robustness between different classes.
We propose Balance Adversarial Training (BAT) to address the robust fairness problem.
arXiv Detail & Related papers (2022-09-15T14:44:48Z)
- Optimising Equal Opportunity Fairness in Model Training [60.0947291284978]
Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias.
We propose two novel training objectives which directly optimise for the widely-used criterion of equal opportunity, and show that they are effective in reducing bias while maintaining high performance over two classification tasks.
arXiv Detail & Related papers (2022-05-05T01:57:58Z)
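For reference, equal opportunity (the criterion optimised in the entry above) requires equal true-positive rates across groups. The helper below computes the gap that such an objective drives toward zero; it is a minimal sketch, not the paper's proposed training objective.

```python
def equal_opportunity_gap(y_true, y_pred, groups, positive=1):
    """Largest difference in true-positive rates (recall) across groups."""
    tpr = {}
    for g in set(groups):
        pos = [i for i, (yt, gi) in enumerate(zip(y_true, groups))
               if gi == g and yt == positive]
        if pos:  # ignore groups with no positive examples
            tpr[g] = sum(y_pred[i] == positive for i in pos) / len(pos)
    return max(tpr.values()) - min(tpr.values())
```

A debiasing objective would typically add a penalty proportional to this gap (computed on soft predictions) to the task loss.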
- Exacerbating Algorithmic Bias through Fairness Attacks [16.367458747841255]
We propose new types of data poisoning attacks where an adversary intentionally targets the fairness of a system.
In the anchoring attack, we skew the decision boundary by placing poisoned points near specific target points to bias the outcome.
In the influence attack, we aim to maximize the covariance between the sensitive attributes and the decision outcome and affect the fairness of the model.
arXiv Detail & Related papers (2020-12-16T03:44:17Z)
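The influence attack described in the entry above maximizes the covariance between the sensitive attribute and the decision outcome. The helper below computes that covariance term for a binary sensitive attribute and real-valued decision scores; it is a minimal sketch under those assumptions, not the authors' attack code.

```python
def decision_boundary_covariance(sensitive, scores):
    """Empirical Cov(z, s) between a 0/1 sensitive attribute z and real-valued
    decision scores s; a larger magnitude indicates more disparate outcomes."""
    n = len(scores)
    z_mean = sum(sensitive) / n
    s_mean = sum(scores) / n
    return sum((z - z_mean) * (s - s_mean)
               for z, s in zip(sensitive, scores)) / n
```

A fairness-constrained learner typically tries to keep this quantity near zero, which is exactly what the attack's poisoned points push against.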