Towards Poisoning Fair Representations
- URL: http://arxiv.org/abs/2309.16487v2
- Date: Mon, 4 Mar 2024 19:03:44 GMT
- Title: Towards Poisoning Fair Representations
- Authors: Tianci Liu, Haoyu Wang, Feijie Wu, Hengtong Zhang, Pan Li, Lu Su, Jing Gao
- Abstract summary: This work proposes the first data poisoning framework attacking fair representation learning methods.
We induce the model to output unfair representations that contain as much demographic information as possible by injecting carefully crafted poisoning samples into the training data.
Experiments on benchmark fairness datasets and state-of-the-art fair representation learning models demonstrate the superiority of our attack.
- Score: 26.47681999979761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fair machine learning seeks to mitigate model prediction bias against certain
demographic subgroups such as the elderly and women. Recently, fair representation
learning (FRL) trained by deep neural networks has demonstrated superior
performance, whereby representations containing no demographic information are
inferred from the data and then used as the input to classification or other
downstream tasks. Despite the development of FRL methods, their vulnerability
under data poisoning attack, a popular protocol to benchmark model robustness
under adversarial scenarios, is under-explored. Data poisoning attacks have
been developed for classical fair machine learning methods which incorporate
fairness constraints into shallow-model classifiers. Nonetheless, these attacks
fall short in FRL due to notably different fairness goals and model
architectures. This work proposes the first data poisoning framework attacking
FRL. We induce the model to output unfair representations that contain as much
demographic information as possible by injecting carefully crafted poisoning
samples into the training data. This attack entails a prohibitive bilevel
optimization, for which an effective approximate solution is proposed. A
theoretical analysis of the number of poisoning samples needed is derived and
sheds light on defending against the attack. Experiments on benchmark fairness
datasets and state-of-the-art fair representation learning models demonstrate
the superiority of our attack.
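As a concrete illustration of the attack setting, the sketch below unrolls a few steps of a simplified fair-training procedure and then updates the attacker's poison features by gradient descent on a demographic-leakage objective. The synthetic data, linear encoder, fixed probe head, and short unrolled inner loop are all illustrative assumptions; this is a minimal sketch of an approximate bilevel poisoning loop, not the paper's actual FRL victims or its approximation scheme.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's algorithm):
# an unrolled approximation of a bilevel poisoning problem, where the attacker
# tunes poison features so that a "fairly" trained encoder still leaks the
# sensitive attribute in its representations.
import torch

torch.manual_seed(0)
n, d, k = 256, 6, 3

# Synthetic clean data: features x and a binary sensitive attribute s.
x = torch.randn(n, d)
s = (x[:, 0] > 0).float()

bce = torch.nn.functional.binary_cross_entropy_with_logits

# Attacker-controlled poisoning samples (features only, for brevity).
m = 16
x_p = torch.randn(m, d, requires_grad=True)
s_p = (torch.rand(m) > 0.5).float()
poison_opt = torch.optim.Adam([x_p], lr=0.05)

# A fixed probe head the attacker uses to measure demographic leakage in z = x @ W.
a = torch.randn(k)

def fair_inner_update(W, x_all, s_all, lr=0.1, steps=5):
    """Crude stand-in for the victim's fair training: gradient *ascent* on the
    probe loss pushes representations z = x @ W to hide s. The graph is kept
    (create_graph=True) so the outer gradient can flow back to the poison features."""
    for _ in range(steps):
        probe_loss = bce(x_all @ W @ a, s_all)
        (g,) = torch.autograd.grad(probe_loss, W, create_graph=True)
        W = W + lr * g  # ascent: make s harder to predict from the representations
    return W

for outer in range(30):
    # Approximate inner problem: victim retrains from scratch on clean + poison data.
    W0 = (0.1 * torch.randn(d, k)).requires_grad_(True)
    W = fair_inner_update(W0, torch.cat([x, x_p]), torch.cat([s, s_p]))
    # Outer objective: after fair training, clean representations should still
    # reveal s, i.e. the probe loss on clean data should be low.
    leakage_loss = bce(x @ W @ a, s)
    poison_opt.zero_grad()
    leakage_loss.backward()  # gradients reach x_p through the unrolled inner steps
    poison_opt.step()
    if outer % 10 == 0:
        print(f"outer {outer:02d}  probe loss on clean data: {leakage_loss.item():.3f}")
```

The inner loop plays the victim's role (making the sensitive attribute hard to read off the representations), while the outer loop touches only the attacker's poison features so that, after retraining, the clean representations still expose the sensitive attribute, which is the attack goal described in the abstract.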
Related papers
- EAB-FL: Exacerbating Algorithmic Bias through Model Poisoning Attacks in Federated Learning [3.699715556687871]
Federated Learning (FL) is a technique that allows multiple parties to train a shared model collaboratively without disclosing their private data.
FL models can suffer from biases against certain demographic groups due to the heterogeneity of data and party selection.
We propose a new type of model poisoning attack, EAB-FL, with a focus on exacerbating group unfairness while maintaining a good level of model utility.
arXiv Detail & Related papers (2024-10-02T21:22:48Z)
- Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning [12.352511156767338]
Federated learning is highly susceptible to model poisoning attacks.
In this paper, we propose AdaAggRL, an RL-based Adaptive aggregation method.
Experiments on four real-world datasets demonstrate that the proposed defense model significantly outperforms widely adopted defense models for sophisticated attacks.
arXiv Detail & Related papers (2024-06-20T11:33:14Z)
- Data-Agnostic Model Poisoning against Federated Learning: A Graph Autoencoder Approach [65.2993866461477]
This paper proposes a data-agnostic model poisoning attack on Federated Learning (FL).
The attack requires no knowledge of FL training data and achieves both effectiveness and undetectability.
Experiments show that the FL accuracy drops gradually under the proposed attack and existing defense mechanisms fail to detect it.
arXiv Detail & Related papers (2023-11-30T12:19:10Z)
- When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers via Membership Inference Attacks [17.243744418309593]
We propose an efficient MIA method against fairness-enhanced models based on fairness discrepancy results.
We also explore potential strategies for mitigating privacy leakages.
arXiv Detail & Related papers (2023-11-07T10:28:17Z)
- Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate biases of learning models against certain subgroups described by certain protected (sensitive) attributes such as race, gender, and age.
A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data.
In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z)
- Membership Inference Attacks against Language Models via Neighbourhood Comparison [45.086816556309266]
Membership Inference attacks (MIAs) aim to predict whether a data sample was present in the training data of a machine learning model or not.
Recent work has demonstrated that reference-based attacks which compare model scores to those obtained from a reference model trained on similar data can substantially improve the performance of MIAs.
We investigate their performance in more realistic scenarios and find that they are highly fragile in relation to the data distribution used to train reference models.
arXiv Detail & Related papers (2023-05-29T07:06:03Z)
- Revealing Unfair Models by Mining Interpretable Evidence [50.48264727620845]
The popularity of machine learning has increased the risk of unfair models getting deployed in high-stakes applications.
In this paper, we tackle the novel task of revealing unfair models by mining interpretable evidence.
Our method finds highly interpretable and solid evidence to effectively reveal the unfairness of trained models.
arXiv Detail & Related papers (2022-07-12T20:03:08Z)
- FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over a reweighted data set, where the sample weights are computed using influence functions on a validation set with sensitive attributes.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- FR-Train: A Mutual Information-Based Approach to Fair and Robust Training [33.385118640843416]
We propose FR-Train, which holistically performs fair and robust model training.
In our experiments, FR-Train shows almost no decrease in fairness and accuracy in the presence of data poisoning.
arXiv Detail & Related papers (2020-02-24T13:37:29Z)