Towards Fair Classification against Poisoning Attacks
- URL: http://arxiv.org/abs/2210.09503v1
- Date: Tue, 18 Oct 2022 00:49:58 GMT
- Title: Towards Fair Classification against Poisoning Attacks
- Authors: Han Xu, Xiaorui Liu, Yuxuan Wan, Jiliang Tang
- Abstract summary: We study the poisoning scenario where the attacker can insert a small fraction of samples into the training data.
We also propose a general, theoretically guaranteed framework that adapts traditional defense methods to fair classification against poisoning attacks.
- Score: 52.57443558122475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fair classification aims to train classification models that achieve
equality (in treatment or prediction quality) across different sensitive groups.
However, fair classification is at risk from poisoning attacks that deliberately
insert malicious training samples to manipulate the trained classifiers'
performance. In this work, we study the poisoning scenario where the attacker can
insert a small fraction of samples into the training data, with arbitrary
sensitive attributes as well as other predictive features. We demonstrate that
fairly trained classifiers can be highly vulnerable to such poisoning attacks,
suffering a much worse accuracy and fairness trade-off, even when we apply some of
the most effective defenses (originally proposed to defend traditional
classification tasks). As a countermeasure for fair classification tasks, we
propose a general, theoretically guaranteed framework that adapts traditional
defense methods to fair classification against poisoning attacks. Extensive
experiments validate that the proposed defense framework achieves better
robustness, in terms of both accuracy and fairness, than representative baseline
methods.
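To make the threat model above concrete, here is a minimal, self-contained sketch of the poisoning scenario: an attacker appends a small fraction of samples (with chosen labels and sensitive-group membership) to the training set, and we compare accuracy and a demographic-parity gap before and after. The synthetic data, the logistic-regression model, the label-flipping heuristic, and the 5% budget are illustrative assumptions for this sketch, not the paper's actual attack or defense.

```python
# Toy sketch of the poisoning threat model (not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group_data(n, mean, flip=0.1):
    """Toy binary-classification data for one sensitive group."""
    X = rng.normal(mean, 1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 2 * mean).astype(int)
    noisy = rng.random(n) < flip            # add some label noise
    y[noisy] = 1 - y[noisy]
    return X, y

def parity_gap(model, X_a, X_b):
    """|P(yhat=1 | group a) - P(yhat=1 | group b)| on clean data."""
    return abs(model.predict(X_a).mean() - model.predict(X_b).mean())

# Clean training data for two sensitive groups, a and b.
Xa, ya = make_group_data(500, mean=0.0)
Xb, yb = make_group_data(500, mean=0.5)
X_clean, y_clean = np.vstack([Xa, Xb]), np.hstack([ya, yb])

# Attacker inserts a 5% fraction of poisoned points; here simply label-flipped
# copies of group-b points (one crude choice among many possible attacks).
n_poison = int(0.05 * len(y_clean))
idx = rng.choice(len(yb), n_poison, replace=False)
X_pois = np.vstack([X_clean, Xb[idx]])
y_pois = np.hstack([y_clean, 1 - yb[idx]])

# Clean held-out test data per group.
Xa_t, ya_t = make_group_data(1000, mean=0.0)
Xb_t, yb_t = make_group_data(1000, mean=0.5)
X_test, y_test = np.vstack([Xa_t, Xb_t]), np.hstack([ya_t, yb_t])

for name, X_tr, y_tr in [("clean", X_clean, y_clean), ("poisoned", X_pois, y_pois)]:
    clf = LogisticRegression().fit(X_tr, y_tr)
    print(f"{name:9s} accuracy={clf.score(X_test, y_test):.3f}  "
          f"parity gap={parity_gap(clf, Xa_t, Xb_t):.3f}")
```

Running the script prints accuracy and the parity gap for the clean and poisoned training sets, giving a rough feel for how even a small poisoning budget can shift the accuracy and fairness trade-off that the paper studies.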
Related papers
- FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models [38.019489232264796]
We propose FCert, the first certified defense against data poisoning attacks on few-shot classification.
Our experimental results show that FCert 1) maintains classification accuracy in the absence of attacks, 2) outperforms existing certified defenses against data poisoning attacks, and 3) is efficient and general.
arXiv Detail & Related papers (2024-04-12T17:50:40Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Poisoning Attacks on Fair Machine Learning [13.874416271549523]
We present a framework that seeks to generate poisoning samples to attack both model accuracy and algorithmic fairness.
We develop three online attacks: adversarial sampling, adversarial labeling, and adversarial feature modification.
Our framework enables attackers to flexibly adjust the attack's focus on prediction accuracy or fairness, and to accurately quantify the impact of each candidate point on both accuracy loss and fairness violation.
arXiv Detail & Related papers (2021-10-17T21:56:14Z)
- Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We conduct an empirical study of the class-wise accuracy and robustness of adversarially trained models.
We find that inter-class discrepancies in accuracy and robustness exist even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z)
- Are Adversarial Examples Created Equal? A Learnable Weighted Minimax Risk for Robustness under Non-uniform Attacks [70.11599738647963]
Adversarial Training is one of the few defenses that withstand strong attacks.
Traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution.
We present a weighted minimax risk optimization that defends against non-uniform attacks.
arXiv Detail & Related papers (2020-10-24T21:20:35Z)
- ATRO: Adversarial Training with a Rejection Option [10.36668157679368]
This paper proposes a classification framework with a rejection option to mitigate the performance deterioration caused by adversarial examples.
By applying the adversarial training objective to a classifier and a rejection function simultaneously, the model can abstain from classification when it has insufficient confidence on a test data point.
arXiv Detail & Related papers (2020-10-24T14:05:03Z)
- A Framework of Randomized Selection Based Certified Defenses Against Data Poisoning Attacks [28.593598534525267]
This paper proposes a framework of random selection based certified defenses against data poisoning attacks.
We prove that the random selection schemes that satisfy certain conditions are robust against data poisoning attacks.
Our framework allows users to improve robustness by leveraging prior knowledge about the training set and the poisoning model (a generic subsample-and-aggregate sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-09-18T10:38:12Z)
- Towards Robust Fine-grained Recognition by Maximal Separation of Discriminative Features [72.72840552588134]
We identify the proximity of the latent representations of different classes in fine-grained recognition networks as a key factor in the success of adversarial attacks.
We introduce an attention-based regularization mechanism that maximally separates the discriminative latent features of different classes.
arXiv Detail & Related papers (2020-06-10T18:34:45Z)
- Protecting Classifiers From Attacks. A Bayesian Approach [0.9449650062296823]
We provide an alternative Bayesian framework that accounts for the lack of precise knowledge about the attacker's behavior using adversarial risk analysis.
We propose a sampling procedure based on approximate Bayesian computation, in which we simulate the attacker's problem while taking into account our uncertainty about the attacker's elements.
For large scale problems, we propose an alternative, scalable approach that could be used when dealing with differentiable classifiers.
arXiv Detail & Related papers (2020-04-18T21:21:56Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
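Several of the defenses listed above share a common structural idea: limit how much influence any single (possibly poisoned) training point can have on the final prediction. The sketch below illustrates one well-known instance of that idea, subsample-and-aggregate with majority voting, in the spirit of the randomized-selection framework cited above. The base learner, subset size, and number of models are arbitrary choices for illustration, and the sketch omits any certification analysis.

```python
# Generic subsample-and-aggregate defense sketch (illustrative, uncertified).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SubsampleMajorityVote:
    """Train many base models on small random subsets and aggregate by majority
    vote, so a bounded number of poisoned points can affect only a few models."""

    def __init__(self, n_models=100, subset_size=50, seed=0):
        self.n_models = n_models
        self.subset_size = subset_size
        self.rng = np.random.default_rng(seed)
        self.models = []

    def fit(self, X, y):
        n = len(y)
        self.models = []
        for _ in range(self.n_models):
            # Each base model sees only a small random subset of the training data.
            idx = self.rng.choice(n, size=min(self.subset_size, n), replace=False)
            self.models.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        # Majority vote over base models (binary 0/1 labels assumed).
        votes = np.stack([m.predict(X) for m in self.models])  # (n_models, n_samples)
        return (votes.mean(axis=0) >= 0.5).astype(int)
```

Intuitively, if an attacker controls only a small number of training points, then only a limited fraction of the random subsets (and hence of the votes) can be corrupted; the certified-defense papers above make this kind of argument precise.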
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.