Automatic Fairness Testing of Neural Classifiers through Adversarial
Sampling
- URL: http://arxiv.org/abs/2107.08176v1
- Date: Sat, 17 Jul 2021 03:47:08 GMT
- Title: Automatic Fairness Testing of Neural Classifiers through Adversarial
Sampling
- Authors: Peixin Zhang, Jingyi Wang, Jun Sun, Xinyu Wang, Guoliang Dong, Xingen
Wang, Ting Dai, Jin Song Dong
- Abstract summary: We propose a scalable and effective approach for systematically searching for discriminative samples.
Compared with state-of-the-art methods, our approach only employs lightweight procedures like gradient computation and clustering.
The retrained models reduce discrimination by 57.2% and 60.2% on average on the tabular and text datasets, respectively.
- Score: 8.2868128804393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning has demonstrated astonishing performance in many
applications, there are still concerns about their dependability. One desirable
property of deep learning applications with societal impact is fairness (i.e.,
non-discrimination). Unfortunately, discrimination might be intrinsically
embedded into the models due to discrimination in the training data. As a
countermeasure, fairness testing systematically identifies discriminative
samples, which can be used to retrain the model and improve its fairness.
Existing fairness testing approaches, however, have two major limitations. First,
they only work well on traditional machine learning models and have poor
performance (e.g., effectiveness and efficiency) on deep learning models.
Second, they only work on simple tabular data and are not applicable to
domains such as text. In this work, we bridge the gap by proposing a scalable
and effective approach for systematically searching for discriminative samples
while extending fairness testing to address a challenging domain, i.e., text
classification. Compared with state-of-the-art methods, our approach only
employs lightweight procedures like gradient computation and clustering, which
makes it significantly more scalable. Experimental results show that on
average, our approach explores the search space more effectively (9.62 and 2.38
times more than the state-of-the-art methods respectively on tabular and text
datasets) and generates many more individual discriminatory instances (24.95
and 2.68 times) within a reasonable time. The retrained models reduce
discrimination by 57.2% and 60.2% respectively on average.
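For intuition, the sketch below illustrates the kind of fairness testing the abstract describes, on a tabular classifier: an input counts as an individual discriminatory instance if flipping only the protected attribute changes the predicted label, clustering picks diverse seeds, and the non-protected features are perturbed along gradients that push the two protected-attribute variants apart. This is a minimal illustration under assumed APIs (PyTorch, scikit-learn) with a made-up model, protected-attribute index, and hyperparameters; it is not the authors' exact algorithm.

```python
# Minimal sketch (not the authors' exact algorithm): search a tabular
# classifier for individual discriminatory instances by (1) clustering the
# data to pick diverse seeds and (2) gradient-guided perturbation of the
# non-protected features.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

PROTECTED = 0                      # hypothetical protected-attribute index (e.g. sex in {0, 1})
PROTECTED_VALUES = (0.0, 1.0)

def is_discriminatory(model, x):
    """x is individually discriminatory if changing only the protected
    attribute changes the predicted label."""
    preds = set()
    for v in PROTECTED_VALUES:
        xv = x.clone()
        xv[PROTECTED] = v
        preds.add(model(xv.unsqueeze(0)).argmax(dim=1).item())
    return len(preds) > 1

def gradient_guided_search(model, seed, steps=10, step_size=0.05):
    """Perturb non-protected features along the gradient that widens the gap
    between the outputs of the two protected-attribute variants."""
    x = seed.clone()
    for _ in range(steps):
        if is_discriminatory(model, x):
            return x
        x0, x1 = x.clone(), x.clone()
        x0[PROTECTED], x1[PROTECTED] = PROTECTED_VALUES
        x0.requires_grad_(True)
        x1.requires_grad_(True)
        gap = (model(x0.unsqueeze(0)) - model(x1.unsqueeze(0))).abs().sum()
        gap.backward()
        direction = (x0.grad + x1.grad).sign()
        direction[PROTECTED] = 0.0             # never touch the protected attribute
        x = (x + step_size * direction).detach()
    return None

if __name__ == "__main__":
    torch.manual_seed(0)
    X = torch.rand(200, 5)                     # toy tabular data, 5 features
    model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))
    # clustering yields diverse seed inputs instead of uniform random sampling
    seeds = KMeans(n_clusters=5, n_init=10).fit(X.numpy()).cluster_centers_
    found = [gradient_guided_search(model, torch.tensor(s, dtype=torch.float32))
             for s in seeds]
    print(sum(r is not None for r in found), "discriminatory instances found")
```

Restricting perturbations to the non-protected features keeps each pair of variants differing only in the protected attribute, which is exactly what the individual-fairness check requires.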
Related papers
- Canary in a Coalmine: Better Membership Inference with Ensembled
Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z) - Revealing Unfair Models by Mining Interpretable Evidence [50.48264727620845]
The popularity of machine learning has increased the risk of unfair models being deployed in high-stakes applications.
In this paper, we tackle the novel task of revealing unfair models by mining interpretable evidence.
Our method finds highly interpretable and solid evidence to effectively reveal the unfairness of trained models.
arXiv Detail & Related papers (2022-07-12T20:03:08Z) - FairIF: Boosting Fairness in Deep Learning via Influence Functions with
Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over a reweighted data set, where the sample weights are computed via influence functions using validation-set sensitive attributes.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z) - Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
arXiv Detail & Related papers (2021-09-22T10:47:51Z) - Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias when test data is both available and unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z) - Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z) - Metrics and methods for a systematic comparison of fairness-aware
machine learning algorithms [0.0]
This study is the most comprehensive of its kind.
It considers fairness, predictive performance, calibration quality, and speed of 28 different modelling pipelines.
We also found that fairness-aware algorithms can induce fairness without material drops in predictive power.
arXiv Detail & Related papers (2020-10-08T13:58:09Z) - Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce
Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z) - Do the Machine Learning Models on a Crowd Sourced Platform Exhibit Bias?
An Empirical Study on Model Fairness [7.673007415383724]
We have created a benchmark of 40 top-rated models from Kaggle used for 5 different tasks.
We have applied 7 mitigation techniques on these models and analyzed the fairness, mitigation results, and impacts on performance.
arXiv Detail & Related papers (2020-05-21T23:35:53Z) - Fairness-Aware Learning with Prejudice Free Representations [2.398608007786179]
We propose a novel algorithm that can effectively identify and treat latent discriminating features.
The approach helps to collect discrimination-free features that would improve the model performance.
arXiv Detail & Related papers (2020-02-26T10:06:31Z) - Learning from Discriminatory Training Data [2.1869017389979266]
Supervised learning systems are trained using historical data and, if the data was tainted by discrimination, they may unintentionally learn to discriminate against protected groups.
We propose that fair learning methods, despite training on potentially discriminatory datasets, shall perform well on fair test datasets.
arXiv Detail & Related papers (2019-12-17T18:53:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.