On the Interplay between Fairness and Explainability
- URL: http://arxiv.org/abs/2310.16607v2
- Date: Mon, 13 Nov 2023 15:20:43 GMT
- Title: On the Interplay between Fairness and Explainability
- Authors: Stephanie Brandl, Emanuele Bugliarello, Ilias Chalkidis
- Abstract summary: We perform a first study to understand how fairness and explainability influence each other.
We fine-tune pre-trained language models with several methods for bias mitigation.
We find that bias mitigation algorithms do not always lead to fairer models.
- Score: 28.37896468795247
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In order to build reliable and trustworthy NLP applications, models need to
be both fair across different demographics and explainable. Usually these two
objectives, fairness and explainability, are optimized and/or examined
independently of each other. Instead, we argue that forthcoming, trustworthy
NLP systems should consider both. In this work, we perform a first study to
understand how they influence each other: do fair(er) models rely on more
plausible rationales? and vice versa. To this end, we conduct experiments on
two English multi-class text classification datasets, BIOS and ECtHR, that
provide information on gender and nationality, respectively, as well as
human-annotated rationales. We fine-tune pre-trained language models with
several methods for (i) bias mitigation, which aims to improve fairness; (ii)
rationale extraction, which aims to produce plausible explanations. We find
that bias mitigation algorithms do not always lead to fairer models. Moreover,
we discover that empirical fairness and explainability are orthogonal.
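Since the abstract gives only the study design, here is a minimal sketch, assuming NumPy arrays of labels, predictions, group identifiers and rationale token sets, of how the two axes could be scored side by side. The metric choices (accuracy gap across groups, token-level F1 against human rationales) and all names are illustrative, not the authors' exact setup.

```python
import numpy as np

def fairness_gap(y_true, y_pred, groups):
    """Worst-case accuracy gap across demographic groups (one common empirical-fairness proxy)."""
    accs = [np.mean(y_pred[groups == g] == y_true[groups == g]) for g in np.unique(groups)]
    return max(accs) - min(accs)

def rationale_f1(model_tokens, human_tokens):
    """Token-level F1 between a model rationale and a human rationale (a plausibility proxy)."""
    model_tokens, human_tokens = set(model_tokens), set(human_tokens)
    overlap = len(model_tokens & human_tokens)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(model_tokens), overlap / len(human_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical usage: score a fine-tuned model on both axes and compare variants.
# gap = fairness_gap(y_true, y_pred, gender)   # lower = fairer
# plausibility = np.mean([rationale_f1(m, h) for m, h in zip(model_rationales, human_rationales)])
```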
Related papers
- Evaluating Consistency and Reasoning Capabilities of Large Language Models [0.0]
Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance.
Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate.
This paper aims to evaluate and compare the consistency and reasoning capabilities of both public and proprietary LLMs.
arXiv Detail & Related papers (2024-04-25T10:03:14Z)
- Fairness Explainability using Optimal Transport with Applications in Image Classification [0.46040036610482665]
We propose a comprehensive approach to uncover the causes of discrimination in Machine Learning applications.
We leverage Wasserstein barycenters to achieve fair predictions and introduce an extension to pinpoint bias-associated regions.
This allows us to derive a cohesive system which uses the enforced fairness to measure each feature's influence on the bias.
arXiv Detail & Related papers (2023-08-22T00:10:23Z)
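The optimal-transport entry above hinges on Wasserstein barycenters. As a hedged, one-dimensional illustration (not the paper's implementation), the Wasserstein-2 barycenter of per-group score distributions can be approximated by averaging quantile functions, and each group's scores can then be mapped onto it so that predictions become identically distributed across groups. Function and variable names are assumptions.

```python
import numpy as np

def barycenter_repair(scores, groups, n_quantiles=100):
    """Map each group's 1-D scores onto the Wasserstein-2 barycenter of the
    per-group score distributions (approximated via averaged quantiles)."""
    qs = np.linspace(0, 1, n_quantiles)
    group_ids = np.unique(groups)
    # Quantile function of each group's score distribution.
    group_quantiles = {g: np.quantile(scores[groups == g], qs) for g in group_ids}
    # In 1-D, the W2 barycenter is the pointwise average of the quantile functions.
    barycenter_q = np.mean([group_quantiles[g] for g in group_ids], axis=0)
    repaired = np.empty_like(scores, dtype=float)
    for g in group_ids:
        s = scores[groups == g]
        # Empirical CDF value of each score within its group, mapped to the barycenter quantile.
        ranks = np.searchsorted(np.sort(s), s, side="right") / len(s)
        repaired[groups == g] = np.interp(ranks, qs, barycenter_q)
    return repaired
```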
- DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations.
Our model jointly optimizes for two fairness criteria - group fairness and counterfactual fairness.
arXiv Detail & Related papers (2023-03-15T07:13:54Z)
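The DualFair summary names two criteria but no mechanics. A simplified PyTorch-style sketch of one plausible reading (not the DualFair code) treats a counterfactual view of each input, with the sensitive attribute flipped, as the positive pair of an InfoNCE loss and adds a group-level penalty on representations; the encoder, inputs, and loss weights are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def debias_loss(encoder, x, x_counterfactual, groups, temperature=0.1, lam=1.0):
    """Contrastive counterfactual term plus a simple group-fairness penalty (illustrative);
    assumes two groups (0/1), both present in the batch."""
    z = F.normalize(encoder(x), dim=-1)                     # (N, d)
    z_cf = F.normalize(encoder(x_counterfactual), dim=-1)   # counterfactual views
    # InfoNCE: each example's counterfactual is its positive, others are negatives.
    logits = z @ z_cf.T / temperature                        # (N, N)
    targets = torch.arange(z.size(0), device=z.device)
    contrastive = F.cross_entropy(logits, targets)
    # Group penalty: match mean representations across the two groups.
    group_penalty = (z[groups == 0].mean(dim=0) - z[groups == 1].mean(dim=0)).pow(2).sum()
    return contrastive + lam * group_penalty
```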
- Fair Enough: Standardizing Evaluation and Model Selection for Fairness Research in NLP [64.45845091719002]
Modern NLP systems exhibit a range of biases, which a growing literature on model debiasing attempts to correct.
This paper seeks to clarify the current situation and plot a course for meaningful progress in fair learning.
arXiv Detail & Related papers (2023-02-11T14:54:00Z)
- Fairness and Explainability: Bridging the Gap Towards Fair Model Explanations [12.248793742165278]
We bridge the gap between fairness and explainability by presenting a novel perspective of procedure-oriented fairness based on explanations.
We propose a Comprehensive Fairness Algorithm (CFA), which simultaneously fulfills multiple objectives - improving traditional fairness, satisfying explanation fairness, and maintaining utility performance.
arXiv Detail & Related papers (2022-12-07T18:35:54Z)
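The CFA abstract lists its objectives without detail. Purely as a sketch of how such a multi-objective criterion could be written down (not the paper's algorithm), one might penalize both a prediction-fairness gap and an explanation-quality gap alongside the task loss; all functions and weights below are hypothetical.

```python
import numpy as np

def explanation_fairness_gap(expl_quality, groups):
    """Gap in mean explanation quality (e.g., rationale F1) across groups --
    one illustrative reading of 'explanation fairness'."""
    means = [np.mean(expl_quality[groups == g]) for g in np.unique(groups)]
    return max(means) - min(means)

def combined_objective(task_loss, prediction_gap, explanation_gap, alpha=1.0, beta=1.0):
    """Placeholder weighted combination of the three goals named in the abstract:
    utility, traditional fairness, and explanation fairness."""
    return task_loss + alpha * prediction_gap + beta * explanation_gap
```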
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo-parallel data with a translated source, yet translates natural source sentences at inference time.
This source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses pseudo-parallel data (natural source, translated target) to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representations to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- FAIR: Fair Adversarial Instance Re-weighting [0.7829352305480285]
We propose a Fair Adversarial Instance Re-weighting (FAIR) method, which uses adversarial training to learn an instance weighting function that ensures fair predictions.
To the best of our knowledge, this is the first model that merges reweighting and adversarial approaches by means of a weighting function that can provide interpretable information about fairness of individual instances.
arXiv Detail & Related papers (2020-11-15T10:48:56Z)
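The FAIR entry above combines reweighting with adversarial training. A compact PyTorch-style sketch of that general recipe (not the paper's implementation): a weighting network scores each instance, the classifier minimizes the weighted task loss, and an adversary is discouraged from recovering the sensitive attribute. The module names and the single combined objective are assumptions.

```python
import torch
import torch.nn.functional as F

def fair_reweighting_step(classifier, weight_net, adversary, x, y, sensitive, lam=1.0):
    """One illustrative training objective: weighted task loss minus the adversary's
    ability to recover the sensitive attribute from the model's predictions."""
    logits = classifier(x)
    weights = torch.sigmoid(weight_net(x)).squeeze(-1)   # per-instance weights in (0, 1)
    task_loss = (weights * F.cross_entropy(logits, y, reduction="none")).mean()
    adv_logits = adversary(logits.softmax(dim=-1))        # adversary sees prediction distributions
    adv_loss = F.cross_entropy(adv_logits, sensitive)
    # Classifier and weight net minimize this; the adversary maximizes adv_loss
    # (e.g., via a gradient-reversal layer or alternating updates).
    return task_loss - lam * adv_loss
```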
- Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output.
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
arXiv Detail & Related papers (2020-10-08T16:59:07Z)
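As described above, LAS rewards explanations that help a simulator predict the model's output while controlling for label leakage. The sketch below is a simplified reading of that metric, assuming NumPy arrays of predicted labels from a simulator run on input plus explanation, input only, and explanation only; the leakage test used here is a stand-in.

```python
import numpy as np

def leakage_adjusted_simulatability(sim_xe, sim_x, sim_e, model_pred):
    """sim_xe / sim_x / sim_e: simulator predictions given (input, explanation),
    input only, or explanation only; model_pred: the explained model's outputs.
    Returns the simulation gain, macro-averaged over leaking and non-leaking
    examples (a simplified reading of the LAS metric)."""
    correct_xe = (sim_xe == model_pred)
    correct_x = (sim_x == model_pred)
    leaks = (sim_e == model_pred)   # explanation alone already reveals the label
    gains = []
    for subset in (leaks, ~leaks):
        if subset.any():
            gains.append(correct_xe[subset].mean() - correct_x[subset].mean())
    return float(np.mean(gains))

# Hypothetical usage with integer label arrays:
# las = leakage_adjusted_simulatability(sim_xe, sim_x, sim_e, model_pred)
```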