Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
- URL: http://arxiv.org/abs/2505.01198v1
- Date: Fri, 02 May 2025 11:41:25 GMT
- Title: Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
- Authors: Mahdi Dhaini, Ege Erdogan, Nils Feldhus, Gjergji Kasneci
- Abstract summary: We show that post-hoc feature attribution methods exhibit significant gender disparity with respect to their faithfulness, robustness, and complexity. Our results highlight the importance of addressing disparities in explanations when developing and applying explainability methods.
- Score: 11.754326620700283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While research on applications and evaluations of explanation methods continues to expand, fairness of the explanation methods concerning disparities in their performance across subgroups remains an often overlooked aspect. In this paper, we address this gap by showing that, across three tasks and five language models, widely used post-hoc feature attribution methods exhibit significant gender disparity with respect to their faithfulness, robustness, and complexity. These disparities persist even when the models are pre-trained or fine-tuned on particularly unbiased datasets, indicating that the disparities we observe are not merely consequences of biased training data. Our results highlight the importance of addressing disparities in explanations when developing and applying explainability methods, as these can lead to biased outcomes against certain subgroups, with particularly critical implications in high-stakes contexts. Furthermore, our findings underscore the importance of incorporating the fairness of explanations, alongside overall model fairness and explainability, as a requirement in regulatory frameworks.
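The disparity analysis described in the abstract hinges on computing an explanation-quality metric per instance and comparing its distribution across gender subgroups. Below is a minimal, self-contained sketch of that idea; the toy linear model, the synthetic attributions, and the simplified comprehensiveness measure are illustrative assumptions, not the paper's actual models, datasets, or metric definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_prob(x):
    """Toy binary classifier: probability from a fixed linear scoring rule."""
    w = np.array([0.8, -0.5, 0.3, 0.6, -0.2])
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def comprehensiveness(x, attributions, k=2):
    """Remove the k features with the highest attribution and measure the
    drop in predicted probability (higher = more faithful attribution)."""
    top_k = np.argsort(-np.abs(attributions))[:k]
    x_masked = x.copy()
    x_masked[top_k] = 0.0  # simple "feature removal" baseline
    return model_prob(x) - model_prob(x_masked)

# Synthetic examples with a hypothetical binary gender label per instance.
X = rng.normal(size=(200, 5))
gender = rng.integers(0, 2, size=200)  # 0 / 1 subgroup labels
# Stand-in attribution scores; in practice these would come from a method
# such as Integrated Gradients, LIME, or SHAP.
A = X * rng.normal(1.0, 0.3, size=X.shape)

scores = np.array([comprehensiveness(x, a) for x, a in zip(X, A)])

# Subgroup means and the disparity gap this kind of analysis is concerned with.
mean_0 = scores[gender == 0].mean()
mean_1 = scores[gender == 1].mean()
print(f"faithfulness (group 0): {mean_0:.4f}")
print(f"faithfulness (group 1): {mean_1:.4f}")
print(f"disparity gap:          {abs(mean_0 - mean_1):.4f}")
```

In a fuller evaluation one would substitute attributions from a real method, repeat the comparison for robustness and complexity metrics, and test whether the subgroup gap is statistically significant.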
Related papers
- Rethinking Fair Representation Learning for Performance-Sensitive Tasks [19.40265690963578]
We use causal reasoning to define and formalise different sources of dataset bias. We run experiments across a range of medical modalities to examine the performance of fair representation learning under distribution shifts.
arXiv Detail & Related papers (2024-10-05T11:01:16Z) - Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z) - Understanding Disparities in Post Hoc Machine Learning Explanation [2.965442487094603]
Previous work has highlighted that existing post-hoc explanation methods exhibit disparities in explanation fidelity (across 'race' and 'gender' as sensitive attributes).
We specifically assess challenges to explanation disparities that originate from properties of the data.
Results indicate that disparities in model explanations can also depend on data and model properties.
arXiv Detail & Related papers (2024-01-25T22:09:28Z) - Fairness Explainability using Optimal Transport with Applications in Image Classification [0.46040036610482665]
We propose a comprehensive approach to uncover the causes of discrimination in Machine Learning applications.
We leverage Wasserstein barycenters to achieve fair predictions and introduce an extension to pinpoint bias-associated regions (a minimal numerical sketch of the barycenter repair idea appears after this list).
This allows us to derive a cohesive system which uses the enforced fairness to measure each feature's influence on the bias.
arXiv Detail & Related papers (2023-08-22T00:10:23Z) - Disentangled Representation with Causal Constraints for Counterfactual Fairness [25.114619307838602]
This work theoretically demonstrates that using the structured representations enables downstream predictive models to achieve counterfactual fairness.
We propose the Counterfactual Fairness Variational AutoEncoder (CF-VAE) to obtain structured representations with respect to domain knowledge.
The experimental results show that the proposed method achieves better fairness and accuracy performance than the benchmark fairness methods.
arXiv Detail & Related papers (2022-08-19T04:47:58Z) - Conditional Supervised Contrastive Learning for Fair Text Classification [59.813422435604025]
We study learning fair representations that satisfy a notion of fairness known as equalized odds for text classification via contrastive learning.
Specifically, we first theoretically analyze the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives.
arXiv Detail & Related papers (2022-05-23T17:38:30Z) - Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations [19.125887321893522]
We propose a novel evaluation framework which can quantitatively measure disparities in the quality of explanations output by state-of-the-art methods.
Our results indicate that such disparities are more likely to occur when the models being explained are complex and highly non-linear.
This work is the first to highlight and study the problem of group-based disparities in explanation quality.
arXiv Detail & Related papers (2022-05-15T13:01:20Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z) - Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
arXiv Detail & Related papers (2021-09-17T13:45:46Z) - On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data.
We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations.
We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
arXiv Detail & Related papers (2020-06-14T12:47:34Z)
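For the optimal-transport entry above, the core repair idea can be illustrated in one dimension: the Wasserstein-2 barycenter of several score distributions has a quantile function equal to the weighted average of the groups' quantile functions, so pushing each group's scores through its empirical CDF and then through that averaged quantile function aligns the distributions. The sketch below is a generic illustration of this standard construction on synthetic scores, not that paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic model scores for two groups with different distributions.
scores_a = rng.normal(0.60, 0.10, size=500)  # group A
scores_b = rng.normal(0.45, 0.15, size=300)  # group B

def wasserstein_barycenter_repair(groups):
    """Map every group's scores onto the 1-D Wasserstein-2 barycenter.

    In one dimension the barycenter's quantile function is the weighted
    average of the groups' quantile functions, so each score is sent through
    its group's empirical CDF and then through that averaged quantile
    function ('total repair')."""
    n_total = sum(len(g) for g in groups)
    weights = [len(g) / n_total for g in groups]
    grid = np.linspace(0.0, 1.0, 1001)
    # Weighted average of per-group quantile functions on a common grid.
    bary_quantiles = sum(w * np.quantile(g, grid) for w, g in zip(weights, groups))
    repaired = []
    for g in groups:
        ranks = np.searchsorted(np.sort(g), g, side="right") / len(g)  # empirical CDF
        repaired.append(np.interp(ranks, grid, bary_quantiles))
    return repaired

rep_a, rep_b = wasserstein_barycenter_repair([scores_a, scores_b])
print(f"mean gap before repair: {abs(scores_a.mean() - scores_b.mean()):.3f}")
print(f"mean gap after repair:  {abs(rep_a.mean() - rep_b.mean()):.3f}")
```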