Explanations as Bias Detectors: A Critical Study of Local Post-hoc XAI Methods for Fairness Exploration
- URL: http://arxiv.org/abs/2505.00802v1
- Date: Thu, 01 May 2025 19:03:18 GMT
- Title: Explanations as Bias Detectors: A Critical Study of Local Post-hoc XAI Methods for Fairness Exploration
- Authors: Vasiliki Papanikou, Danae Pla Karidi, Evaggelia Pitoura, Emmanouil Panagiotou, Eirini Ntoutsi
- Abstract summary: This paper explores how explainability methods can be leveraged to detect and interpret unfairness. We propose a pipeline that integrates local post-hoc explanation methods to derive fairness-related insights.
- Score: 5.113545724516812
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As Artificial Intelligence (AI) is increasingly used in areas that significantly impact human lives, concerns about fairness and transparency have grown, especially regarding their impact on protected groups. Recently, the intersection of explainability and fairness has emerged as an important area to promote responsible AI systems. This paper explores how explainability methods can be leveraged to detect and interpret unfairness. We propose a pipeline that integrates local post-hoc explanation methods to derive fairness-related insights. During the pipeline design, we identify and address critical questions arising from the use of explanations as bias detectors such as the relationship between distributive and procedural fairness, the effect of removing the protected attribute, the consistency and quality of results across different explanation methods, the impact of various aggregation strategies of local explanations on group fairness evaluations, and the overall trustworthiness of explanations as bias detectors. Our results show the potential of explanation methods used for fairness while highlighting the need to carefully consider the aforementioned critical aspects.
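The abstract notes that the choice of aggregation strategy for local explanations can affect group fairness evaluations. The following is a minimal sketch of that idea only, not the authors' pipeline: the function `aggregate_by_group`, the strategy names, and the attribution values are all hypothetical. In practice the values would come from a local explanation method, for example the per-instance attribution assigned to the protected attribute.

```python
from statistics import mean


def aggregate_by_group(attributions, groups, strategy="mean"):
    """Aggregate per-instance attributions of one feature into a per-group score.

    attributions: local attribution of a single feature, one value per instance.
    groups: group label of each instance (same length as attributions).
    strategy: "mean" keeps the sign, "mean_abs" keeps only the magnitude.
    """
    by_group = {}
    for value, group in zip(attributions, groups):
        by_group.setdefault(group, []).append(value)

    if strategy == "mean":
        return {g: mean(v) for g, v in by_group.items()}
    if strategy == "mean_abs":
        return {g: mean(abs(x) for x in v) for g, v in by_group.items()}
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical attributions of the protected attribute for six instances.
attributions = [-0.5, -0.25, 0.25, 0.5, 0.25, 0.75]
groups = ["a", "a", "a", "b", "b", "b"]

signed = aggregate_by_group(attributions, groups, "mean")
magnitude = aggregate_by_group(attributions, groups, "mean_abs")

# The gap between groups depends on the aggregation strategy.
signed_gap = signed["b"] - signed["a"]
magnitude_gap = magnitude["b"] - magnitude["a"]
print(signed, magnitude, signed_gap, magnitude_gap)
```

With these made-up numbers the signed mean separates the two groups much more strongly than the mean of absolute values, because opposing attributions inside group "a" cancel out under the signed mean. This is one way two aggregation strategies could lead to different group-level readings of the same local explanations.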
Related papers
- Argumentative Debates for Transparent Bias Detection [Technical Report] [18.27485896306961]
We propose a novel interpretable, explainable method for bias detection relying on debates about the presence of bias against individuals. Our method builds upon techniques from formal and computational argumentation, whereby debates result from arguing about biases within and across neighbourhoods. We provide formal, quantitative, and qualitative evaluations of our method, highlighting its strengths as well as its interpretability and explainability.
arXiv Detail & Related papers (2025-08-06T14:56:08Z) - Fair Deepfake Detectors Can Generalize [51.21167546843708]
We show that controlling for confounders (data distribution and model capacity) enables improved generalization via fairness interventions. Motivated by this insight, we propose Demographic Attribute-insensitive Intervention Detection (DAID), a plug-and-play framework composed of: i) Demographic-aware data rebalancing, which employs inverse-propensity weighting and subgroup-wise feature normalization to neutralize distributional biases; and ii) Demographic-agnostic feature aggregation, which uses a novel alignment loss to suppress sensitive-attribute signals. DAID consistently achieves superior performance in both fairness and generalization compared to several state-of-the-art
arXiv Detail & Related papers (2025-07-03T14:10:02Z) - Data Fusion for Partial Identification of Causal Effects [62.56890808004615]
We propose a novel partial identification framework that enables researchers to answer key questions: Is the causal effect positive or negative? How severe must assumption violations be to overturn this conclusion? We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance.
arXiv Detail & Related papers (2025-05-30T07:13:01Z) - When Can You Trust Your Explanations? A Robustness Analysis on Feature Importances [42.36530107262305]
Robustness of explanations plays a central role in ensuring trust in both the system and the provided explanation. We propose a novel approach to analyse the robustness of neural network explanations to non-adversarial perturbations. We additionally present an ensemble method to aggregate various explanations, showing how merging explanations can be beneficial for both understanding the model's decision and evaluating the robustness.
arXiv Detail & Related papers (2024-06-20T14:17:57Z) - What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement Learning [52.51430732904994]
In reinforcement learning problems, agents must consider long-term fairness while maximizing returns.
Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear.
We introduce a novel notion called dynamics fairness, which explicitly captures the inequality stemming from environmental dynamics.
arXiv Detail & Related papers (2024-04-16T22:47:59Z) - Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z) - The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features [25.752072910748716]
Explanations may help human-AI teams address biases for fairer decision-making.
We study the effect of the presence of protected and proxy features on participants' perception of model fairness.
We find that explanations help people detect direct but not indirect biases.
arXiv Detail & Related papers (2023-10-12T16:00:16Z) - Fairness Explainability using Optimal Transport with Applications in Image Classification [0.46040036610482665]
We propose a comprehensive approach to uncover the causes of discrimination in Machine Learning applications.
We leverage Wasserstein barycenters to achieve fair predictions and introduce an extension to pinpoint bias-associated regions.
This allows us to derive a cohesive system which uses the enforced fairness to measure each feature's influence on the bias.
arXiv Detail & Related papers (2023-08-22T00:10:23Z) - Fairness and robustness in anti-causal prediction [73.693135253335]
Robustness to distribution shift and fairness have independently emerged as two important desiderata required of machine learning models.
While these two desiderata seem related, the connection between them is often unclear in practice.
By taking this perspective, we draw explicit connections between a common fairness criterion - separation - and a common notion of robustness.
arXiv Detail & Related papers (2022-09-20T02:41:17Z) - Attributing Fair Decisions with Attention Interventions [28.968122909973975]
We design an attention-based model that can be leveraged as an attribution framework.
It can identify features responsible for both performance and fairness of the model through attention interventions and attention weight manipulation.
We then design a post-processing bias mitigation strategy and compare it with a suite of baselines.
arXiv Detail & Related papers (2021-09-08T22:28:44Z) - Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z) - Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.