Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations
- URL: http://arxiv.org/abs/2205.07277v1
- Date: Sun, 15 May 2022 13:01:20 GMT
- Title: Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations
- Authors: Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju
- Abstract summary: We propose a novel evaluation framework which can quantitatively measure disparities in the quality of explanations output by state-of-the-art methods.
Our results indicate that such disparities are more likely to occur when the models being explained are complex and highly non-linear.
This work is the first to highlight and study the problem of group-based disparities in explanation quality.
- Score: 19.125887321893522
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As post hoc explanation methods are increasingly being leveraged to explain
complex models in high-stakes settings, it becomes critical to ensure that the
quality of the resulting explanations is consistently high across various
population subgroups, including minority groups. For instance, it should not
be the case that explanations associated with instances belonging to a
particular gender subgroup (e.g., female) are less accurate than those
associated with other genders. However, there is little to no research that
assesses if there exist such group-based disparities in the quality of the
explanations output by state-of-the-art explanation methods. In this work, we
address the aforementioned gaps by initiating the study of identifying
group-based disparities in explanation quality. To this end, we first outline
the key properties which constitute explanation quality and where disparities
can be particularly problematic. We then leverage these properties to propose a
novel evaluation framework which can quantitatively measure disparities in the
quality of explanations output by state-of-the-art methods. Using this
framework, we carry out a rigorous empirical analysis to understand if and when
group-based disparities in explanation quality arise. Our results indicate that
such disparities are more likely to occur when the models being explained are
complex and highly non-linear. In addition, we also observe that certain post
hoc explanation methods (e.g., Integrated Gradients, SHAP) are more likely to
exhibit the aforementioned disparities. To the best of our knowledge, this work
is the first to highlight and study the problem of group-based disparities in
explanation quality. In doing so, our work sheds light on previously unexplored
ways in which explanation methods may introduce unfairness in real world
decision making.
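The framework's exact metrics are defined in the paper itself; as a rough, hypothetical illustration of the kind of measurement it calls for, the sketch below compares one explanation-quality property across two subgroups of a synthetic dataset. The data, model, occlusion-style attribution rule, deletion-based fidelity score, and all names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): estimate a group-wise gap in one
# explanation-quality property -- fidelity, measured here as the drop in the
# predicted probability when the top-k attributed features are ablated.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic data with a binary subgroup indicator (stand-in for a sensitive attribute).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
group = rng.integers(0, 2, size=len(X))

model = RandomForestClassifier(random_state=0).fit(X, y)
baseline = X.mean(axis=0)  # value used to "ablate" a feature


def occlusion_attributions(x):
    """Leave-one-feature-out attributions for the positive-class probability."""
    p_full = model.predict_proba(x[None, :])[0, 1]
    scores = np.empty(len(x))
    for j in range(len(x)):
        x_abl = x.copy()
        x_abl[j] = baseline[j]
        scores[j] = p_full - model.predict_proba(x_abl[None, :])[0, 1]
    return scores


def deletion_fidelity(x, k=3):
    """Probability drop after ablating the k features ranked most important."""
    attr = occlusion_attributions(x)
    top_k = np.argsort(-np.abs(attr))[:k]
    x_abl = x.copy()
    x_abl[top_k] = baseline[top_k]
    return (model.predict_proba(x[None, :])[0, 1]
            - model.predict_proba(x_abl[None, :])[0, 1])


fidelity = np.array([deletion_fidelity(x) for x in X[:200]])
g = group[:200]
print(f"group 0 fidelity: {fidelity[g == 0].mean():.3f}")
print(f"group 1 fidelity: {fidelity[g == 1].mean():.3f}")
print(f"disparity (gap):  {fidelity[g == 0].mean() - fidelity[g == 1].mean():.3f}")
```

Other notions of explanation quality could be compared group-wise in the same way by swapping in a different per-instance metric.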
Related papers
- Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods [11.754326620700283]
We show that post-hoc feature attribution methods exhibit significant gender disparity with respect to their faithfulness, robustness, and complexity. Our results highlight the importance of addressing disparities in explanations when developing and applying explainability methods.
arXiv Detail & Related papers (2025-05-02T11:41:25Z)
- How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations [69.72654127617058]
Post-hoc importance attribution methods are a popular tool for "explaining" Deep Neural Networks (DNNs).
In this work we bring forward empirical evidence that challenges this very notion.
We discover a strong dependency on the training details of a pre-trained model's classification layer and demonstrate that they play a crucial role.
arXiv Detail & Related papers (2025-03-01T22:25:11Z)
- Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution [3.0658381192498907]
XAI practitioners rely on measures to gauge the quality of such explanations.
Traditionally, the quality of an explanation has been assessed by comparing it to a randomly generated counterpart.
This paper introduces an alternative: the Quality Gap Estimate (QGE).
arXiv Detail & Related papers (2025-02-21T12:04:01Z)
- Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z)
- Explaining Groups of Instances Counterfactually for XAI: A Use Case, Algorithm and User Study for Group-Counterfactuals [7.22614468437919]
We explore a novel use case in which groups of similar instances are explained in a collective fashion.
Group counterfactuals meet a human preference for coherent, broad explanations covering multiple events/instances.
Results show that group counterfactuals elicit modest but definite improvements in people's understanding of an AI system.
arXiv Detail & Related papers (2023-03-16T13:16:50Z)
- REVEL Framework to measure Local Linear Explanations for black-box models: Deep Learning Image Classification case of study [12.49538398746092]
We propose a procedure called REVEL to evaluate different aspects concerning the quality of explanations with a theoretically coherent development.
The experiments have been carried out on four image datasets as benchmarks, where we show REVEL's descriptive and analytical power.
arXiv Detail & Related papers (2022-11-11T12:15:36Z)
- How (Not) To Evaluate Explanation Quality [29.40729766120284]
We formulate desired characteristics of explanation quality that apply across tasks and domains.
We propose actionable guidelines to overcome obstacles that limit today's evaluation of explanation quality.
arXiv Detail & Related papers (2022-10-13T16:06:59Z)
- A Meta Survey of Quality Evaluation Criteria in Explanation Methods [0.5801044612920815]
Explanation methods and their evaluation have become a significant issue in explainable artificial intelligence (XAI).
Since the most accurate AI models are opaque, with low transparency and comprehensibility, explanations are essential for bias detection and control of uncertainty.
There are a plethora of criteria to choose from when evaluating explanation method quality.
arXiv Detail & Related papers (2022-03-25T22:24:21Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often mis-interpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
- Explaining Algorithmic Fairness Through Fairness-Aware Causal Path Decomposition [37.823248189626014]
We propose to study the problem of identification of the source of model disparities.
Unlike existing interpretation methods which typically learn feature importance, we consider the causal relationships among feature variables.
Our framework is also model agnostic and applicable to a variety of quantitative disparity measures.
arXiv Detail & Related papers (2021-08-11T17:23:47Z)
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z)
- The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
- SCOUT: Self-aware Discriminant Counterfactual Explanations [78.79534272979305]
The problem of counterfactual visual explanations is considered.
A new family of discriminant explanations is introduced.
The resulting counterfactual explanations are optimization free and thus much faster than previous methods.
arXiv Detail & Related papers (2020-04-16T17:05:49Z)