Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI
- URL: http://arxiv.org/abs/2407.20274v1
- Date: Thu, 25 Jul 2024 10:17:04 GMT
- Title: Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI
- Authors: Adrian Jaques Böck, Djordje Slijepčević, Matthias Zeppelzauer,
- Abstract summary: We compare four different explainability approaches, i.e. gradient-based, perturbation-based, attention-based, and prototype-based approaches.
Results show that perturbation-based explainability performs best, followed by gradient-based and attention-based explainability.
- Score: 0.7317046947172644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we investigate the explainability of transformer models and their plausibility for hate speech and counter speech detection. We compare representatives of four different explainability approaches, i.e., gradient-based, perturbation-based, attention-based, and prototype-based approaches, and analyze them quantitatively with an ablation study and qualitatively in a user study. Results show that perturbation-based explainability performs best, followed by gradient-based and attention-based explainability. Prototypebased experiments did not yield useful results. Overall, we observe that explainability strongly supports the users in better understanding the model predictions.
Related papers
- CNN-based explanation ensembling for dataset, representation and explanations evaluation [1.1060425537315088]
We explore the potential of ensembling explanations generated by deep classification models using convolutional model.
Through experimentation and analysis, we aim to investigate the implications of combining explanations to uncover a more coherent and reliable patterns of the model's behavior.
arXiv Detail & Related papers (2024-04-16T08:39:29Z) - Predictability and Comprehensibility in Post-Hoc XAI Methods: A
User-Centered Analysis [6.606409729669314]
Post-hoc explainability methods aim to clarify predictions of black-box machine learning models.
We conduct a user study to evaluate comprehensibility and predictability in two widely used tools: LIME and SHAP.
We find that the comprehensibility of SHAP is significantly reduced when explanations are provided for samples near a model's decision boundary.
arXiv Detail & Related papers (2023-09-21T11:54:20Z) - Explaining Explainability: Towards Deeper Actionable Insights into Deep
Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z) - Counterfactuals of Counterfactuals: a back-translation-inspired approach
to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z) - Identifying Weight-Variant Latent Causal Models [82.14087963690561]
We find that transitivity acts as a key role in impeding the identifiability of latent causal representations.
Under some mild assumptions, we can show that the latent causal representations can be identified up to trivial permutation and scaling.
We propose a novel method, termed Structural caUsAl Variational autoEncoder, which directly learns latent causal representations and causal relationships among them.
arXiv Detail & Related papers (2022-08-30T11:12:59Z) - A Song of (Dis)agreement: Evaluating the Evaluation of Explainable
Artificial Intelligence in Natural Language Processing [7.527234046228323]
We argue that the community should stop using rank correlation as an evaluation metric for attention-based explanations.
We find that attention-based explanations do not correlate strongly with any recent feature attribution methods.
arXiv Detail & Related papers (2022-05-09T21:07:39Z) - Example-based Explanations with Adversarial Attacks for Respiratory
Sound Analysis [15.983890739091159]
We develop a unified example-based explanation method for selecting both representative data (prototypes) and outliers (criticisms)
In particular, we propose a novel application of adversarial attacks to generate an explanation spectrum of data instances via an iterative fast gradient sign method.
arXiv Detail & Related papers (2022-03-30T08:28:48Z) - Explainability in Process Outcome Prediction: Guidelines to Obtain
Interpretable and Faithful Models [77.34726150561087]
We define explainability through the interpretability of the explanations and the faithfulness of the explainability model in the field of process outcome prediction.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z) - Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z) - Toward Scalable and Unified Example-based Explanation and Outlier
Detection [128.23117182137418]
We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction.
We show that our prototype-based networks beyond similarity kernels deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
arXiv Detail & Related papers (2020-11-11T05:58:17Z) - Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature based explanations by analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.