Necessity and Sufficiency for Explaining Text Classifiers: A Case Study
in Hate Speech Detection
- URL: http://arxiv.org/abs/2205.03302v1
- Date: Fri, 6 May 2022 15:34:48 GMT
- Authors: Esma Balkir, Isar Nejadgholi, Kathleen C. Fraser, and Svetlana
Kiritchenko
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel feature attribution method for explaining text
classifiers, and analyze it in the context of hate speech detection. Although
feature attribution models usually provide a single importance score for each
token, we instead provide two complementary and theoretically-grounded scores
-- necessity and sufficiency -- resulting in more informative explanations. We
propose a transparent method that calculates these values by generating
explicit perturbations of the input text, allowing the importance scores
themselves to be explainable. We employ our method to explain the predictions
of different hate speech detection models on the same set of curated examples
from a test suite, and show that different values of necessity and sufficiency
for identity terms correspond to different kinds of false positive errors,
exposing sources of classifier bias against marginalized groups.
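As a rough illustration of the perturbation idea (not the authors' exact estimator), the sketch below pairs a toy lexicon classifier with random token-deletion perturbations: necessity estimates how often masking a token flips the prediction, sufficiency how often keeping it preserves the prediction. The `predict` stand-in, the trigger word, and the sampling scheme are all hypothetical.

```python
import random

def predict(text):
    # Hypothetical stand-in for a hate speech classifier:
    # returns 1 (flagged) if a trigger word appears, else 0.
    return int("stupid" in text.split())

def necessity(tokens, i, n_samples=200, seed=0):
    """How often does masking token i (alongside a random subset of
    the other tokens) flip the original prediction?"""
    rng = random.Random(seed)
    original = predict(" ".join(tokens))
    flips = 0
    for _ in range(n_samples):
        keep = [t for j, t in enumerate(tokens)
                if j != i and rng.random() < 0.5]
        flips += predict(" ".join(keep)) != original
    return flips / n_samples

def sufficiency(tokens, i, n_samples=200, seed=0):
    """How often does keeping token i (while randomly dropping the
    others) preserve the original prediction?"""
    rng = random.Random(seed)
    original = predict(" ".join(tokens))
    same = 0
    for _ in range(n_samples):
        keep = [t for j, t in enumerate(tokens)
                if j == i or rng.random() < 0.5]
        same += predict(" ".join(keep)) == original
    return same / n_samples

tokens = "you are so stupid".split()
print(necessity(tokens, 3), sufficiency(tokens, 3))  # both 1.0 here
```

Because the toy classifier fires only on the trigger word, that token is both fully necessary and fully sufficient; real classifiers yield graded scores, which is where the two axes become informative.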
Related papers
- Detecting Statements in Text: A Domain-Agnostic Few-Shot Solution [1.3654846342364308]
State-of-the-art approaches usually involve fine-tuning models on large annotated datasets, which are costly to produce.
We propose and release a qualitative and versatile few-shot learning methodology as a common paradigm for any claim-based textual classification task.
We illustrate this methodology in the context of three tasks: climate change contrarianism detection, topic/stance classification, and depression-related symptom detection.
arXiv Detail & Related papers (2024-05-09T12:03:38Z)
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
- Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization [56.94741578760294]
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z)
- Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information [7.022948483613112]
This work is a step towards evaluating procedural fairness, where unfair processes lead to unfair outcomes.
The produced knowledge can guide debiasing techniques to ensure that important concepts besides identity terms are well-represented in training datasets.
arXiv Detail & Related papers (2022-10-19T16:03:25Z)
- Knowledge-based Document Classification with Shannon Entropy [0.0]
We propose a novel knowledge-based model equipped with Shannon Entropy, which measures the richness of information and favors uniform and diverse keyword matches.
We show that the Shannon Entropy significantly improves recall at a fixed false positive rate.
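A minimal sketch of how Shannon entropy can favor uniform, diverse keyword matches over repeated hits on a single keyword; the paper's exact scoring is not specified here, and the keywords and counts below are invented.

```python
from collections import Counter
from math import log2

def match_entropy(keyword_hits):
    """Shannon entropy (in bits) of the keyword match distribution.
    Uniform matches spread across many keywords score higher than
    many hits on one keyword."""
    total = sum(keyword_hits.values())
    if total == 0:
        return 0.0
    probs = [n / total for n in keyword_hits.values() if n > 0]
    return -sum(p * log2(p) for p in probs)

# Diverse matches across four keywords vs. one dominant keyword:
diverse = Counter({"tax": 2, "audit": 2, "invoice": 2, "ledger": 2})
skewed = Counter({"tax": 7, "audit": 1})
print(match_entropy(diverse))  # 2.0 (uniform over 4 keywords)
print(match_entropy(skewed))   # lower: matches concentrate on one keyword
```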
arXiv Detail & Related papers (2022-06-06T05:39:10Z)
- Understanding Contrastive Learning Requires Incorporating Inductive Biases [64.56006519908213]
Recent attempts to theoretically explain the success of contrastive learning on downstream tasks prove guarantees depending on properties of augmentations and the value of the contrastive loss of representations.
We demonstrate that such analyses ignore inductive biases of the function class and training algorithm, provably leading to vacuous guarantees in some settings.
arXiv Detail & Related papers (2022-02-28T18:59:20Z)
- Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
- SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text [0.0]
We propose an ensemble model based on similarity estimation of predicted probabilities (SEPP) to exploit the large gaps in the misclassified predictions.
We demonstrate the resilience of SEPP in defending and detecting adversarial texts through different types of victim classifiers.
arXiv Detail & Related papers (2021-10-12T05:36:54Z)
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
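One plausible way to realize such a rank-correlation layout (the paper's exact algorithm may differ) is to turn pairwise Spearman correlations between classifiers' detection scores into distances and embed them with classical multidimensional scaling; the score data below is invented.

```python
import numpy as np

def rankdata(x):
    # Simple ranking; assumes distinct scores (no tie handling).
    order = np.argsort(x)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(x))
    return ranks

def spearman(a, b):
    # Spearman rank correlation = Pearson correlation of the ranks.
    return np.corrcoef(rankdata(a), rankdata(b))[0, 1]

def classifiers_to_2d(score_matrix):
    """score_matrix: (n_classifiers, n_trials) detection scores.
    Returns an (n_classifiers, 2) layout via classical MDS on
    rank-correlation distances d = 1 - rho."""
    n = score_matrix.shape[0]
    d = np.array([[1 - spearman(score_matrix[i], score_matrix[j])
                   for j in range(n)] for i in range(n)])
    # Classical MDS: double-center squared distances, keep top-2 eigenpairs.
    center = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * center @ (d ** 2) @ center
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:2]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

scores = np.array([[0.1, 0.4, 0.35, 0.8],
                   [0.2, 0.5, 0.45, 0.9],   # same ranking as row 0
                   [0.9, 0.1, 0.6, 0.2]])   # very different ranking
xy = classifiers_to_2d(scores)
```

Classifiers 0 and 1 rank the trials identically, so they land on the same point regardless of the absolute score scale, which is the appeal of a rank-based comparison.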
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
- Toward Scalable and Unified Example-based Explanation and Outlier Detection [128.23117182137418]
We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction.
We show that our prototype-based networks, which go beyond similarity kernels, deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
arXiv Detail & Related papers (2020-11-11T05:58:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.