Concept-Based Explanations to Test for False Causal Relationships
Learned by Abusive Language Classifiers
- URL: http://arxiv.org/abs/2307.01900v1
- Date: Tue, 4 Jul 2023 19:57:54 GMT
- Title: Concept-Based Explanations to Test for False Causal Relationships
Learned by Abusive Language Classifiers
- Authors: Isar Nejadgholi, Svetlana Kiritchenko, Kathleen C. Fraser, and Esma
Balkır
- Abstract summary: We consider three well-known abusive language classifiers trained on large English datasets.
We first examine the unwanted dependencies learned by the classifiers by assessing their accuracy on a challenge set across all decision thresholds.
We then introduce concept-based explanation metrics to assess the influence of the concept on the labels.
- Score: 7.022948483613113
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classifiers tend to learn a false causal relationship between an
over-represented concept and a label, which can result in over-reliance on the
concept and compromised classification accuracy. It is imperative to have
methods in place that can compare different models and identify over-reliances
on specific concepts. We consider three well-known abusive language classifiers
trained on large English datasets and focus on the concept of negative
emotions, which is an important signal but should not be learned as a
sufficient feature for the label of abuse. Motivated by the definition of
global sufficiency, we first examine the unwanted dependencies learned by the
classifiers by assessing their accuracy on a challenge set across all decision
thresholds. Further, recognizing that a challenge set might not always be
available, we introduce concept-based explanation metrics to assess the
influence of the concept on the labels. These explanations allow us to compare
classifiers regarding the degree of false global sufficiency they have learned
between a concept and a label.
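To make the two checks described in the abstract concrete, here is a minimal Python sketch under assumed interfaces: challenge_accuracy_curve sweeps every decision threshold over a challenge set of non-abusive texts that express strong negative emotion (a classifier that has learned the concept as globally sufficient keeps flagging them even at high thresholds), and tcav_style_sensitivity is a generic TCAV-style concept-sensitivity score, not the paper's exact metrics. The callables predict_proba, get_activations, and logit_from_activations, and the text collections, are hypothetical placeholders rather than the authors' code.

```python
# Hedged sketch, not the authors' implementation. Placeholder callables:
#   predict_proba(texts)         -> P(abusive) for each text
#   get_activations(texts)       -> hidden-layer activations, shape (n, d)
#   logit_from_activations(h)    -> abuse logit computed from one activation h
import numpy as np
from sklearn.linear_model import LogisticRegression

def challenge_accuracy_curve(predict_proba, challenge_texts, thresholds):
    """Accuracy on non-abusive, negative-emotion texts at every threshold.

    Every challenge text is non-abusive, so a prediction is correct when the
    abuse score falls below the threshold. A classifier that treats negative
    emotion as sufficient for abuse keeps the curve low even at high thresholds.
    """
    scores = np.asarray(predict_proba(challenge_texts))
    return np.array([(scores < t).mean() for t in thresholds])

def tcav_style_sensitivity(get_activations, logit_from_activations,
                           concept_texts, random_texts, probe_texts, eps=1e-2):
    """Fraction of probe examples whose abuse logit increases when the hidden
    activation is nudged along the concept direction (a finite-difference
    stand-in for TCAV's directional derivative)."""
    h_concept = get_activations(concept_texts)
    h_random = get_activations(random_texts)
    X = np.vstack([h_concept, h_random])
    y = np.array([1] * len(h_concept) + [0] * len(h_random))
    cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]   # concept activation vector
    cav = cav / np.linalg.norm(cav)
    votes = [logit_from_activations(h + eps * cav) > logit_from_activations(h)
             for h in get_activations(probe_texts)]
    return float(np.mean(votes))

# thresholds = np.linspace(0.0, 1.0, 101)
# acc = challenge_accuracy_curve(clf.predict_proba, challenge_texts, thresholds)
```

Comparing the area under the accuracy-versus-threshold curve, or the sensitivity score, across the three classifiers then gives a rough ordering of how strongly each one treats negative emotion as sufficient for the abusive label.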
Related papers
- Robust Representation Learning for Unreliable Partial Label Learning [86.909511808373]
Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth.
When the candidate set itself may be unreliable and may not contain the ground-truth label, the problem becomes Unreliable Partial Label Learning (UPLL), which adds complexity due to the inherent unreliability and ambiguity of partial labels.
We propose the Unreliability-Robust Representation Learning framework (URRL), which leverages unreliability-robust contrastive learning to make the model robust to unreliable partial labels.
arXiv Detail & Related papers (2023-08-31T13:37:28Z)
- LEACE: Perfect linear concept erasure in closed form [103.61624393221447]
Concept erasure aims to remove specified features from a representation.
We introduce LEAst-squares Concept Erasure (LEACE), a closed-form method which provably prevents all linear classifiers from detecting a concept while changing the representation as little as possible.
We apply LEACE to large language models with a novel procedure called "concept scrubbing," which erases target concept information from every layer in the network.
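For intuition about how such an erasure can work, below is a minimal numpy sketch of linear concept erasure in the spirit of LEACE, not the exact LEACE estimator (which additionally whitens the features so that the representation changes as little as possible). Removing the component of the centered features aligned with the cross-covariance between features and concept labels leaves the output uncorrelated with the concept, so no linear classifier can recover it better than a constant predictor.

```python
import numpy as np

def erase_linear_concept(X, z):
    """Simplified linear concept erasure (illustrative, not the LEACE estimator).

    X: (n, d) array of representations; z: (n,) binary concept labels (not constant).
    Returns features whose covariance with z is exactly zero, so a linear
    classifier cannot detect the concept from them better than chance.
    """
    Xc = X - X.mean(axis=0)
    zc = (z - z.mean()).reshape(-1, 1)
    direction = Xc.T @ zc                     # spans the column space of Cov(X, z)
    direction /= np.linalg.norm(direction)
    P = direction @ direction.T               # rank-1 orthogonal projector
    return X - Xc @ P                         # subtract the concept-aligned component
```

The full LEACE estimator performs an analogous projection in a whitened space, which is what makes the change to the representation provably minimal.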
arXiv Detail & Related papers (2023-06-06T16:07:24Z)
- Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information [7.022948483613112]
This work is a step towards evaluating procedural fairness, where unfair processes lead to unfair outcomes.
The produced knowledge can guide debiasing techniques to ensure that important concepts besides identity terms are well-represented in training datasets.
arXiv Detail & Related papers (2022-10-19T16:03:25Z)
- Noise Audits Improve Moral Foundation Classification [5.7685650619372595]
Morality plays an important role in culture, identity, and emotion.
Recent advances in natural language processing have shown that it is possible to classify moral values expressed in text at scale.
Morality classification relies on human annotators to label the moral expressions in text.
arXiv Detail & Related papers (2022-10-13T23:37:47Z)
- Probing Classifiers are Unreliable for Concept Removal and Detection [18.25734277357466]
Neural network models trained on text data have been found to encode undesirable linguistic or sensitive concepts in their representation.
Recent work has proposed post-hoc and adversarial methods to remove such unwanted concepts from a model's representation.
We show that these methods can be counter-productive, and in the worst case may end up destroying all task-relevant features.
arXiv Detail & Related papers (2022-07-08T23:15:26Z)
- Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
- Translational Concept Embedding for Generalized Compositional Zero-shot Learning [73.60639796305415]
Generalized compositional zero-shot learning aims to recognize composed concepts of attribute-object pairs in a zero-shot fashion.
This paper introduces a new approach, termed translational concept embedding, to address the difficulties of this task in a unified framework.
arXiv Detail & Related papers (2021-12-20T21:27:51Z)
- Active Refinement for Multi-Label Learning: A Pseudo-Label Approach [84.52793080276048]
Multi-label learning (MLL) aims to associate a given instance with its relevant labels from a set of concepts.
Previous works of MLL mainly focused on the setting where the concept set is assumed to be fixed.
Many real-world applications require introducing new concepts into the set to meet new demands.
arXiv Detail & Related papers (2021-09-29T19:17:05Z)
- DISSECT: Disentangled Simultaneous Explanations via Concept Traversals [33.65478845353047]
DISSECT is a novel approach to explaining deep learning model inferences.
By training a generative model from a classifier's signal, DISSECT offers a way to discover a classifier's inherent "notion" of distinct concepts.
We show that DISSECT produces concept traversals (CTs) that disentangle several concepts and are coupled to the classifier's reasoning due to joint training.
arXiv Detail & Related papers (2021-05-31T17:11:56Z)
- CURI: A Benchmark for Productive Concept Learning Under Uncertainty [33.83721664338612]
We introduce a new few-shot, meta-learning benchmark, Compositional Reasoning Under Uncertainty (CURI).
CURI evaluates different aspects of productive and systematic generalization, including abstract understandings of disentangling, productive generalization, learning operations, variable binding, etc.
It also defines a model-independent "compositionality gap" to evaluate the difficulty of generalizing out-of-distribution along each of these axes.
arXiv Detail & Related papers (2020-10-06T16:23:17Z)
- Debiased Contrastive Learning [64.98602526764599]
We develop a debiased contrastive objective that corrects for the sampling of same-label datapoints.
Empirically, the proposed objective consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks.
arXiv Detail & Related papers (2020-07-01T04:25:24Z)