Toxicity Detection can be Sensitive to the Conversational Context
- URL: http://arxiv.org/abs/2111.10223v1
- Date: Fri, 19 Nov 2021 13:57:26 GMT
- Title: Toxicity Detection can be Sensitive to the Conversational Context
- Authors: Alexandros Xenos, John Pavlopoulos, Ion Androutsopoulos, Lucas Dixon,
Jeffrey Sorensen and Leo Laugier
- Abstract summary: We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
- Score: 64.28043776806213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: User posts whose perceived toxicity depends on the conversational context are
rare in current toxicity detection datasets. Hence, toxicity detectors trained
on existing datasets will also tend to disregard context, making the detection
of context-sensitive toxicity harder when it does occur. We construct and
publicly release a dataset of 10,000 posts with two kinds of toxicity labels:
(i) annotators considered each post with the previous one as context; and (ii)
annotators had no additional context. Based on this, we introduce a new task,
context sensitivity estimation, which aims to identify posts whose perceived
toxicity changes if the context (previous post) is also considered. We then
evaluate machine learning systems on this task, showing that classifiers of
practical quality can be developed, and we show that data augmentation with
knowledge distillation can improve the performance further. Such systems could
be used to enhance toxicity detection datasets with more context-dependent
posts, or to suggest when moderators should consider the parent posts, which
often may be unnecessary and may otherwise introduce significant additional
cost.
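As a rough sketch of the context sensitivity estimation task described above (not the authors' system), one can derive a sensitivity target from the two kinds of labels and fit a simple text regressor on it; the field names, toy records, and the TF-IDF/Ridge choice below are illustrative assumptions.

```python
# Minimal sketch of context sensitivity estimation (illustrative, not the paper's model).
# Assumption: each record holds a post, its parent post, and two toxicity scores --
# one from annotators who saw the parent (in context) and one from annotators who did not.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

examples = [  # toy records; the released dataset has 10,000 posts
    {"parent": "I disagree with your point.", "post": "You would say that.",
     "tox_in_context": 0.7, "tox_no_context": 0.2},
    {"parent": "Nice photo!", "post": "Thanks, I took it yesterday.",
     "tox_in_context": 0.0, "tox_no_context": 0.0},
]

# Context sensitivity target: how much perceived toxicity shifts once the parent is considered.
texts = [ex["parent"] + " [SEP] " + ex["post"] for ex in examples]
targets = [abs(ex["tox_in_context"] - ex["tox_no_context"]) for ex in examples]

vectorizer = TfidfVectorizer()
regressor = Ridge().fit(vectorizer.fit_transform(texts), targets)

# High predictions flag posts that moderators should read together with their parent posts.
print(regressor.predict(vectorizer.transform(["Nice photo! [SEP] You would say that."])))
```

The abstract also mentions data augmentation with knowledge distillation as a further improvement; that step is omitted from this sketch.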
Related papers
- Towards Building a Robust Toxicity Predictor [13.162016701556725]
This paper presents a novel adversarial attack, ToxicTrap, which introduces small word-level perturbations to fool SOTA text classifiers into predicting toxic text samples as benign.
Two novel goal function designs allow ToxicTrap to identify weaknesses in both multiclass and multilabel toxic language detectors.
arXiv Detail & Related papers (2024-04-09T22:56:05Z)
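To give a concrete feel for word-level attacks of this kind, the sketch below runs a greedy synonym-swap probe against a stand-in toxicity scorer; it is not the ToxicTrap algorithm or its goal functions, and both `toxicity_score` and the synonym table are hypothetical placeholders.

```python
# Illustrative greedy word-substitution probe against a toxicity classifier.
# Not ToxicTrap itself: the real attack defines dedicated goal functions for
# multiclass and multilabel detectors. The scorer and synonym table are toys.
SYNONYMS = {"idiot": ["fool", "clown"], "stupid": ["silly", "dumb"]}

def toxicity_score(text: str) -> float:
    """Placeholder: plug a real toxicity classifier in here."""
    return 1.0 if any(w in text.lower().split() for w in ("idiot", "stupid")) else 0.0

def greedy_word_swap(text: str, threshold: float = 0.5) -> str:
    """Swap one word at a time, keeping swaps that lower the toxicity score."""
    words = text.split()
    for i, word in enumerate(words):
        for candidate in SYNONYMS.get(word.lower(), []):
            trial = words[:i] + [candidate] + words[i + 1:]
            if toxicity_score(" ".join(trial)) < toxicity_score(" ".join(words)):
                words = trial
                break
        if toxicity_score(" ".join(words)) < threshold:
            break  # the still-toxic text now slips past the classifier
    return " ".join(words)

print(greedy_word_swap("that is stupid"))  # -> "that is silly"
```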
- Toxicity Inspector: A Framework to Evaluate Ground Truth in Toxicity Detection Through Feedback [0.0]
This paper introduces a toxicity inspector framework that incorporates a human-in-the-loop pipeline.
It aims to enhance the reliability of toxicity benchmark datasets by centering the evaluator's values through an iterative feedback cycle.
arXiv Detail & Related papers (2023-05-11T11:56:42Z)
- Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation [65.48908724440047]
We propose a method called reverse generation to construct adversarial contexts conditioned on a given response.
We test three popular pretrained dialogue models (Blender, DialoGPT, and Plato2) and find that BAD+ can largely expose their safety problems.
arXiv Detail & Related papers (2022-12-04T12:23:41Z)
- Toxicity Detection with Generative Prompt-based Inference [3.9741109244650823]
It is a long-known risk that language models (LMs), once trained on corpora containing undesirable content, can manifest biases and toxicity.
In this work, we explore the generative variant of zero-shot prompt-based toxicity detection with comprehensive trials on prompt engineering.
arXiv Detail & Related papers (2022-05-24T22:44:43Z)
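A minimal sketch of the generative zero-shot idea just described: ask a causal LM a yes/no question about the comment and compare the next-token probabilities of "Yes" and "No". The choice of GPT-2 and the specific prompt wording are assumptions, not taken from the paper, which studies prompt engineering far more carefully.

```python
# Sketch of generative zero-shot prompt-based toxicity detection.
# Assumptions: GPT-2 as the LM and this particular prompt template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def toxic_probability(comment: str) -> float:
    prompt = f'Comment: "{comment}"\nQuestion: Is this comment toxic?\nAnswer:'
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]  # distribution over the next token
    yes_id = tokenizer(" Yes").input_ids[0]  # " Yes" and " No" are single BPE tokens for GPT-2
    no_id = tokenizer(" No").input_ids[0]
    probs = torch.softmax(next_token_logits[[yes_id, no_id]], dim=0)
    return probs[0].item()  # P("Yes"), renormalised over {Yes, No}

print(toxic_probability("Have a great day!"))
```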
- Revisiting Contextual Toxicity Detection in Conversations [28.465019968374413]
We show that toxicity labelling by humans is in general influenced by the conversational structure, polarity and topic of the context.
We propose to bring these findings into computational detection models by introducing neural architectures for contextual toxicity detection.
We also demonstrate that such models can benefit from synthetic data, especially in the social media domain.
arXiv Detail & Related papers (2021-11-24T11:50:37Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate for both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company's Reputation [64.22895450493729]
A calm discussion of turtles or fishing is less likely to fuel inappropriate or toxic dialogue than a discussion of politics or sexual minorities.
We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness.
arXiv Detail & Related papers (2021-03-09T10:50:30Z)
- Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
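The sketch below shows the general shape of such an evaluation, not RealToxicityPrompts itself: generate continuations for a few prompts and score them for toxicity. The toy prompts and the placeholder scorer are assumptions; the benchmark instead uses about 100k web-derived prompts and an external toxicity scorer.

```python
# Sketch of prompting an LM and scoring its continuations for toxicity
# (illustrative only; not the RealToxicityPrompts benchmark or its scorer).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def toxicity_score(text: str) -> float:
    """Placeholder: substitute a real toxicity classifier or scoring API."""
    return float(any(w in text.lower() for w in ("hate", "stupid")))

prompts = ["The new neighbor turned out to be", "The protesters started to"]  # toy prompts
for prompt in prompts:
    out = generator(prompt, max_new_tokens=20, num_return_sequences=1)
    continuation = out[0]["generated_text"][len(prompt):]
    print(f"{toxicity_score(continuation):.1f}  {prompt!r} -> {continuation!r}")
```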
- Toxicity Detection: Does Context Really Matter? [22.083682201142242]
We find that context can amplify or mitigate the perceived toxicity of posts.
Surprisingly, we also find no evidence that context actually improves the performance of toxicity classifiers.
This points to the need for larger datasets of comments annotated in context.
arXiv Detail & Related papers (2020-06-01T15:03:48Z)