Toxicity Detection: Does Context Really Matter?
- URL: http://arxiv.org/abs/2006.00998v1
- Date: Mon, 1 Jun 2020 15:03:48 GMT
- Title: Toxicity Detection: Does Context Really Matter?
- Authors: John Pavlopoulos and Jeffrey Sorensen and Lucas Dixon and Nithum Thain
and Ion Androutsopoulos
- Abstract summary: We find that context can amplify or mitigate the perceived toxicity of posts.
Surprisingly, we also find no evidence that context actually improves the performance of toxicity classifiers.
This points to the need for larger datasets of comments annotated in context.
- Score: 22.083682201142242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Moderation is crucial to promoting healthy on-line discussions. Although
several 'toxicity' detection datasets and models have been published, most of
them ignore the context of the posts, implicitly assuming that comments may be
judged independently. We investigate this assumption by focusing on two
questions: (a) does context affect the human judgement, and (b) does
conditioning on context improve performance of toxicity detection systems? We
experiment with Wikipedia conversations, limiting the notion of context to the
previous post in the thread and the discussion title. We find that context can
both amplify and mitigate the perceived toxicity of posts. Moreover, a small but
significant subset of manually labeled posts (5% in one of our experiments) ends
up having the opposite toxicity labels if the annotators are not provided with
context. Surprisingly, we also find no evidence that context actually improves
the performance of toxicity classifiers, having tried a range of classifiers
and mechanisms to make them context-aware. This points to the need for larger
datasets of comments annotated in context. We make our code and data publicly
available.
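The context-aware setup described in the abstract can be approximated by giving the classifier the parent post (or discussion title) alongside the target comment. Below is a minimal sketch of one such mechanism, using a sentence-pair encoding with Hugging Face Transformers; the checkpoint name, maximum length, and label mapping are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch: context-aware toxicity classification by encoding the
# parent post and the target comment as a sentence pair.
# Assumptions (not from the paper): model checkpoint, max length, label order.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # placeholder; any encoder-style checkpoint works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def toxicity_scores(parent_post: str, comment: str) -> torch.Tensor:
    """Score a comment conditioned on its context (the previous post or title)."""
    # The context goes in the first segment, the comment in the second,
    # so the encoder can attend across both when scoring the comment.
    inputs = tokenizer(
        parent_post,
        comment,
        truncation=True,
        max_length=256,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1).squeeze(0)  # [p(non-toxic), p(toxic)]

# Example usage: an untrained classification head gives arbitrary scores,
# so fine-tune on in-context annotations before interpreting the output.
scores = toxicity_scores(
    "Discussion title: Edit war on the infobox",
    "That revert was completely unjustified.",
)
print(scores)
```

A context-unaware baseline is obtained by passing only the comment to the tokenizer; the paper's finding is that, with current datasets, adding the context segment does not measurably improve classifier performance.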
Related papers
- BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of
Implied Social Biases [28.519851740902258]
BiasX is a framework that enhances content moderation setups with free-text explanations of statements' implied social biases.
We show that participants substantially benefit from explanations for correctly identifying subtly (non-)toxic content.
Our results showcase the promise of using free-text explanations to encourage more thoughtful toxicity moderation.
arXiv Detail & Related papers (2023-05-23T01:45:18Z) - Constructing Highly Inductive Contexts for Dialogue Safety through
Controllable Reverse Generation [65.48908724440047]
We propose a method called reverse generation to construct adversarial contexts conditioned on a given response.
We test three popular pretrained dialogue models (Blender, DialoGPT, and Plato2) and find that BAD+ can largely expose their safety problems.
arXiv Detail & Related papers (2022-12-04T12:23:41Z) - Hate Speech and Counter Speech Detection: Conversational Context Does
Matter [7.333666276087548]
This paper investigates the role of conversational context in the annotation and detection of online hate and counter speech.
We created a context-aware dataset for a 3-way classification task on Reddit comments: hate speech, counter speech, or neutral.
arXiv Detail & Related papers (2022-06-13T19:05:44Z) - Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable
Topics for the Russian Language [76.58220021791955]
We present two text collections labelled according to a binary notion of inappropriateness and a multinomial notion of sensitive topics.
To objectivise the notion of inappropriateness, we define it in a data-driven way through crowdsourcing.
arXiv Detail & Related papers (2022-03-04T15:59:06Z) - Toxicity Detection can be Sensitive to the Conversational Context [64.28043776806213]
We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
arXiv Detail & Related papers (2021-11-19T13:57:26Z) - Detecting Inappropriate Messages on Sensitive Topics that Could Harm a
Company's Reputation [64.22895450493729]
A calm discussion of turtles or fishing is less likely to fuel inappropriate or toxic dialogue than a discussion of politics or sexual minorities.
We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness.
arXiv Detail & Related papers (2021-03-09T10:50:30Z) - Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical markers (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z) - Reading Between the Demographic Lines: Resolving Sources of Bias in
Toxicity Classifiers [0.0]
Perspective API is perhaps the most widely used toxicity classifier in industry.
Google's model tends to unfairly assign higher toxicity scores to comments containing words referring to the identities of commonly targeted groups.
We have constructed several toxicity classifiers with the intention of reducing unintended bias while maintaining strong classification performance.
arXiv Detail & Related papers (2020-06-29T21:40:55Z) - Don't Judge an Object by Its Context: Learning to Overcome Contextual
Bias [113.44471186752018]
Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy.
This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations.
arXiv Detail & Related papers (2020-01-09T18:31:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.