Detecting Inappropriate Messages on Sensitive Topics that Could Harm a
Company's Reputation
- URL: http://arxiv.org/abs/2103.05345v1
- Date: Tue, 9 Mar 2021 10:50:30 GMT
- Title: Detecting Inappropriate Messages on Sensitive Topics that Could Harm a
Company's Reputation
- Authors: Nikolay Babakov, Varvara Logacheva, Olga Kozlova, Nikita Semenov and
Alexander Panchenko
- Abstract summary: A calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities.
We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness.
- Score: 64.22895450493729
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Not all topics are equally "flammable" in terms of toxicity: a calm
discussion of turtles or fishing less often fuels inappropriate toxic dialogues
than a discussion of politics or sexual minorities. We define a set of
sensitive topics that can yield inappropriate and toxic messages and describe
the methodology of collecting and labeling a dataset for appropriateness. While
toxicity in user-generated data is well-studied, we aim to define a more
fine-grained notion of inappropriateness. The core of inappropriateness is that
it can harm the reputation of a speaker. This is different from toxicity in two
respects: (i) inappropriateness is topic-related, and (ii) an inappropriate
message need not be toxic to be unacceptable. We collect and release two
datasets for Russian: a topic-labeled dataset and an appropriateness-labeled
dataset. We also release pre-trained classification models trained on this
data.
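The released classifiers can be applied like any Hugging Face sequence-classification checkpoint. Below is a minimal sketch of scoring a Russian message for inappropriateness; the model identifier and the label order are assumptions for illustration and should be checked against the authors' published checkpoints.

```python
# Sketch: scoring a message with an inappropriateness classifier.
# The Hub identifier below is an assumption; verify the exact name of the
# checkpoint released by the authors before use.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "Skoltech/russian-inappropriate-messages"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

text = "Пример сообщения для проверки."  # "An example message to check."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Assuming a binary head: index 1 = "inappropriate", index 0 = "appropriate".
probs = torch.softmax(logits, dim=-1).squeeze()
print(f"P(inappropriate) = {probs[1].item():.3f}")
```

The same pattern would apply to a topic classifier trained on the topic-labeled dataset, with the label mapping read from the model configuration.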
Related papers
- Constructing Highly Inductive Contexts for Dialogue Safety through
Controllable Reverse Generation [65.48908724440047]
We propose a method called reverse generation to construct adversarial contexts conditioned on a given response.
We test three popular pretrained dialogue models (Blender, DialoGPT, and Plato2) and find that BAD+, the adversarial dataset built this way, can largely expose their safety problems.
arXiv Detail & Related papers (2022-12-04T12:23:41Z) - Handling and Presenting Harmful Text [10.359716317114815]
Textual data can pose a risk of serious harm.
These harms can be categorised along three axes; one of these is the type of harm, e.g., misinformation, hate speech, or racial stereotypes.
It is an unsolved problem in NLP as to how textual harms should be handled, presented, and discussed.
We provide practical advice and introduce HarmCheck, a resource for reflecting on research into textual harms.
arXiv Detail & Related papers (2022-04-29T17:34:12Z) - Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable
Topics for the Russian Language [76.58220021791955]
We present two text collections labelled according to a binary notion of inappropriateness and a multinomial notion of sensitive topics.
To objectivise the notion of inappropriateness, we define it in a data-driven way through crowdsourcing.
arXiv Detail & Related papers (2022-03-04T15:59:06Z) - Toxicity Detection can be Sensitive to the Conversational Context [64.28043776806213]
We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
arXiv Detail & Related papers (2021-11-19T13:57:26Z) - Mitigating Biases in Toxic Language Detection through Invariant
Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate in both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z) - Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical markers (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z) - Toxicity Detection: Does Context Really Matter? [22.083682201142242]
We find that context can amplify or mitigate the perceived toxicity of posts.
Surprisingly, we also find no evidence that context actually improves the performance of toxicity classifiers.
This points to the need for larger datasets of comments annotated in context.
arXiv Detail & Related papers (2020-06-01T15:03:48Z)