Just Say No: Analyzing the Stance of Neural Dialogue Generation in
Offensive Contexts
- URL: http://arxiv.org/abs/2108.11830v1
- Date: Thu, 26 Aug 2021 14:58:05 GMT
- Title: Just Say No: Analyzing the Stance of Neural Dialogue Generation in
Offensive Contexts
- Authors: Ashutosh Baheti, Maarten Sap, Alan Ritter, Mark Riedl
- Abstract summary: We crowd-annotate ToxiChat, a new dataset of 2,000 Reddit threads and model responses labeled with offensive language and stance.
Our analysis reveals that 42% of user responses agree with toxic comments, roughly three times their rate of agreement with safe comments.
- Score: 26.660268192685763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue models trained on human conversations inadvertently learn to
generate offensive responses. Moreover, models can insult anyone by agreeing
with an offensive context. To understand the dynamics of contextually offensive
language, we study the stance of dialogue model responses in offensive Reddit
conversations. Specifically, we crowd-annotate ToxiChat, a new dataset of 2,000
Reddit threads and model responses labeled with offensive language and stance.
Our analysis reveals that 42% of user responses agree with toxic comments,
roughly three times their rate of agreement with safe comments (13%). Pre-trained transformer-based
classifiers fine-tuned on our dataset achieve 0.71 F1 for offensive labels and
0.53 Macro-F1 for stance labels. Finally, we analyze some existing controllable
text generation (CTG) methods to mitigate the contextual offensive behavior of
dialogue models. Compared to the baseline, our best CTG model obtains a 19%
reduction in agreement with offensive context and 29% fewer offensive
responses. This highlights the need for future work to characterize and analyze
more forms of inappropriate behavior in dialogue models to help make them
safer. Our code and corpus are available at
https://github.com/abaheti95/ToxiChat .
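The offensive-language and stance classifiers described above are pre-trained transformers fine-tuned on ToxiChat. The Python sketch below shows what that style of fine-tuning typically looks like; the checkpoint, record fields, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

# Illustrative sketch: fine-tune a pre-trained transformer as a binary
# offensive-language classifier on ToxiChat-style (context, response) pairs.
# Field names and hyperparameters are assumptions, not the authors' setup.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical ToxiChat-style records: a flattened Reddit thread plus a
# response, with a binary "offensive" label (1 = offensive, 0 = safe).
examples = [
    {"context": "u1: that group ruins everything",
     "response": "totally agree, they are the worst", "offensive": 1},
    {"context": "u1: I loved the new album",
     "response": "same, the production is great", "offensive": 0},
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode(batch):
    # Pair the thread context with the response so the classifier sees both.
    enc = tokenizer(batch["context"], batch["response"],
                    truncation=True, padding="max_length", max_length=256)
    enc["labels"] = batch["offensive"]
    return enc

train_dataset = Dataset.from_list(examples).map(encode, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toxichat-offensive-clf",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train_dataset,
)
trainer.train()

The stance classifier reported at 0.53 Macro-F1 would follow the same pattern with the stance label set (e.g., agree / disagree / neutral) and macro-averaged F1 as the evaluation metric.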
Related papers
- Consolidating Strategies for Countering Hate Speech Using Persuasive
Dialogues [3.8979646385036175]
We explore controllable strategies for generating counter-arguments to hateful comments in online conversations.
Using automatic and human evaluations, we determine the best combination of features that generate fluent, argumentative, and logically sound arguments.
We share the computational models we developed for automatically annotating text with such features, along with a silver-standard annotated version of an existing hate speech dialog corpus.
arXiv Detail & Related papers (2024-01-15T16:31:18Z)
- Learning From Free-Text Human Feedback -- Collect New Datasets Or Extend Existing Ones? [57.16050211534735]
We investigate the types and frequency of free-text human feedback in commonly used dialog datasets.
Our findings provide new insights into the composition of the datasets examined, including error types, user response types, and the relations between them.
arXiv Detail & Related papers (2023-10-24T12:01:11Z)
- Understanding writing style in social media with a supervised contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 × 10^6 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z)
- COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements [30.1056760312051]
We introduce COBRA frames, the first context-aware formalism for explaining the intents, reactions, and harms of offensive or biased statements.
We create COBRACORPUS, a dataset of 33k potentially offensive statements paired with machine-generated contexts.
We find that explanations by context-agnostic models are significantly worse than by context-aware ones.
arXiv Detail & Related papers (2023-06-03T02:47:24Z)
- Prompting for a conversation: How to control a dialog model? [9.268682116424518]
Dialog models are trained on a large amount of text, yet their responses need to be limited to a desired scope and style of a dialog agent.
Because the datasets used to achieve the former contain language that is not compatible with the latter, pre-trained dialog models are fine-tuned on smaller curated datasets.
In this paper we investigate if prompting can mitigate the above trade-off.
arXiv Detail & Related papers (2022-09-22T14:59:55Z)
- APPDIA: A Discourse-aware Transformer-based Style Transfer Model for Offensive Social Media Conversations [11.011242089340437]
We release the first publicly-available, parallel corpus of offensive Reddit comments and their style-transferred counterparts annotated by sociolinguists.
We introduce the first discourse-aware style-transfer models that can effectively reduce offensiveness in Reddit text while preserving the meaning of the original text.
arXiv Detail & Related papers (2022-09-17T00:50:24Z)
- LaMDA: Language Models for Dialog Applications [75.75051929981933]
LaMDA is a family of Transformer-based neural language models specialized for dialog.
Fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements.
arXiv Detail & Related papers (2022-01-20T15:44:37Z)
- COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose COLDetector to study the output offensiveness of popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z)
- Zero-Shot Dialogue Disentanglement by Self-Supervised Entangled Response Selection [79.37200787463917]
Dialogue disentanglement aims to group utterances in a long and multi-participant dialogue into threads.
This is useful for discourse analysis and downstream applications such as dialogue response selection.
We are the first to propose a zero-shot dialogue disentanglement solution.
arXiv Detail & Related papers (2021-10-25T05:15:01Z)
- A Large-Scale Chinese Short-Text Conversation Dataset [77.55813366932313]
We present a large-scale cleaned Chinese conversation dataset, LCCC, which contains a base version (6.8 million dialogues) and a large version (12.0 million dialogues).
The quality of our dataset is ensured by a rigorous data cleaning pipeline, which is built based on a set of rules.
We also release pre-training dialogue models which are trained on LCCC-base and LCCC-large respectively.
arXiv Detail & Related papers (2020-08-10T08:12:49Z)
- Speaker Sensitive Response Evaluation Model [17.381658875470638]
We propose an automatic evaluation model based on the similarity of the generated response with the conversational context.
We learn the model parameters from an unlabeled conversation corpus.
We show that our model can be applied to movie dialogues without any additional training.
arXiv Detail & Related papers (2020-06-12T08:59:10Z)
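The last entry above scores responses by their similarity to the conversational context. The Python sketch below illustrates that general idea with an off-the-shelf sentence encoder; it is a generic stand-in, not the paper's learned speaker-sensitive model.

# Generic illustration of similarity-based response evaluation: score a
# candidate response by its embedding similarity to the dialogue context.
# The encoder choice is an assumption; the paper learns its own model from
# an unlabeled conversation corpus.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

context = "A: Did you catch the game last night? B: Yeah, that overtime was wild."
candidates = [
    "I know, I couldn't believe that buzzer-beater.",  # on-topic
    "My favourite colour is green.",                   # off-topic
]

context_emb = encoder.encode(context, convert_to_tensor=True)
candidate_embs = encoder.encode(candidates, convert_to_tensor=True)

# Higher cosine similarity to the context serves as a crude relevance score.
scores = util.cos_sim(context_emb, candidate_embs)[0]
for response, score in zip(candidates, scores):
    print(f"{score.item():.3f}  {response}")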