"Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses
- URL: http://arxiv.org/abs/2010.12820v2
- Date: Mon, 12 Apr 2021 17:22:39 GMT
- Title: "Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses
- Authors: Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng
- Abstract summary: Ad hominem attacks are those that target some feature of a person's character instead of the position the person is maintaining.
We propose categories of ad hominems, compose an annotated dataset, and build a classifier to analyze human and dialogue system responses to English Twitter posts.
Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) we can use constrained decoding techniques to reduce ad hominems in generated dialogue responses.
- Score: 87.89632038677912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ad hominem attacks are those that target some feature of a person's character
instead of the position the person is maintaining. These attacks are harmful
because they propagate implicit biases and diminish a person's credibility.
Since dialogue systems respond directly to user input, it is important to study
ad hominems in dialogue responses. To this end, we propose categories of ad
hominems, compose an annotated dataset, and build a classifier to analyze human
and dialogue system responses to English Twitter posts. We specifically compare
responses to Twitter topics about marginalized communities (#BlackLivesMatter,
#MeToo) versus other topics (#Vegan, #WFH), because the abusive language of ad
hominems could further amplify the skew of power away from marginalized
populations. Furthermore, we propose a constrained decoding technique that uses
salient $n$-gram similarity as a soft constraint for top-$k$ sampling to reduce
the amount of ad hominems generated. Our results indicate that 1) responses
from both humans and DialoGPT contain more ad hominems for discussions around
marginalized communities, 2) different quantities of ad hominems in the
training data can influence the likelihood of generating ad hominems, and 3) we
can use constrained decoding techniques to reduce ad hominems in generated
dialogue responses.
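The soft-constraint idea from the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vocabulary, the use of bigrams as a stand-in for salient $n$-grams, and the penalty value are all illustrative assumptions. The point is only that tokens whose continuation would overlap with $n$-grams salient to ad hominem responses are down-weighted (not hard-blocked) before top-$k$ sampling.

```python
import math
import random

def soft_constrained_topk(logits, prev_token, salient_bigrams,
                          k=3, penalty=5.0, rng=None):
    """Top-k sampling with a soft n-gram penalty (illustrative sketch).

    Tokens that would complete a 'salient' bigram (here, a stand-in for
    n-grams salient to ad hominem responses) have their logits reduced
    by `penalty` rather than being removed outright, so the constraint
    stays soft.
    """
    rng = rng or random.Random(0)
    adjusted = {}
    for tok, score in logits.items():
        if (prev_token, tok) in salient_bigrams:
            score -= penalty  # soft constraint: discourage, don't forbid
        adjusted[tok] = score
    # keep the k highest-scoring tokens after adjustment
    topk = sorted(adjusted, key=adjusted.get, reverse=True)[:k]
    # softmax over the surviving scores, then sample one token
    mx = max(adjusted[t] for t in topk)
    weights = [math.exp(adjusted[t] - mx) for t in topk]
    return rng.choices(topk, weights=weights, k=1)[0], adjusted

# Hypothetical next-token scores after the prefix "... nice try"
logits = {"friend": 1.0, "kiddo": 2.0, "there": 0.5, "everyone": 0.2}
salient = {("try", "kiddo")}  # illustrative "salient" ad hominem bigram
tok, adj = soft_constrained_topk(logits, "try", salient, k=3)
```

With the penalty applied, "kiddo" drops from the highest-scoring candidate to the lowest and falls outside the top-$k$ set, so the sampled token comes from the remaining candidates; a smaller penalty would merely make it less likely rather than excluding it.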
Related papers
- Analyzing Toxicity in Deep Conversations: A Reddit Case Study [0.0]
This work employs a tree-based approach to understand how users behave concerning toxicity in public conversation settings.
We collect both the posts and the comment sections of the top 100 posts from 8 Reddit communities that allow profanity, totaling over 1 million responses.
We find that toxic comments increase the likelihood of subsequent toxic comments being produced in online conversations.
arXiv Detail & Related papers (2024-04-11T16:10:44Z)
- Consolidating Strategies for Countering Hate Speech Using Persuasive Dialogues [3.8979646385036175]
We explore controllable strategies for generating counter-arguments to hateful comments in online conversations.
Using automatic and human evaluations, we determine the best combination of features that generate fluent, argumentative, and logically sound arguments.
We share developed computational models for automatically annotating text with such features, and a silver-standard annotated version of an existing hate speech dialog corpora.
arXiv Detail & Related papers (2024-01-15T16:31:18Z)
- Collective moderation of hate, toxicity, and extremity in online discussions [1.114199733551736]
We analyze a large corpus of more than 130,000 discussions on Twitter over four years.
We identify different dimensions of discourse that might be related to the probability of hate speech in subsequent tweets.
We find that expressing simple opinions, not necessarily supported by facts, relates to the least hate in subsequent discussions.
arXiv Detail & Related papers (2023-03-01T09:35:26Z)
- AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies [71.62832112141913]
We show that dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages.
We first show that hand-crafted replies can be effective for the task of detecting nonsense in applications as complex as Diplomacy.
We find that AutoReply-generated replies outperform handcrafted replies and perform on par with carefully fine-tuned large supervised models.
arXiv Detail & Related papers (2022-11-22T22:31:34Z)
- Robots-Dont-Cry: Understanding Falsely Anthropomorphic Utterances in Dialog Systems [64.10696852552103]
Highly anthropomorphic responses might make users uncomfortable or implicitly deceive them into thinking they are interacting with a human.
We collect human ratings on the feasibility of approximately 900 two-turn dialogs sampled from 9 diverse data sources.
arXiv Detail & Related papers (2022-10-22T12:10:44Z)
- "Dummy Grandpa, do you know anything?": Identifying and Characterizing Ad hominem Fallacy Usage in the Wild [7.022640250985622]
Ad hominem arguments are one of the most effective forms of such fallacies.
Ad hominem argument usage increased significantly since the 2016 US Presidential election.
arXiv Detail & Related papers (2022-09-05T17:16:44Z)
- Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion [52.49981085431061]
Enhancing people's ability to write persuasive arguments could contribute to the effectiveness and civility in online communication.
We derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions.
Persua is an interactive visual system that provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments.
arXiv Detail & Related papers (2022-04-16T08:07:53Z)
- Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts [26.660268192685763]
We crowd-annotate ToxiChat, a new dataset of 2,000 Reddit threads and model responses labeled with offensive language and stance.
Our analysis reveals that 42% of user responses agree with toxic comments, roughly three times their rate of agreement with safe comments.
arXiv Detail & Related papers (2021-08-26T14:58:05Z)
- Revealing Persona Biases in Dialogue Systems [64.96908171646808]
We present the first large-scale study on persona biases in dialogue systems.
We conduct analyses on personas of different social classes, sexual orientations, races, and genders.
In our studies of the Blender and DialoGPT dialogue systems, we show that the choice of personas can affect the degree of harms in generated responses.
arXiv Detail & Related papers (2021-04-18T05:44:41Z)
- Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.