Consolidating Strategies for Countering Hate Speech Using Persuasive
Dialogues
- URL: http://arxiv.org/abs/2401.07810v1
- Date: Mon, 15 Jan 2024 16:31:18 GMT
- Title: Consolidating Strategies for Countering Hate Speech Using Persuasive
Dialogues
- Authors: Sougata Saha and Rohini Srihari
- Abstract summary: We explore controllable strategies for generating counter-arguments to hateful comments in online conversations.
Using automatic and human evaluations, we determine the best combination of features that generate fluent, argumentative, and logically sound arguments.
We share the computational models developed for automatically annotating text with such features, and a silver-standard annotated version of an existing hate speech dialog corpus.
- Score: 3.8979646385036175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hateful comments are prevalent on social media platforms. Although tools for
automatically detecting, flagging, and blocking such false, offensive, and
harmful content online have lately matured, such reactive and brute force
methods alone provide short-term and superficial remedies while the
perpetrators persist. With the public availability of large language models
which can generate articulate synthetic and engaging content at scale, there
are concerns about the rapid growth of dissemination of such malicious content
on the web. There is now a need to focus on deeper, long-term solutions that
involve engaging with the human perpetrator behind the source of the content to
change their viewpoint or at least bring down the rhetoric using persuasive
means. To do that, we propose defining and experimenting with controllable
strategies for generating counter-arguments to hateful comments in online
conversations. We experiment with controlling response generation using
features based on (i) argument structure and reasoning-based Walton argument
schemes, (ii) counter-argument speech acts, and (iii) human
characteristics-based qualities such as Big-5 personality traits and human
values. Using automatic and human evaluations, we determine the best
combination of features that generate fluent, argumentative, and logically
sound arguments for countering hate. We further share the developed
computational models for automatically annotating text with such features, and
a silver-standard annotated version of an existing hate speech dialog corpus.
Related papers
- Assessing the Human Likeness of AI-Generated Counterspeech [10.434435022492723]
Counterspeech is a targeted response to counteract and challenge abusive or hateful content.
Previous studies have proposed different strategies for automatically generated counterspeech.
We investigate the human likeness of AI-generated counterspeech, a critical factor influencing effectiveness.
arXiv Detail & Related papers (2024-10-14T18:48:47Z)
- LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback [16.57980268646285]
This paper studies how inappropriate language in arguments can be computationally mitigated.
We propose a reinforcement learning-based rewriting approach that balances content preservation and appropriateness.
We evaluate different weighting schemes for the reward function in both absolute and relative human assessment studies.
arXiv Detail & Related papers (2024-06-05T15:18:08Z)
- Understanding Counterspeech for Online Harm Mitigation [12.104301755723542]
Counterspeech offers direct rebuttals to hateful speech by challenging perpetrators of hate and showing support to targets of abuse.
It provides a promising alternative to more contentious measures, such as content moderation and deplatforming.
This paper systematically reviews counterspeech research in the social sciences and compares methodologies and findings with computer science efforts in automatic counterspeech generation.
arXiv Detail & Related papers (2023-07-01T20:54:01Z)
- Which Argumentative Aspects of Hate Speech in Social Media can be reliably identified? [2.7647400328727256]
It is unclear which aspects of argumentation can be reliably identified and integrated in language models.
We show that some components can be identified with reasonable reliability.
We propose adaptations of those categories that can be more reliably reproduced.
arXiv Detail & Related papers (2023-06-05T15:50:57Z)
- Controllable Mixed-Initiative Dialogue Generation through Prompting [50.03458333265885]
Mixed-initiative dialogue tasks involve repeated exchanges of information and conversational control.
Agents gain control by generating responses that follow particular dialogue intents or strategies, prescribed by a policy planner.
The standard approach has been to fine-tune pre-trained language models to perform generation conditioned on these intents.
We instead prompt large language models as a drop-in replacement to fine-tuning on conditional generation.
arXiv Detail & Related papers (2023-05-06T23:11:25Z)
- CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network [52.85130555886915]
CoSyn is a context-synergized neural network that explicitly incorporates user- and conversational context for detecting implicit hate speech in online conversations.
We show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 1.24% - 57.8%.
arXiv Detail & Related papers (2023-03-02T17:30:43Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Parsimonious Argument Annotations for Hate Speech Counter-narratives [4.825848785596437]
We present an enrichment of the Hateval corpus of hate speech tweets (Basile et al.) aimed at facilitating automated counter-narrative generation.
We have also annotated tweets with argumentative information based on Wagemans, which we believe can help in building convincing and effective counter-narratives for hate speech against particular groups.
Preliminary results show that automatic annotators perform close to human annotators in detecting some aspects of argumentation, while others only reach a low or moderate level of inter-annotator agreement.
arXiv Detail & Related papers (2022-08-01T18:58:32Z)
- Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion [52.49981085431061]
Enhancing people's ability to write persuasive arguments could contribute to the effectiveness and civility in online communication.
We derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions.
Persua is an interactive visual system that provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments.
arXiv Detail & Related papers (2022-04-16T08:07:53Z)
- Aspect-Controlled Neural Argument Generation [65.91772010586605]
We train a language model for argument generation that can be controlled on a fine-grained level to generate sentence-level arguments for a given topic, stance, and aspect.
Our evaluation shows that our generation model is able to generate high-quality, aspect-specific arguments.
These arguments can be used to improve the performance of stance detection models via data augmentation and to generate counter-arguments.
arXiv Detail & Related papers (2020-04-30T20:17:22Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
Research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.