Analyzing the Influence of Language Model-Generated Responses in
Mitigating Hate Speech on Social Media Directed at Ukrainian Refugees in
Poland
- URL: http://arxiv.org/abs/2311.16905v1
- Date: Tue, 28 Nov 2023 16:08:42 GMT
- Authors: Jakub Podolak, Szymon Łukasik, Paweł Balawender, Jan Ossowski,
Katarzyna Bąkowicz, Piotr Sankowski
- Abstract summary: This study investigates the potential of employing responses generated by Large Language Models (LLMs) to counteract hate speech on social media.
The goal was to minimize the propagation of hate speech directed at Ukrainian refugees in Poland.
The results indicate that deploying LLM-generated responses as replies to harmful tweets effectively diminishes user engagement, as measured by likes/impressions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the context of escalating hate speech and polarization on social media,
this study investigates the potential of employing responses generated by Large
Language Models (LLMs), complemented with pertinent verified knowledge links, to
counteract such trends. Through extensive A/B testing involving the posting of
753 automatically generated responses, the goal was to minimize the propagation
of hate speech directed at Ukrainian refugees in Poland.
The results indicate that deploying LLM-generated responses as replies to
harmful tweets effectively diminishes user engagement, as measured by
likes/impressions. When we respond to an original tweet, i.e., one that is not
itself a reply, we reduce user engagement by over 20% without increasing the
number of impressions. On the other hand, our responses increase the ratio of
replies to impressions for a harmful tweet, especially if the
harmful tweet is not original. Additionally, the study examines how generated
responses influence the overall sentiment of tweets in the discussion,
revealing that our intervention does not significantly alter the mean
sentiment.
This paper suggests the implementation of an automatic moderation system to
combat hate speech on social media and provides an in-depth analysis of the A/B
experiment, covering methodology, data collection, and statistical outcomes.
Ethical considerations and challenges are also discussed, offering guidance for
the development of discourse moderation systems leveraging the capabilities of
generative AI.
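The abstract's core metric (engagement measured as likes per impression, compared between harmful tweets that received an LLM-generated reply and those that did not) can be sketched as a simple permutation test. The records, numbers, and helper names below are hypothetical illustrations, not the paper's actual data or code:

```python
import random

# Hypothetical engagement records: (likes, impressions) per harmful tweet.
# Values are illustrative only, not data from the study.
control = [(12, 400), (30, 900), (5, 150), (22, 610), (9, 300)]
treated = [(7, 410), (14, 880), (3, 160), (11, 620), (5, 290)]  # received an LLM reply

def mean_engagement(records):
    """Mean likes/impressions ratio across tweets."""
    return sum(likes / imps for likes, imps in records) / len(records)

def permutation_test(a, b, trials=10_000, seed=0):
    """Two-sided permutation test on the difference in mean engagement ratio."""
    rng = random.Random(seed)
    observed = abs(mean_engagement(a) - mean_engagement(b))
    pooled = a + b
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean_engagement(perm_a) - mean_engagement(perm_b)) >= observed:
            hits += 1
    return hits / trials

drop = 1 - mean_engagement(treated) / mean_engagement(control)
print(f"relative engagement drop: {drop:.1%}")
print(f"permutation p-value: {permutation_test(control, treated):.3f}")
```

With real data, the same comparison would be run separately for original tweets and replies, mirroring the paper's split, and the replies-to-impressions ratio would be tested the same way.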
Related papers
- Outcome-Constrained Large Language Models for Countering Hate Speech
Counterspeech that challenges or responds to hate speech has been seen as an alternative to mitigate the negative impact of hate speech and foster productive online communications.
Existing research focuses on the generation of counterspeech with certain linguistic attributes, such as being polite, informative, and intent-driven.
We first explore methods that utilize large language models (LLMs) to generate counterspeech constrained by potential conversation outcomes.
arXiv Detail & Related papers (2024-03-25T19:44:06Z)
- Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting
SocialSense is a framework that induces a belief-centered graph on top of an existing social network, along with graph-based propagation to capture social dynamics.
Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z)
- Leveraging Implicit Feedback from Deployment Data in Dialogue
We study improving social conversational agents by learning from natural dialogue between users and a deployed model.
We leverage signals such as user response length, sentiment, and the reaction of future human utterances in the collected dialogue episodes.
arXiv Detail & Related papers (2023-07-26T11:34:53Z)
- Demonstrations of the Potential of AI-based Political Issue Polling
We develop a prompt engineering methodology for eliciting human-like survey responses from ChatGPT.
We execute large scale experiments, querying for thousands of simulated responses at a cost far lower than human surveys.
We find ChatGPT is effective at anticipating both the mean level and distribution of public opinion on a variety of policy issues.
But it is less successful at anticipating demographic-level differences.
arXiv Detail & Related papers (2023-07-10T12:17:15Z)
- SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
This dataset is a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses.
The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines.
arXiv Detail & Related papers (2023-05-28T11:51:20Z)
- Measuring the Effect of Influential Messages on Varying Personas
We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona might have upon seeing a news message.
The proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response.
This enables more accurate and comprehensive inference on the mental state of the persona.
arXiv Detail & Related papers (2023-05-25T21:01:00Z)
- AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies
We show that dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages.
We first show that hand-crafted replies can be effective for the task of detecting nonsense in applications as complex as Diplomacy.
We find that AutoReply-generated replies outperform handcrafted replies and perform on par with carefully fine-tuned large supervised models.
arXiv Detail & Related papers (2022-11-22T22:31:34Z)
- Assessing the impact of contextual information in hate speech detection
We provide a novel corpus for contextualized hate speech detection based on user responses to news posts from media outlets on Twitter.
This corpus was collected in the Rioplatense dialectal variety of Spanish and focuses on hate speech associated with the COVID-19 pandemic.
arXiv Detail & Related papers (2022-10-02T09:04:47Z)
- "Stop Asian Hate!": Refining Detection of Anti-Asian Hate Speech During the COVID-19 Pandemic
The COVID-19 pandemic has fueled a surge in anti-Asian xenophobia and prejudice.
We create and annotate a corpus of Twitter tweets using two experimental approaches to explore anti-Asian abusive and hate speech.
arXiv Detail & Related papers (2021-12-04T06:55:19Z)
- News consumption and social media regulations policy
We analyze two social media that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in users engaging with both types of content, with a slight preference for questionable content, which may reflect a dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
- Generating Counter Narratives against Online Hate Speech: Data and Strategies
We present a study on how to collect responses to hate effectively.
We employ large-scale unsupervised language models such as GPT-2 for the generation of silver data.
The best annotation strategies/neural architectures can be used for data filtering before expert validation/post-editing.
arXiv Detail & Related papers (2020-04-08T19:35:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided (including all information) and is not responsible for any consequences of its use.