Towards Effective Counter-Responses: Aligning Human Preferences with Strategies to Combat Online Trolling
- URL: http://arxiv.org/abs/2410.04164v1
- Date: Sat, 5 Oct 2024 14:01:52 GMT
- Title: Towards Effective Counter-Responses: Aligning Human Preferences with Strategies to Combat Online Trolling
- Authors: Huije Lee, Hoyun Song, Jisu Shin, Sukmin Cho, SeungYoon Han, Jong C. Park
- Abstract summary: This paper investigates whether humans have preferred strategies tailored to different types of trolling behaviors.
We introduce a methodology for generating counter-responses to trolls by recommending appropriate response strategies (RSs).
The experimental results demonstrate that our proposed approach guides constructive discussion and reduces the negative effects of trolls.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Trolling in online communities typically involves disruptive behaviors such as provoking anger and manipulating discussions, leading to a polarized atmosphere and emotional distress. Robust moderation is essential for mitigating these negative impacts and maintaining a healthy and constructive community atmosphere. However, effectively addressing trolls is difficult because their behaviors vary widely and require different response strategies (RSs) to counter them. This diversity makes it challenging to choose an appropriate RS for each specific situation. To address this challenge, our research investigates whether humans have preferred strategies tailored to different types of trolling behaviors. Our findings reveal a correlation between the types of trolling encountered and the preferred RS. In this paper, we introduce a methodology for generating counter-responses to trolls by recommending appropriate RSs, supported by a dataset aligning these strategies with human preferences across various troll contexts. The experimental results demonstrate that our proposed approach guides constructive discussion and reduces the negative effects of trolls, thereby enhancing the online community environment.
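The pipeline the abstract describes — identify the type of trolling, recommend a matching response strategy, then generate a strategy-conditioned counter-response — can be sketched as follows. The trolling types, strategy names, and the `generate_counter_response` stub are illustrative assumptions, not the paper's actual taxonomy; in the paper the type-to-strategy alignment is learned from human-preference data.

```python
# Illustrative sketch of a strategy-recommendation pipeline for counter-responses.
# The type labels and strategies below are hypothetical placeholders.

# Hypothetical mapping from trolling type to the preferred response strategy (RS).
PREFERRED_RS = {
    "provocation": "calm_refutation",
    "manipulation": "fact_checking",
    "harassment": "firm_boundary",
}

def recommend_rs(troll_type: str) -> str:
    """Recommend a response strategy for a detected trolling type."""
    return PREFERRED_RS.get(troll_type, "ignore")  # fall back to disengaging

def generate_counter_response(comment: str, strategy: str) -> str:
    """Stub for a strategy-conditioned generator (an LLM in practice)."""
    return f"[{strategy}] response to: {comment}"

response = generate_counter_response("You are all idiots.", recommend_rs("provocation"))
```

The key design point is the separation of concerns: strategy selection is driven by observed human preferences per trolling type, while response generation is conditioned on the selected strategy.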
Related papers
- Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes.
We find that the majority of disagreements are at odds with standard reward modeling approaches.
We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
arXiv Detail & Related papers (2024-10-18T17:32:22Z)
- SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration [75.62448812759968]
This dataset is a large-scale Korean dataset of 49k sensitive questions with 42k acceptable and 46k non-acceptable responses.
The dataset was constructed leveraging HyperCLOVA in a human-in-the-loop manner based on real news headlines.
arXiv Detail & Related papers (2023-05-28T11:51:20Z)
- Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls [28.903534969338015]
We study how to perform robust learning in such an environment.
We introduce a benchmark evaluation, SafetyMix, which can evaluate methods that learn safe vs. toxic language.
We propose and analyze several mitigating learning algorithms that identify trolls either at the example or at the user level.
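A minimal sketch of the user-level variant of this kind of mitigation: aggregate per-example toxicity judgments per user and drop training data from users who look like trolls. The function names and the 0.5 threshold are assumptions for illustration; in practice the per-example toxicity signal would come from a safety classifier, and this is not the paper's specific algorithm.

```python
from collections import defaultdict

def flag_trolls(examples, threshold=0.5):
    """Flag users whose fraction of toxic examples exceeds `threshold`.

    `examples` is a list of (user_id, text, is_toxic) triples; `is_toxic`
    stands in for a classifier's judgment of each example.
    """
    toxic = defaultdict(int)
    total = defaultdict(int)
    for user, _text, is_toxic in examples:
        total[user] += 1
        toxic[user] += int(is_toxic)
    return {u for u in total if toxic[u] / total[u] > threshold}

def filter_training_data(examples, threshold=0.5):
    """Keep only examples from users not flagged as trolls."""
    trolls = flag_trolls(examples, threshold)
    return [(u, t, y) for u, t, y in examples if u not in trolls]
```

Example-level identification would instead score each example independently, trading robustness to mixed-quality users for finer granularity.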
arXiv Detail & Related papers (2022-08-05T17:33:33Z)
- ELF22: A Context-based Counter Trolling Dataset to Combat Internet Trolls [0.23624125155742054]
We propose a novel dataset for automatic counter response generation.
In particular, we constructed a pair-wise dataset that includes troll comments and counter responses with labeled response strategies.
We demonstrate that the model fine-tuned on our dataset shows a significantly improved performance in strategy-controlled sentence generation.
arXiv Detail & Related papers (2022-07-30T10:14:41Z)
- PCL: Peer-Contrastive Learning with Diverse Augmentations for Unsupervised Sentence Embeddings [69.87899694963251]
We propose a novel Peer-Contrastive Learning (PCL) with diverse augmentations.
PCL constructs diverse contrastive positives and negatives at the group level for unsupervised sentence embeddings.
PCL can perform peer-positive contrast as well as peer-network cooperation, which offers an inherent anti-bias ability.
arXiv Detail & Related papers (2022-01-28T13:02:41Z)
- Exposing Paid Opinion Manipulation Trolls [19.834000431578737]
We show how to find paid trolls on the Web using machine learning.
In this paper, we assume that a user who is called a troll by several different people is likely to be one.
We compare the profiles of (i) paid trolls vs. (ii) "mentioned" trolls vs. (iii) non-trolls, and we further show that a classifier trained to distinguish (ii) from (iii) also does quite well at telling apart (i) from (iii).
arXiv Detail & Related papers (2021-09-26T11:40:14Z)
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that with low exploration probability values, the learned policy is more robust to corrupted rewards.
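The idea of a smooth reward perturbation misleading a learner can be sketched on a toy two-armed bandit with an epsilon-greedy learner. The bandit setup, the sinusoidal attack, and all parameter values below are illustrative assumptions, not the paper's experimental protocol.

```python
import math
import random

def run_bandit(perturb=None, epsilon=0.1, steps=2000, seed=0):
    """Epsilon-greedy learner on a 2-armed bandit.

    `perturb(t, arm, r)` optionally corrupts the reward the learner sees.
    Returns the final value estimates for both arms.
    """
    rng = random.Random(seed)
    true_means = [0.2, 0.8]          # arm 1 is truly better
    q = [0.0, 0.0]                   # value estimates
    counts = [0, 0]
    for t in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)   # explore
        else:
            arm = max(range(2), key=lambda a: q[a])  # exploit
        r = true_means[arm] + rng.gauss(0, 0.1)
        if perturb is not None:
            r = perturb(t, arm, r)   # adversary corrupts the observed reward
        counts[arm] += 1
        q[arm] += (r - q[arm]) / counts[arm]  # incremental sample mean
    return q

# Smoothly varying attack: inflate the bad arm's reward, suppress the good arm's.
def smooth_attack(t, arm, r):
    bias = 0.5 + 0.5 * math.sin(t / 100)
    return r + bias if arm == 0 else r - bias

clean_q = run_bandit()
attacked_q = run_bandit(perturb=smooth_attack)
```

Without the attack the learner correctly values arm 1 higher; under the smooth perturbation the estimated values flip, so the learned policy prefers the truly worse arm.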
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
- Advances and Challenges in Conversational Recommender Systems: A Survey [133.93908165922804]
We provide a systematic review of the techniques used in current conversational recommender systems (CRSs).
We summarize the key challenges of developing CRSs into five directions.
These research directions involve multiple research fields like information retrieval (IR), natural language processing (NLP), and human-computer interaction (HCI).
arXiv Detail & Related papers (2021-01-23T08:53:15Z)
- "Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses [87.89632038677912]
Ad hominem attacks are those that target some feature of a person's character instead of the position the person is maintaining.
We propose categories of ad hominems, compose an annotated dataset, and build a system to analyze human and dialogue responses to English Twitter posts.
Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) constrained decoding techniques can reduce ad hominems in generated responses.
arXiv Detail & Related papers (2020-10-24T07:37:49Z)
- Towards control of opinion diversity by introducing zealots into a polarised social group [7.9603223299524535]
We explore a method to influence or even control the diversity of opinions within a polarised social group.
We leverage the voter model in which users hold binary opinions and repeatedly update their beliefs based on others they connect with.
We inject zealots into a polarised network in order to shift the average opinion towards any target value.
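The mechanism described above can be sketched as a small simulation: ordinary users repeatedly copy a random neighbour's binary opinion, while injected zealots never change theirs, pulling the average toward their side. The complete-graph structure and all parameter values are illustrative assumptions, not the paper's setup.

```python
import random

def voter_model_with_zealots(n=100, n_zealots=20, zealot_opinion=1,
                             steps=20000, seed=0):
    """Voter model on a complete graph with stubborn zealots.

    Each step, a random non-zealot copies the opinion of a random node
    (zealots included). Zealots hold `zealot_opinion` forever, so the
    average opinion drifts toward their target value.
    Returns the final average opinion across all nodes.
    """
    rng = random.Random(seed)
    # Start polarised: ordinary users split evenly between opinions 0 and 1.
    opinions = [i % 2 for i in range(n)] + [zealot_opinion] * n_zealots
    for _ in range(steps):
        i = rng.randrange(n)                  # only non-zealots update
        j = rng.randrange(len(opinions))      # copy any node's opinion
        opinions[i] = opinions[j]
    return sum(opinions) / len(opinions)
```

Because the zealots are fixed, the average opinion can never drop below the zealot fraction, and with enough zealots and enough steps it can be steered toward any target value, as the paper describes.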
arXiv Detail & Related papers (2020-06-12T15:27:30Z)
- Detecting Troll Behavior via Inverse Reinforcement Learning: A Case Study of Russian Trolls in the 2016 US Election [8.332032237125897]
We propose an approach based on Inverse Reinforcement Learning (IRL) to capture troll behavior and identify troll accounts.
As a study case, we consider the troll accounts identified by the US Congress during the investigation of Russian meddling in the 2016 US Presidential election.
We report promising results: the IRL-based approach is able to accurately detect troll accounts.
arXiv Detail & Related papers (2020-01-28T19:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.