Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster
- URL: http://arxiv.org/abs/2406.04106v1
- Date: Thu, 6 Jun 2024 14:23:10 GMT
- Title: Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster
- Authors: Agostina Calabrese, Leonardo Neves, Neil Shah, Maarten W. Bos, Björn Ross, Mirella Lapata, Francesco Barbieri
- Abstract summary: We investigate the effect of explanations on the speed of real-world moderators.
Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision-making time by 7.4%.
- Score: 72.84926097773578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck in the moderation pipeline, no studies have explored how models could support them in making faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research using real content moderators is scarce. In this work we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision-making time by 7.4%.
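To make the contrast concrete, here is a minimal sketch of what a structured explanation could look like as a data record, as opposed to a single free-text rationale. The slot names (`target_group`, `evidence_span`, `violated_rule`) are our assumptions for illustration, not the schema used in the paper:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: the paper does not publish this exact structure.
# A structured explanation decomposes the model's verdict into fixed
# slots that a moderator can verify one by one, instead of rereading
# a paragraph-long free-text rationale.
@dataclass
class StructuredExplanation:
    verdict: str                  # e.g. "remove" or "keep"
    target_group: Optional[str]   # group the post targets, if any
    evidence_span: Optional[str]  # text span driving the verdict
    violated_rule: Optional[str]  # policy clause the post violates

explanation = StructuredExplanation(
    verdict="remove",
    target_group="<targeted group>",
    evidence_span="<offending phrase>",
    violated_rule="hate speech policy, clause 2",
)
print(explanation)
```

Checking fixed slots against the post localizes what the moderator must verify, which is one plausible mechanism for the reported 7.4% speed-up.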
Related papers
- Venire: A Machine Learning-Guided Panel Review System for Community Content Moderation [17.673993032146527]
We develop Venire, an ML-backed system for panel review on Reddit.
Venire uses a machine learning model trained on log data to identify the cases where moderators are most likely to disagree.
We show that Venire improves decision consistency and surfaces latent disagreements.
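A minimal sketch of the routing idea as we read it: train a classifier on features derived from moderation logs and send likely-contested cases to a panel. The features, model choice, and threshold below are illustrative assumptions, not Venire's actual design:

```python
# Illustrative sketch only: Venire's real features and model are not
# specified in the summary; the feature names here are assumptions.
from sklearn.ensemble import GradientBoostingClassifier

# One row per moderation case, built from historical log data.
X = [
    # [report_count, rule_ambiguity_score, past_reversal_rate]
    [3, 0.8, 0.4],
    [1, 0.1, 0.0],
    [5, 0.9, 0.6],
    [2, 0.2, 0.1],
]
y = [1, 0, 1, 0]  # 1 = moderators historically disagreed on similar cases

model = GradientBoostingClassifier().fit(X, y)

def route(case_features, threshold=0.5):
    """Send likely-contested cases to panel review, the rest to a single mod."""
    p_disagree = model.predict_proba([case_features])[0][1]
    return "panel_review" if p_disagree >= threshold else "single_moderator"

print(route([4, 0.7, 0.5]))
```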
arXiv Detail & Related papers (2024-10-30T20:39:34Z)
- Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions [47.944081120226905]
We construct a novel dataset of Wikipedia editor discussions along with their reasoning in three languages.
The dataset contains each editor's stance (keep, delete, merge, comment), the stated reason, and the relevant content moderation policy for each edit decision.
We demonstrate that stance and corresponding reason (policy) can be predicted jointly with a high degree of accuracy, adding transparency to the decision-making process.
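As a sketch, one way to represent a dataset record and the joint prediction target. The field names and the `WP:N` policy label are illustrative, and the shared-encoder framing is our assumption rather than the paper's confirmed architecture:

```python
# Hypothetical record layout for the editor-discussion dataset.
from dataclasses import dataclass

STANCES = ["keep", "delete", "merge", "comment"]

@dataclass
class EditorComment:
    text: str      # the editor's discussion comment
    language: str  # the dataset covers three languages
    stance: str    # one of STANCES
    policy: str    # moderation policy cited as the reason

example = EditorComment(
    text="Fails the notability guideline; no independent sources.",
    language="en",
    stance="delete",
    policy="WP:N",  # hypothetical policy label
)

# A joint model would predict both `stance` and `policy` from `text`,
# e.g. with two classification heads over one shared encoder, so each
# predicted stance comes with a cited policy that makes it transparent.
print(example)
```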
arXiv Detail & Related papers (2023-10-09T15:11:02Z)
- User Attitudes to Content Moderation in Web Search [49.1574468325115]
We examine the levels of support for different moderation practices applied to potentially misleading and/or potentially offensive content in web search.
We find that the most supported practice is informing users about potentially misleading or offensive content, and the least supported one is the complete removal of search results.
More conservative users and users with lower levels of trust in web search results are more likely to be against content moderation in web search.
arXiv Detail & Related papers (2023-10-05T10:57:15Z)
- BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases [28.519851740902258]
BiasX is a framework that enhances content moderation setups with free-text explanations of statements' implied social biases.
We show that explanations substantially help participants correctly identify subtly (non-)toxic content.
Our results showcase the promise of using free-text explanations to encourage more thoughtful toxicity moderation.
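A minimal sketch of the setup as we read it: each statement shown to a moderator is paired with a free-text explanation of its implied bias before the decision is made. The function and wording below are illustrative, not the paper's interface:

```python
from typing import Optional

def present_for_moderation(statement: str, implied_bias: Optional[str]) -> str:
    """Build the text shown to a moderator; the explanation is optional."""
    prompt = f"Statement: {statement}\n"
    if implied_bias:
        # The free-text explanation is meant to trigger "thinking slow".
        prompt += f"Possible implied bias: {implied_bias}\n"
    return prompt + "Decision (toxic / not toxic)?"

print(present_for_moderation(
    "They always act like that.",
    "May imply a negative generalization about a group, depending on context.",
))
```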
arXiv Detail & Related papers (2023-05-23T01:45:18Z)
- Analyzing Norm Violations in Live-Stream Chat [49.120561596550395]
We present the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms.
We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch.
Our results show that appropriate contextual information can boost moderation performance by 35%.
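A sketch of the "add context" idea: prepend the preceding chat messages to the target comment before classification. The window size and separator token are our assumptions, not the paper's exact configuration:

```python
def build_input(target_comment: str, chat_history: list, window: int = 3) -> str:
    """Join the last few chat messages with the comment to be classified."""
    context = " [SEP] ".join(chat_history[-window:])
    return f"{context} [SEP] {target_comment}" if context else target_comment

history = ["anyone raiding?", "stop spamming", "mods are asleep"]
print(build_input("lets flood the chat", history))
# The resulting string is what a text classifier would score for norm
# violations; the summary reports such context boosts performance by 35%.
```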
arXiv Detail & Related papers (2023-05-18T05:58:27Z)
- AppealMod: Inducing Friction to Reduce Moderator Workload of Handling User Appeals [7.898353262890439]
We designed and built AppealMod, a system that induces friction in the appeals process by asking users to provide additional information before their appeals are reviewed by human moderators.
We conducted a randomized field experiment in a Reddit community of over 29 million users that lasted for four months.
Our system is effective at reducing moderator workload and minimizing their exposure to toxic content while honoring their preference for direct engagement and agency in appeals.
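A minimal sketch of the friction mechanism as described: an appeal only reaches the human review queue once the user has supplied additional information. Function and field names are hypothetical:

```python
def handle_appeal(appeal: dict, review_queue: list) -> str:
    """Gate appeals behind a request for more detail (hypothetical API)."""
    if not appeal.get("additional_info"):
        # Induce friction: ask for more detail instead of queueing the appeal.
        return "Please explain why the removal was mistaken before we review."
    review_queue.append(appeal)  # only completed appeals reach moderators
    return "Your appeal has been queued for human review."

queue: list = []
print(handle_appeal({"post_id": 1}, queue))  # friction step, not queued
print(handle_appeal(
    {"post_id": 1, "additional_info": "The post quoted the rule it allegedly broke."},
    queue,
))  # now queued for a human moderator
```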
arXiv Detail & Related papers (2023-01-17T20:15:20Z)
- News consumption and social media regulations policy [70.31753171707005]
We analyze two social media platforms that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the moderation pursued by Twitter produces a significant reduction in questionable content.
The lack of clear regulation on Gab leads users to engage with both types of content, with a slight preference for questionable content that may reflect either dissing or endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
- Moderation Challenges in Voice-based Online Communities on Discord [24.417653462255448]
Findings suggest that the affordances of voice-based online communities change what it means to moderate content and interactions.
Voice introduces new ways to break rules that moderators of text-based communities find unfamiliar, such as disruptive noise and voice raiding.
Moderation strategies for these violations are limited and often based on hearsay and first impressions, resulting in problems ranging from unsuccessful moderation to false accusations.
arXiv Detail & Related papers (2021-01-13T18:43:22Z)
- Information Consumption and Social Response in a Segregated Environment: the Case of Gab [74.5095691235917]
This work provides a characterization of the interaction patterns within Gab around the COVID-19 topic.
We find that there are no strong statistical differences in the social response to questionable and reliable content.
Our results provide insights into coordinated inauthentic behavior and the early warning of information operations.
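As a sketch of the kind of comparison behind the "no strong statistical differences" finding: test whether engagement distributions differ between questionable and reliable posts. The test choice (Mann-Whitney U) and the numbers are ours, not the paper's:

```python
# Hypothetical data: engagement counts per post for the two content types.
from scipy.stats import mannwhitneyu

questionable_engagement = [12, 5, 30, 8, 22, 7, 15]
reliable_engagement = [10, 6, 28, 9, 20, 5, 14]

# Non-parametric test comparing the two engagement distributions.
stat, p_value = mannwhitneyu(questionable_engagement, reliable_engagement)
print(f"U={stat:.1f}, p={p_value:.3f}")
# A large p-value would be consistent with the finding that the social
# response to the two content types is statistically similar.
```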
arXiv Detail & Related papers (2020-06-03T11:34:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.