Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting Volunteer Content Moderators
- URL: http://arxiv.org/abs/2311.07879v3
- Date: Mon, 21 Oct 2024 16:48:18 GMT
- Title: Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting Volunteer Content Moderators
- Authors: Yang Trista Cao, Lovely-Frances Domingo, Sarah Ann Gilbert, Michelle Mazurek, Katie Shilton, Hal Daumé III
- Abstract summary: We conduct a model review on Hugging Face to reveal the availability of models to cover various moderation rules and guidelines.
We put state-of-the-art LLMs to the test, evaluating how well these models perform in flagging violations of platform rules from one particular forum.
Overall, we observe a non-trivial gap: models are missing for many rules, and LLMs exhibit moderate to low performance on a significant portion of them.
- Score: 19.401873797111662
- License:
- Abstract: Extensive efforts in automated approaches for content moderation have focused on developing models to identify toxic, offensive, and hateful content, with the aim of lightening the load for moderators. Yet, it remains uncertain whether improvements on those tasks have truly addressed moderators' needs in accomplishing their work. In this paper, we surface gaps between past research efforts that have aimed to automate aspects of content moderation and the needs of volunteer content moderators in identifying violations of various moderation rules. To do so, we conduct a model review on Hugging Face to reveal the availability of models covering various moderation rules and guidelines from three exemplar forums. We further put state-of-the-art LLMs to the test, evaluating how well these models perform in flagging violations of platform rules from one particular forum. Finally, we conduct a user survey study with volunteer moderators to gain insight into their perspectives on useful moderation models. Overall, we observe a non-trivial gap: models are missing for many rules, and LLMs exhibit moderate to low performance on a significant portion of them. Moderators' reports provide guidance for future work on developing moderation assistant models.
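For a concrete picture of the two automated probes described above, the following is a minimal Python sketch, assuming the `huggingface_hub` package is installed; the rule keyword, rule text, example comment, and prompt wording are illustrative assumptions rather than the paper's actual setup.

```python
# Minimal sketch (not the paper's actual protocol) of the two automated probes
# described in the abstract: (1) searching the Hugging Face Hub for models that
# might cover a given moderation rule, and (2) building a zero-shot prompt that
# asks an LLM to flag a rule violation. Rule text, comment, and keyword below
# are illustrative assumptions.
from huggingface_hub import HfApi


def find_candidate_models(rule_keyword: str, limit: int = 5) -> list[str]:
    """Return Hub model IDs whose metadata matches a rule-related keyword."""
    api = HfApi()
    return [m.id for m in api.list_models(search=rule_keyword, limit=limit)]


def build_flagging_prompt(rule: str, comment: str) -> str:
    """Compose a zero-shot prompt asking whether a comment violates a forum rule."""
    return (
        "You are assisting volunteer moderators of an online forum.\n"
        f"Rule: {rule}\n"
        f"Comment: {comment}\n"
        "Does this comment violate the rule? Answer YES or NO, "
        "then give a one-sentence reason."
    )


if __name__ == "__main__":
    # Hypothetical rule and comment, for illustration only.
    print(find_candidate_models("toxicity"))
    print(build_flagging_prompt(
        "No personal attacks or insults toward other users.",
        "You're clearly too dumb to understand this thread.",
    ))
```

In the paper's setting, the model's answers would be compared against moderators' decisions on real forum posts to measure per-rule performance; the sketch only outlines the shape of such a pipeline.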
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Moderator: Moderating Text-to-Image Diffusion Models through Fine-grained Context-based Policies [11.085388940369851]
We present Moderator, a policy-based model management system that allows administrators to specify fine-grained content moderation policies.
We show that Moderator can prevent 65% of users from generating moderated content within 15 attempts and requires the remaining users, on average, 8.3 times more attempts to generate undesired content.
arXiv Detail & Related papers (2024-08-14T16:44:46Z)
- Can Language Model Moderators Improve the Health of Online Discourse? [26.191337231826246]
We establish a systematic definition of conversational moderation effectiveness grounded in the moderation literature.
We propose a comprehensive evaluation framework to assess models' moderation capabilities independently of human intervention.
arXiv Detail & Related papers (2023-11-16T11:14:22Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we describe how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Towards Intersectional Moderation: An Alternative Model of Moderation Built on Care and Power [0.4351216340655199]
I perform a collaborative ethnography with moderators of r/AskHistorians, a community that uses an alternative moderation model.
I focus on three emblematic controversies of r/AskHistorians' alternative model of moderation.
I argue that designers should support decision-making processes and policy makers should account for the impact of sociotechnical systems.
arXiv Detail & Related papers (2023-05-18T18:27:52Z)
- Multilingual Content Moderation: A Case Study on Reddit [23.949429463013796]
We propose to study the challenges of content moderation by introducing a multilingual dataset of 1.8 million Reddit comments.
We perform extensive experimental analysis to highlight the underlying challenges and suggest related research problems.
Our dataset and analysis can help better prepare for the challenges and opportunities of auto moderation.
arXiv Detail & Related papers (2023-02-19T16:36:33Z)
- Explainable Abuse Detection as Intent Classification and Slot Filling [66.80201541759409]
We introduce the concept of policy-aware abuse detection, abandoning the unrealistic expectation that systems can reliably learn which phenomena constitute abuse from inspecting the data alone.
We show how architectures for intent classification and slot filling can be used for abuse detection, while providing a rationale for model decisions.
arXiv Detail & Related papers (2022-10-06T03:33:30Z)
- Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions.
We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)