Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting
Volunteer Content Moderators
- URL: http://arxiv.org/abs/2311.07879v2
- Date: Sat, 17 Feb 2024 04:30:54 GMT
- Title: Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting
Volunteer Content Moderators
- Authors: Yang Trista Cao, Lovely-Frances Domingo, Sarah Ann Gilbert, Michelle
Mazurek, Katie Shilton, Hal Daumé III
- Abstract summary: We conduct a model review on Hugging Face to reveal the availability of models to cover various moderation rules and guidelines.
We put state-of-the-art LLMs to the test, evaluating how well these models perform in flagging violations of platform rules from one particular forum.
Overall, we observe a non-trivial gap: for a significant portion of the rules, no developed models are available, and LLMs exhibit moderate to low performance.
- Score: 4.347723584293261
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Extensive efforts in automated approaches for content moderation have been
focused on developing models to identify toxic, offensive, and hateful content
with the aim of lightening the load for moderators. Yet, it remains uncertain
whether improvements on those tasks have truly addressed moderators' needs in
accomplishing their work. In this paper, we surface gaps between past research
efforts that have aimed to provide automation for aspects of content moderation
and the needs of volunteer content moderators, regarding identifying violations
of various moderation rules. To do so, we conduct a model review on Hugging
Face to reveal the availability of models to cover various moderation rules and
guidelines from three exemplar forums. We further put state-of-the-art LLMs to
the test, evaluating how well these models perform in flagging violations of
platform rules from one particular forum. Finally, we conduct a user survey
study with volunteer moderators to gain insight into their perspectives on
useful moderation models. Overall, we observe a non-trivial gap: for a
significant portion of the rules, no developed models are available, and LLMs
exhibit moderate to low performance. Moderators' reports provide guides for future work on
developing moderation assistant models.
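The "model review" described above can be pictured as a coverage check: mapping each moderation rule to the available models whose capabilities overlap it. The sketch below is purely illustrative; the model names, tags, and rules are hypothetical placeholders, not taken from the paper or from Hugging Face.

```python
# Hypothetical sketch of a rule-coverage check for a model review.
# Model names, tags, and rule keywords are illustrative only.

MODEL_TAGS = {
    "toxicity-classifier-a": {"toxicity", "hate"},
    "spam-detector-b": {"spam"},
}

RULES = {
    "no hate speech": {"hate"},
    "no spam": {"spam"},
    "stay on topic": {"off-topic"},
}

def rule_coverage(models, rules):
    """Map each rule to the models whose tags overlap its keywords."""
    return {
        rule: sorted(m for m, tags in models.items() if tags & keywords)
        for rule, keywords in rules.items()
    }
```

A rule that maps to an empty list (here, "stay on topic") is an uncovered rule, which is the kind of gap the paper measures.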
Related papers
- LiFT: Unsupervised Reinforcement Learning with Foundation Models as
Teachers [59.69716962256727]
We propose a framework that guides a reinforcement learning agent to acquire semantically meaningful behavior without human feedback.
In our framework, the agent receives task instructions grounded in a training environment from large language models.
We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment.
arXiv Detail & Related papers (2023-12-14T14:07:41Z)
- Can Language Model Moderators Improve the Health of Online Discourse? [26.191337231826246]
We establish a systematic definition of conversational moderation effectiveness grounded on moderation literature.
We propose a comprehensive evaluation framework to assess models' moderation capabilities independently of human intervention.
arXiv Detail & Related papers (2023-11-16T11:14:22Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the performance of the Llama 2 model by up to 15% relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data
Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Towards Intersectional Moderation: An Alternative Model of Moderation
Built on Care and Power [0.4351216340655199]
I perform a collaborative ethnography with moderators of r/AskHistorians, a community that uses an alternative moderation model.
I focus on three emblematic controversies of r/AskHistorians' alternative model of moderation.
I argue that designers should support decision-making processes and policy makers should account for the impact of sociotechnical systems.
arXiv Detail & Related papers (2023-05-18T18:27:52Z)
- Multilingual Content Moderation: A Case Study on Reddit [23.949429463013796]
We propose to study the challenges of content moderation by introducing a multilingual dataset of 1.8 million Reddit comments.
We perform extensive experimental analysis to highlight the underlying challenges and suggest related research problems.
Our dataset and analysis can help better prepare for the challenges and opportunities of auto moderation.
arXiv Detail & Related papers (2023-02-19T16:36:33Z)
- Proactive Moderation of Online Discussions: Existing Practices and the
Potential for Algorithmic Support [12.515485963557426]
The reactive paradigm of taking action against already-posted antisocial content is currently the most common form of moderation.
We explore how automation could assist with this existing proactive moderation workflow by building a prototype tool.
arXiv Detail & Related papers (2022-11-29T19:00:02Z)
- Explainable Abuse Detection as Intent Classification and Slot Filling [66.80201541759409]
We introduce the concept of policy-aware abuse detection, abandoning the unrealistic expectation that systems can reliably learn which phenomena constitute abuse from inspecting the data alone.
We show how architectures for intent classification and slot filling can be used for abuse detection, while providing a rationale for model decisions.
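The intent-plus-slot framing above can be read as: the "intent" is which policy rule a comment violates, and the "slots" are the text spans that evidence the violation, giving moderators a rationale to audit. The sketch below is illustrative only, not the paper's architecture; the rules and keyword patterns are hypothetical stand-ins for learned classifiers.

```python
# Illustrative sketch: abuse detection as intent classification (which
# policy rule is violated) plus slot filling (which spans evidence it).
# The rules and regex patterns are hypothetical placeholders for what
# would, in practice, be learned models.
import re

POLICY_PATTERNS = {
    "insult": re.compile(r"\b(idiot|moron)\b", re.IGNORECASE),
    "threat": re.compile(r"\bi will hurt you\b", re.IGNORECASE),
}

def classify_with_rationale(comment):
    """Return (intent, evidence spans) so the decision is auditable."""
    for rule, pattern in POLICY_PATTERNS.items():
        spans = [m.group(0) for m in pattern.finditer(comment)]
        if spans:
            return rule, spans
    return "no_violation", []
```

The point of the design is that the returned spans serve as the model's rationale, rather than an opaque toxic/non-toxic label.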
arXiv Detail & Related papers (2022-10-06T03:33:30Z)
- Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions.
We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.