Policy-as-Prompt: Rethinking Content Moderation in the Age of Large Language Models
- URL: http://arxiv.org/abs/2502.18695v1
- Date: Tue, 25 Feb 2025 23:15:16 GMT
- Title: Policy-as-Prompt: Rethinking Content Moderation in the Age of Large Language Models
- Authors: Konstantina Palla, José Luis Redondo García, Claudia Hauff, Francesco Fabbri, Henrik Lindström, Daniel R. Taber, Andreas Damianou, Mounia Lalmas
- Abstract summary: This paper formalises the emerging policy-as-prompt framework and identifies five key challenges across four domains. It lays the groundwork for future exploration of scalable and adaptive content moderation systems in digital ecosystems.
- Score: 10.549072684871478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Content moderation plays a critical role in shaping safe and inclusive online environments, balancing platform standards, user expectations, and regulatory frameworks. Traditionally, this process involves operationalising policies into guidelines, which are then used by downstream human moderators for enforcement, or to further annotate datasets for training machine learning moderation models. However, recent advancements in large language models (LLMs) are transforming this landscape. These models can now interpret policies directly as textual inputs, eliminating the need for extensive data curation. This approach offers unprecedented flexibility, as moderation can be dynamically adjusted through natural language interactions. This paradigm shift raises important questions about how policies are operationalised and the implications for content moderation practices. In this paper, we formalise the emerging policy-as-prompt framework and identify five key challenges across four domains: Technical Implementation (1. translating policy to prompts, 2. sensitivity to prompt structure and formatting), Sociotechnical (3. the risk of technological determinism in policy formation), Organisational (4. evolving roles between policy and machine learning teams), and Governance (5. model governance and accountability). Through analysing these challenges across technical, sociotechnical, organisational, and governance dimensions, we discuss potential mitigation approaches. This research provides actionable insights for practitioners and lays the groundwork for future exploration of scalable and adaptive content moderation systems in digital ecosystems.
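The policy-as-prompt pattern the abstract describes can be illustrated with a short sketch: the policy text itself is interpolated into the prompt of an LLM classifier, so editing the policy wording changes moderation behaviour without re-annotating data or retraining a model. The template, the hypothetical policy excerpt, and the `llm_complete` stand-in below are illustrative assumptions, not an implementation from the paper.

```python
# Minimal sketch of policy-as-prompt moderation: the policy is passed verbatim
# inside the prompt, so the prompt *is* the operationalised policy.
# `llm_complete` is a hypothetical stand-in for whatever chat/completion API a
# platform actually uses; nothing here is prescribed by the paper.
from typing import Callable

POLICY = """\
Hypothetical policy excerpt: content that praises or encourages violence
against a person or group is not allowed; newsworthy reporting and
counter-speech are allowed."""

PROMPT_TEMPLATE = """You are a content moderator. Apply ONLY the policy below.

POLICY:
{policy}

CONTENT:
{content}

Answer with exactly one word: VIOLATING or NON_VIOLATING."""


def moderate(content: str, llm_complete: Callable[[str], str]) -> bool:
    """Return True if the model judges `content` to violate POLICY."""
    prompt = PROMPT_TEMPLATE.format(policy=POLICY, content=content)
    verdict = llm_complete(prompt).strip().upper()
    return verdict.startswith("VIOLATING")


if __name__ == "__main__":
    # Stub LLM so the sketch runs end to end; swap in a real model client.
    fake_llm = lambda prompt: "NON_VIOLATING"
    print(moderate("Have a great day!", llm_complete=fake_llm))
```

Even a sketch this small surfaces the first two challenges the paper names: the verdict depends on how the policy is translated into prompt text and on the prompt's structure and formatting.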
Related papers
- Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies [51.03989561425833]
We propose a neuro-symbolic approach called neural DNF-MT for end-to-end policy learning.
The differentiable nature of the neural DNF-MT model enables the use of deep actor-critic algorithms for training.
We show how the bivalent representations of deterministic policies can be edited and incorporated back into a neural model.
arXiv Detail & Related papers (2025-01-07T15:51:49Z) - Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice [186.055899073629]
Unlearning is often invoked as a solution for removing the effects of targeted information from a generative-AI model. Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs. Both of these goals (the targeted removal of information from a model and the targeted suppression of information from a model's outputs) present various technical and substantive challenges.
arXiv Detail & Related papers (2024-12-09T20:18:43Z) - Political-LLM: Large Language Models in Political Science [159.95299889946637]
Large language models (LLMs) have been widely adopted in political science tasks. Political-LLM aims to advance the comprehensive understanding of integrating LLMs into computational political science.
arXiv Detail & Related papers (2024-12-09T08:47:50Z) - The Unappreciated Role of Intent in Algorithmic Moderation of Social Media Content [2.2618341648062477]
This paper examines the role of intent in content moderation systems.
We review state of the art detection models and benchmark training datasets for online abuse to assess their awareness and ability to capture intent.
arXiv Detail & Related papers (2024-05-17T18:05:13Z) - Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models [3.9490749767170636]
Large language models (LLMs) have revolutionized text generation, translation, and question-answering tasks.
Despite their widespread use, LLMs present challenges such as ethical dilemmas when models are compelled to respond inappropriately.
This paper addresses these challenges by introducing a multi-pronged approach that includes: 1) filtering sensitive vocabulary from user input to prevent unethical responses; 2) detecting role-playing to halt interactions that could lead to 'prison break' scenarios; and 3) extending these methodologies to LLM derivatives such as Multimodal Large Language Models (MLLMs).
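A minimal sketch of the first prong (input-vocabulary filtering) is given below; the tiny blocklist and the plain substring match are assumptions made for illustration, not the screening pipeline from the cited paper.

```python
import re

# Hypothetical blocklist entries; a production system would use a maintained lexicon.
SENSITIVE_TERMS = {"make a bomb", "credit card dump"}


def screen_input(user_text: str) -> tuple[bool, str]:
    """Return (blocked, cleaned_text); blocked inputs never reach the model."""
    lowered = user_text.lower()
    if any(term in lowered for term in SENSITIVE_TERMS):
        return True, ""
    # Light whitespace normalisation before the text is sent to the LLM.
    return False, re.sub(r"\s+", " ", user_text).strip()
```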
arXiv Detail & Related papers (2024-01-27T08:09:33Z) - Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements [55.2480439325792]
This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques.
We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models.
A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement.
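As a reminder of what that interpretability framework computes, here is a minimal Integrated Gradients approximation; the toy linear model, the all-zeros baseline, and the step count are illustrative assumptions, not the cited paper's setup.

```python
import torch


def integrated_gradients(model, x, baseline, steps=50):
    """IG_i ≈ (x_i - x'_i) * mean over k of dF(x' + (k/steps)(x - x'))/dx_i."""
    total_grad = torch.zeros_like(x)
    for k in range(1, steps + 1):
        # Point on the straight-line path from the baseline to the input.
        point = (baseline + (k / steps) * (x - baseline)).requires_grad_(True)
        model(point).backward()
        total_grad += point.grad
    return (x - baseline) * total_grad / steps


if __name__ == "__main__":
    weights = torch.tensor([0.5, -2.0, 1.0])
    toy_model = lambda z: (weights * z).sum()  # stand-in for a trained scoring model
    x, baseline = torch.ones(3), torch.zeros(3)
    # For a linear model the attributions recover the weights (scaled by the input).
    print(integrated_gradients(toy_model, x, baseline))
```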
arXiv Detail & Related papers (2023-11-22T02:45:01Z) - Context-Aware Composition of Agent Policies by Markov Decision Process Entity Embeddings and Agent Ensembles [1.124711723767572]
Computational agents support humans in many areas of life and are therefore found in heterogeneous contexts.
In order to perform services and carry out activities in a goal-oriented manner, agents require prior knowledge.
We propose a novel simulation-based approach that enables the representation of heterogeneous contexts.
arXiv Detail & Related papers (2023-08-28T12:13:36Z) - Residual Q-Learning: Offline and Online Policy Customization without Value [53.47311900133564]
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations.
We formulate a new problem setting called policy customization.
We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy.
arXiv Detail & Related papers (2023-06-15T22:01:19Z) - Interpretable Reinforcement Learning via Neural Additive Models for Inventory Management [3.714118205123092]
We focus on developing dynamic inventory ordering policies for a multi-echelon (i.e., multi-stage) supply chain.
Traditional inventory optimization methods aim to determine a static reordering policy.
We propose an interpretable reinforcement learning approach that aims to be as interpretable as the traditional static policies.
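For context on the baseline being contrasted here, a classic static reordering rule (a reorder-point / order-up-to policy) looks like the sketch below; the threshold values are made up for illustration and do not come from the paper.

```python
def static_order_quantity(inventory_position: float,
                          reorder_point: float = 20.0,
                          order_up_to_level: float = 100.0) -> float:
    """Static (s, S) rule: when stock falls to s or below, order back up to S."""
    if inventory_position <= reorder_point:
        return order_up_to_level - inventory_position
    return 0.0
```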
arXiv Detail & Related papers (2023-03-18T10:13:32Z) - Five policy uses of algorithmic transparency and explainability [0.0]
We provide case studies illustrating five ways in which algorithmic transparency and explainability have been used in policy settings.
These uses include specific requirements for explanations, nonbinding guidelines for the internal governance of algorithms, and regulations applicable to highly regulated settings.
Case studies span a spectrum from precise requirements for specific types of explanations to nonspecific requirements focused on broader notions of transparency.
arXiv Detail & Related papers (2023-02-06T19:34:14Z) - Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
arXiv Detail & Related papers (2020-12-30T03:22:35Z)
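One conventional way to write the objective sketched in that last entry, with the trade-off weight and the sensitive-state notation chosen here for illustration rather than taken from the paper:

```latex
% Reward maximisation with a mutual-information privacy regulariser (illustrative notation).
\max_{\theta} \; J(\theta) \;=\;
  \mathbb{E}_{\pi_\theta}\!\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big]
  \;-\; \lambda\, I\big(S_{\mathrm{sens}};\, A\big)
```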