Related papers: Asking For It: Question-Answering for Predicting Rule Infractions in Online Content Moderation

URL: http://arxiv.org/abs/2510.06350v1
Date: Tue, 07 Oct 2025 18:11:27 GMT
Title: Asking For It: Question-Answering for Predicting Rule Infractions in Online Content Moderation
Authors: Mattia Samory, Diana Pamfile, Andrew To, Shruti Phadke
Abstract summary: ModQ is a novel question-answering framework for rule-sensitive content moderation. We implement two model variants and train them on large-scale datasets from Reddit and Lemmy. Both models outperform state-of-the-art baselines in identifying moderation-relevant rule violations.
Score: 1.803599876087764
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Online communities rely on a mix of platform policies and community-authored rules to define acceptable behavior and maintain order. However, these rules vary widely across communities, evolve over time, and are enforced inconsistently, posing challenges for transparency, governance, and automation. In this paper, we model the relationship between rules and their enforcement at scale, introducing ModQ, a novel question-answering framework for rule-sensitive content moderation. Unlike prior classification or generation-based approaches, ModQ conditions on the full set of community rules at inference time and identifies which rule best applies to a given comment. We implement two model variants - extractive and multiple-choice QA - and train them on large-scale datasets from Reddit and Lemmy, the latter of which we construct from publicly available moderation logs and rule descriptions. Both models outperform state-of-the-art baselines in identifying moderation-relevant rule violations, while remaining lightweight and interpretable. Notably, ModQ models generalize effectively to unseen communities and rules, supporting low-resource moderation settings and dynamic governance environments.

Related papers

GMP: A Benchmark for Content Moderation under Co-occurring Violations and Dynamic Rules [10.423914922203265]
Large language models (LLMs) are adept at following fixed guidelines, but their judgment capabilities degrade when policies are unstable or context-dependent. This raises a critical question for evaluation: Does high performance on existing static benchmarks truly guarantee robust generalization of AI judgment to real-world scenarios involving co-occurring violations and dynamically changing rules?
arXiv Detail & Related papers (2026-03-02T10:50:11Z)
Decoding the Rule Book: Extracting Hidden Moderation Criteria from Reddit Communities [11.963784232069907]
We represent moderation criteria as score tables of lexical expressions associated with content removal. Our experiments demonstrate that these extracted lexical patterns effectively replicate the performance of neural moderation models. The resulting criteria matrix reveals significant variations in how seemingly shared norms are actually enforced.
arXiv Detail & Related papers (2025-09-03T01:27:26Z)
Customize Multi-modal RAI Guardrails with Precedent-based predictions [55.63757336900865]
A multi-modal guardrail must effectively filter image content based on user-defined policies. Existing fine-tuning methods typically condition predictions on pre-defined policies. We propose to condition the model's judgment on "precedents", which are the reasoning processes of prior data points similar to the given input.
arXiv Detail & Related papers (2025-07-28T03:45:34Z)
Reddit Rules and Rulers: Quantifying the Link Between Rules and Perceptions of Governance across Thousands of Communities [13.80648276848838]
We conduct the largest-to-date analysis of rules on Reddit, collecting a set of 67,545 unique rules across 5,225 communities. More than just a point-in-time study, our work measures how communities change their rules over a 5+ year period. We are the first to identify the rules most strongly associated with positive community perceptions of governance.
arXiv Detail & Related papers (2025-01-24T01:26:41Z)
Symbolic Working Memory Enhances Language Models for Complex Rule Application [87.34281749422756]
Large Language Models (LLMs) have shown remarkable reasoning performance but struggle with multi-step deductive reasoning. We propose augmenting LLMs with external working memory and introduce a neurosymbolic framework for rule application. Our framework iteratively performs symbolic rule grounding and LLM-based rule implementation.
arXiv Detail & Related papers (2024-08-24T19:11:54Z)
SoFA: Shielded On-the-fly Alignment via Priority Rule Following [90.32819418613407]
This paper introduces a novel alignment paradigm, priority rule following, which defines rules as the primary control mechanism in each dialog. We present PriorityDistill, a semi-automated approach for distilling priority following signals from simulations to ensure robust rule integration and adherence.
arXiv Detail & Related papers (2024-02-27T09:52:27Z)
ChatRule: Mining Logical Rules with Large Language Models for Knowledge Graph Reasoning [107.61997887260056]
We propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs. Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs. To refine the generated rules, a rule ranking module estimates the rule quality by incorporating facts from existing KGs.
arXiv Detail & Related papers (2023-09-04T11:38:02Z)
Rule By Example: Harnessing Logical Rules for Explainable Hate Speech Detection [13.772240348963303]
Rule By Example (RBE) is a novel exemplar-based contrastive learning approach for learning from logical rules for the task of textual content moderation. RBE is capable of providing rule-grounded predictions, allowing for more explainable and customizable predictions compared to typical deep learning-based approaches.
arXiv Detail & Related papers (2023-07-24T16:55:37Z)
Explainable Abuse Detection as Intent Classification and Slot Filling [66.80201541759409]
We introduce the concept of policy-aware abuse detection, abandoning the unrealistic expectation that systems can reliably learn which phenomena constitute abuse from inspecting the data alone. We show how architectures for intent classification and slot filling can be used for abuse detection, while providing a rationale for model decisions.
arXiv Detail & Related papers (2022-10-06T03:33:30Z)
Rewriting a Deep Generative Model [56.91974064348137]
We introduce a new problem setting: manipulation of specific rules encoded by a deep generative model. We propose a formulation in which the desired rule is changed by manipulating a layer of a deep network as a linear associative memory. We present a user interface to enable users to interactively change the rules of a generative model to achieve desired effects.
arXiv Detail & Related papers (2020-07-30T17:58:16Z)
Building Rule Hierarchies for Efficient Logical Rule Learning from Knowledge Graphs [20.251630903853016]
We propose new methods for pruning unpromising rules using rule hierarchies. We show that applying these hierarchical pruning methods (HPMs) is effective in removing unpromising rules.
arXiv Detail & Related papers (2020-06-29T16:33:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.