Scaling Reinforcement Learning for Content Moderation with Large Language Models
- URL: http://arxiv.org/abs/2512.20061v1
- Date: Tue, 23 Dec 2025 05:27:16 GMT
- Title: Scaling Reinforcement Learning for Content Moderation with Large Language Models
- Authors: Hamed Firooz, Rui Liu, Yuchen Lu, Zhenyu Hou, Fangzhou Xiong, Xiaoyang Zhang, Changshu Jian, Zhicheng Zhu, Jiayuan Ma, Jacob Tao, Chaitali Gupta, Xiaochang Peng, Shike Mei, Hang Cui, Yang Qin, Shuo Tang, Jason Gaedtke, Arpit Mittal
- Abstract summary: We present a comprehensive empirical investigation of scaling reinforcement learning for content classification. We show that RL substantially improves performance on tasks requiring complex policy-grounded reasoning.
- Score: 16.516137166093696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Content moderation at scale remains one of the most pressing challenges in today's digital ecosystem, where billions of user- and AI-generated artifacts must be continuously evaluated for policy violations. Although recent advances in large language models (LLMs) have demonstrated strong potential for policy-grounded moderation, the practical challenges of training these systems to achieve expert-level accuracy in real-world settings remain largely unexplored, particularly in regimes characterized by label sparsity, evolving policy definitions, and the need for nuanced reasoning beyond shallow pattern matching. In this work, we present a comprehensive empirical investigation of scaling reinforcement learning (RL) for content classification, systematically evaluating multiple RL training recipes and reward-shaping strategies, including verifiable rewards and LLM-as-judge frameworks, to transform general-purpose language models into specialized, policy-aligned classifiers across three real-world content moderation tasks. Our findings provide actionable insights for industrial-scale moderation systems, demonstrating that RL exhibits sigmoid-like scaling behavior in which performance improves smoothly with increased training data, rollouts, and optimization steps before gradually saturating. Moreover, we show that RL substantially improves performance on tasks requiring complex policy-grounded reasoning while achieving up to 100x higher data efficiency than supervised fine-tuning, making it particularly effective in domains where expert annotations are scarce or costly.
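To make the abstract's reward-shaping vocabulary concrete, the sketch below blends a verifiable reward (exact agreement with an expert moderation label) with an LLM-as-judge score, the two strategies the paper names. This is a minimal illustration under assumed interfaces: the function names, the judge prompt, and the blending weight `alpha` are all hypothetical, not the paper's actual implementation.

```python
# Minimal sketch of the two reward-shaping strategies the abstract names.
# All names, the judge prompt, and the 0.7 weight are illustrative
# assumptions, not the paper's code.

def verifiable_reward(predicted_label: str, gold_label: str) -> float:
    """1.0 when the model's moderation verdict matches the expert label."""
    return 1.0 if predicted_label.strip().lower() == gold_label.strip().lower() else 0.0


def judge_reward(rationale: str, policy_text: str, judge) -> float:
    """Score in [0, 1] from a judge LLM on how well the rationale is
    grounded in the cited policy. `judge` is any callable wrapping an
    LLM call; the prompt wording here is an assumption."""
    prompt = (
        f"Policy:\n{policy_text}\n\n"
        f"Model rationale:\n{rationale}\n\n"
        "On a scale from 0 to 1, how faithfully does the rationale "
        "apply the policy? Answer with a single number."
    )
    return max(0.0, min(1.0, float(judge(prompt))))


def shaped_reward(predicted_label, gold_label, rationale, policy_text,
                  judge, alpha: float = 0.7) -> float:
    """Convex blend of the verifiable and judge signals (alpha assumed)."""
    return (alpha * verifiable_reward(predicted_label, gold_label)
            + (1.0 - alpha) * judge_reward(rationale, policy_text, judge))
```

The blend keeps the verifiable signal dominant so the judge score mainly differentiates rollouts that reach the same verdict; whether the paper weights the signals this way is not stated in the abstract. The sigmoid-like scaling claim can likewise be made concrete with a standard logistic fit of accuracy against log training steps; the data points below are synthetic placeholders for demonstration only, not the paper's results.

```python
# Hypothetical logistic fit illustrating sigmoid-like scaling:
# accuracy rises smoothly with log-scale training, then saturates.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, gain, slope, midpoint, base):
    return base + gain / (1.0 + np.exp(-slope * (x - midpoint)))

log_steps = np.log10([100, 300, 1_000, 3_000, 10_000, 30_000])
accuracy = np.array([0.55, 0.58, 0.67, 0.78, 0.83, 0.84])  # synthetic demo data

params, _ = curve_fit(sigmoid, log_steps, accuracy, p0=[0.3, 2.0, 3.0, 0.55])
print(f"estimated saturation ceiling: {params[3] + params[0]:.3f}")
```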
Related papers
- Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z) - Demystifying Reinforcement Learning in Agentic Reasoning [90.3737088727791]
We conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning. We highlight our key insights: (i) replacing stitched synthetic trajectories with real end-to-end tool-use trajectories yields a far stronger SFT model; (ii) exploration-friendly techniques such as clip-higher, overlong reward shaping, and maintaining adequate policy entropy are crucial for agentic RL and improve training efficiency.
arXiv Detail & Related papers (2025-10-13T17:57:15Z) - Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective [52.38531288378491]
Reinforcement learning (RL) methods have substantially enhanced the planning capabilities of Large Language Models (LLMs). In this work, we investigate RL's benefits and limitations through a tractable graph-based abstraction. Our theoretical analyses reveal that supervised fine-tuning (SFT) may introduce co-occurrence-based spurious solutions, whereas RL achieves correct planning primarily through exploration.
arXiv Detail & Related papers (2025-09-26T17:39:48Z) - LLM-Driven Policy Diffusion: Enhancing Generalization in Offline Reinforcement Learning [23.628360655654507]
Reinforcement Learning (RL) is known for its strong decision-making capabilities and has been widely applied in various real-world scenarios. Due to the limitations of offline data, RL agents often struggle to generalize to new tasks or environments. We propose LLM-Driven Policy Diffusion (LLMDPD), a novel approach that enhances generalization in offline RL using task-specific prompts.
arXiv Detail & Related papers (2025-08-30T04:02:33Z) - Generalization vs. Memorization in Autoregressive Deep Learning: Or, Examining Temporal Decay of Gradient Coherence [0.1286280695561924]
We apply the influence function formalism to characterize how autoregressive PDE surrogates assimilate and propagate information derived from diverse physical scenarios. We provide actionable insights regarding the design of improved surrogates.
arXiv Detail & Related papers (2025-08-18T20:29:34Z) - Leveraging Genetic Algorithms for Efficient Demonstration Generation in Real-World Reinforcement Learning Environments [0.8602553195689513]
Reinforcement Learning (RL) has demonstrated significant potential in certain real-world industrial applications. This study investigates the utilization of Genetic Algorithms (GAs) as a mechanism for improving RL performance. We propose a novel approach in which GA-generated expert demonstrations are used to enhance policy learning.
arXiv Detail & Related papers (2025-07-01T14:04:17Z) - LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and given rise to diverse applications. Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z) - Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance. We introduce novel algorithms for dynamic, instance-level data reweighting. Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z) - Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization [22.67700436936984]
We introduce Direct Advantage Policy Optimization (DAPO), a novel step-level offline reinforcement learning algorithm. DAPO employs a critic function to predict the reasoning accuracy at each step, thereby generating dense signals to refine the generation strategy. Our results show that DAPO effectively enhances the mathematical and code capabilities of both SFT and RL models.
arXiv Detail & Related papers (2024-12-24T08:39:35Z) - RaCT: Ranking-aware Chain-of-Thought Optimization for LLMs [30.216174551427443]
Large language models (LLMs) have demonstrated remarkable potential in text reranking tasks. Conventional supervised fine-tuning approaches for specializing LLMs in ranking tasks often lead to significant degradation of the models' general-purpose abilities. This paper presents a novel methodology that strategically combines Chain-of-Thought (CoT) prompting techniques with an innovative two-stage training pipeline.
arXiv Detail & Related papers (2024-12-18T23:24:15Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents. We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks. This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)