VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform
- URL: http://arxiv.org/abs/2504.14904v1
- Date: Mon, 21 Apr 2025 07:20:19 GMT
- Title: VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform
- Authors: Xingyu Lu, Tianke Zhang, Chang Meng, Xiaobei Wang, Jinpeng Wang, YiFan Zhang, Shisong Tang, Changyi Liu, Haojie Ding, Kaiyu Jiang, Kaiyu Tang, Bin Wen, Hai-Tao Zheng, Fan Yang, Tingting Gao, Di Zhang, Kun Gai
- Abstract summary: Short video platforms (SVPs) face significant challenges in moderating content detrimental to users' mental health. Existing methods suffer from critical limitations: manual review is prone to human bias and incurs high operational costs. We propose our common-law content moderation framework, named KuaiMod, to address these challenges.
- Score: 28.523936398292683
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exponentially growing short video platforms (SVPs) face significant challenges in moderating content detrimental to users' mental health, particularly for minors. The dissemination of such content on SVPs can lead to catastrophic societal consequences. Although substantial efforts have been dedicated to moderating such content, existing methods suffer from critical limitations: (1) Manual review is prone to human bias and incurs high operational costs. (2) Automated methods, though efficient, lack nuanced content understanding, resulting in lower accuracy. (3) Industrial moderation regulations struggle to adapt to rapidly evolving trends due to long update cycles. In this paper, we annotate the first SVP content moderation benchmark with authentic user/reviewer feedback to fill the absence of benchmarks in this field. We then evaluate various methods on the benchmark to verify the existence of the aforementioned limitations. We further propose our common-law content moderation framework, named KuaiMod, to address these challenges. KuaiMod consists of three components: training data construction, offline adaptation, and online deployment & refinement. Leveraging a large vision-language model (VLM) and Chain-of-Thought (CoT) reasoning, KuaiMod adequately models video toxicity based on sparse user feedback and maintains a dynamic moderation policy with rapid update speed and high accuracy. Offline experiments and a large-scale online A/B test demonstrate the superiority of KuaiMod: it achieves the best moderation performance on our benchmark, its deployment reduces the user reporting rate by 20%, and its application in video recommendation increases both Daily Active Users (DAU) and App Usage Time (AUT) in several Kuaishou scenarios. We have open-sourced our benchmark at https://kuaimod.github.io.
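The abstract describes KuaiMod's core judgment step only at a high level: a VLM is prompted with Chain-of-Thought reasoning to assess a video's toxicity from its content and sparse user feedback. The implementation is not given here, so the following is a minimal illustrative sketch; the `vlm_generate` stub, the prompt wording, and the tag taxonomy are assumptions, not the authors' code.

```python
# Illustrative sketch only: the tag set, prompt, and vlm_generate() placeholder
# are assumptions; KuaiMod's actual taxonomy, prompts, and model are not shown
# in the abstract.
import json
import re
from typing import List

CANDIDATE_TAGS = ["normal", "self-harm", "harassment", "dangerous-act", "adult"]  # hypothetical

COT_PROMPT = """You are a content moderator for a short video platform.
Step 1: Describe what the sampled frames and the caption show.
Step 2: Summarize any user reports or comments suggesting harm.
Step 3: Reason step by step about which policy tag applies.
Step 4: Output JSON on the last line: {{"tag": "<one of: {tags}>", "rationale": "<short reason>"}}

Caption: {caption}
User reports: {reports}
"""


def vlm_generate(prompt: str, frames: List[bytes]) -> str:
    """Placeholder for a real vision-language-model call (an internal VLM
    service or an open-source multimodal model). Replace before use."""
    raise NotImplementedError


def moderate(frames: List[bytes], caption: str, reports: List[str]) -> dict:
    prompt = COT_PROMPT.format(
        tags=", ".join(CANDIDATE_TAGS),
        caption=caption,
        reports="; ".join(reports) if reports else "none",
    )
    raw = vlm_generate(prompt, frames)
    # Keep the chain-of-thought text for auditing, but parse only the final JSON object.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if not match:
        return {"tag": "normal", "rationale": "unparseable model output"}
    verdict = json.loads(match.group(0))
    verdict.setdefault("tag", "normal")
    return verdict
```

In the framework described by the abstract, a per-video verdict of this kind would be produced during offline adaptation and then refined during online deployment as new user feedback arrives; the sketch covers only the single-video judgment step.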
Related papers
- FLAME: Flexible LLM-Assisted Moderation Engine [2.966082563853265]
We introduce the Flexible LLM-Assisted Moderation Engine (FLAME). Unlike traditional circuit-breaking methods that analyze user queries, FLAME evaluates model responses. Our experiments demonstrate that FLAME significantly outperforms current moderation systems.
arXiv Detail & Related papers (2025-02-13T11:05:55Z)
- Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model [62.38322742493649]
We build a video VQA benchmark covering editing categories, i.e., effect, funny, meme, and game.
Most of the open-source video LMMs perform poorly on the benchmark, indicating a huge domain gap between edited short videos on social media and regular raw videos.
To improve the generalization ability of LMMs, we collect a training set for the proposed benchmark based on both Panda-70M/WebVid raw videos and small-scale TikTok/CapCut edited videos.
arXiv Detail & Related papers (2024-06-15T03:28:52Z)
- Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster [72.84926097773578]
We investigate the effect of explanations on the speed of real-world moderators.
Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision making time by 7.4%.
arXiv Detail & Related papers (2024-06-06T14:23:10Z)
- LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback [16.57980268646285]
This paper studies how inappropriate language in arguments can be computationally mitigated.
We propose a reinforcement learning-based rewriting approach that balances content preservation and appropriateness.
We evaluate different weighting schemes for the reward function in both absolute and relative human assessment studies.
arXiv Detail & Related papers (2024-06-05T15:18:08Z)
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models [123.66104233291065]
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content.
Evaluating these attacks presents a number of challenges that the current collection of benchmarks and evaluation techniques does not adequately address.
JailbreakBench is an open-sourced benchmark designed to address these challenges.
arXiv Detail & Related papers (2024-03-28T02:44:02Z)
- Content Moderation on Social Media in the EU: Insights From the DSA Transparency Database [0.0]
The Digital Services Act (DSA) requires large social media platforms in the EU to provide clear and specific information whenever they restrict access to certain content.
Statements of Reasons (SoRs) are collected in the DSA Transparency Database to ensure transparency and scrutiny of content moderation decisions.
We empirically analyze 156 million SoRs within an observation period of two months to provide an early look at content moderation decisions of social media platforms in the EU.
arXiv Detail & Related papers (2023-12-07T16:56:19Z)
- An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software [64.367830425115]
Social media platforms are being increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography.
Despite tremendous efforts in developing and deploying content moderation methods, malicious users can evade moderation by embedding texts into images.
We propose a metamorphic testing framework for content moderation software.
arXiv Detail & Related papers (2023-08-18T20:33:06Z)
- Analyzing Norm Violations in Live-Stream Chat [49.120561596550395]
We present the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms.
We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch.
Our results show that appropriate contextual information can boost moderation performance by 35%.
arXiv Detail & Related papers (2023-05-18T05:58:27Z)
- Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms [14.242221219862849]
We describe the current content moderation strategy employed by Meta to remove policy-violating content from its platforms.
We use both handcrafted and learned risk models to flag potentially violating content for human review.
Our approach aggregates these risk models into a single ranking score, calibrating them to prioritize more reliable risk models.
arXiv Detail & Related papers (2022-11-11T23:55:53Z)
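The entry above describes combining several risk-model scores into a single calibrated ranking score that prioritizes the more reliable models. The sketch below is a toy illustration only: the exponentiated-gradient-style update, the variable names, and the feedback loop are assumptions, not Meta's production system or the paper's exact bandit algorithm.

```python
# Toy illustration of calibrated aggregation of risk-model scores; the update
# rule and all parameters are assumptions, not the paper's method.
import math
from typing import Dict, List


class CalibratedAggregator:
    def __init__(self, model_names: List[str], lr: float = 0.1):
        # Start with uniform trust in every risk model.
        self.weights: Dict[str, float] = {m: 1.0 / len(model_names) for m in model_names}
        self.lr = lr

    def score(self, risk_scores: Dict[str, float]) -> float:
        """Single ranking score: trust-weighted average of per-model risks."""
        return sum(self.weights[m] * risk_scores[m] for m in risk_scores)

    def update(self, risk_scores: Dict[str, float], violated: bool) -> None:
        """After human review reveals the true label, upweight the models whose
        scores agreed with the outcome (exponentiated-gradient style)."""
        target = 1.0 if violated else 0.0
        for m, s in risk_scores.items():
            loss = (s - target) ** 2
            self.weights[m] *= math.exp(-self.lr * loss)
        total = sum(self.weights.values())
        for m in self.weights:
            self.weights[m] /= total


# Hypothetical usage: two risk models scoring one piece of content.
agg = CalibratedAggregator(["hate_speech_model", "nudity_model"])
scores = {"hate_speech_model": 0.9, "nudity_model": 0.2}
priority = agg.score(scores)        # used to rank content for human review
agg.update(scores, violated=True)   # feedback from the reviewer's decision
```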
- Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild [7.176020195419459]
Social media platforms struggle to protect users from harmful content through content moderation.
These platforms have recently leveraged machine learning models to cope with the vast amount of user-generated content daily.
Third-party content moderation services provide prediction scores of multiple subtasks, such as predicting the existence of underage personnel, rude gestures, or weapons.
We introduce a simple yet effective threshold optimization method that searches for optimal thresholds over the multiple subtasks to make reliable moderation decisions in a cost-effective way.
arXiv Detail & Related papers (2022-08-16T03:51:43Z)
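The entry above frames moderation as choosing per-subtask decision thresholds over third-party prediction scores. The sketch below is only a naive illustration of that idea, a grid search that maximizes F1 on labeled validation data; the OR-aggregation rule, the objective, and the grid are assumptions rather than the paper's actual procedure.

```python
# Naive per-subtask threshold search; the OR-aggregation rule and F1 objective
# are illustrative assumptions, not the paper's method.
from itertools import product
from typing import Dict, List, Tuple


def f1(preds: List[bool], labels: List[bool]) -> float:
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def search_thresholds(
    scores: List[Dict[str, float]],   # per-item subtask scores, e.g. {"weapon": 0.8, "rude_gesture": 0.1}
    labels: List[bool],               # human moderation decisions (True = remove)
    grid: Tuple[float, ...] = (0.1, 0.3, 0.5, 0.7, 0.9),
) -> Dict[str, float]:
    """Try every threshold combination; flag an item if ANY subtask score
    exceeds its threshold, and keep the combination with the best F1."""
    subtasks = sorted(scores[0])
    best, best_f1 = {}, -1.0
    for combo in product(grid, repeat=len(subtasks)):
        thresholds = dict(zip(subtasks, combo))
        preds = [any(s[t] >= thresholds[t] for t in subtasks) for s in scores]
        value = f1(preds, labels)
        if value > best_f1:
            best, best_f1 = thresholds, value
    return best
```

An exhaustive grid like this grows exponentially with the number of subtasks; a real deployment would need a more efficient and reliable search, which is the kind of problem the paper targets.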
- DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
We show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas from the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)