Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos
- URL: http://arxiv.org/abs/2411.17123v1
- Date: Tue, 26 Nov 2024 05:29:18 GMT
- Title: Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos
- Authors: Nouar AlDahoul, Myles Joshua Toledo Tan, Harishwar Reddy Kasireddy, Yasir Zaki
- Abstract summary: Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of such content.
Techniques from natural language processing and computer vision have been used widely to automatically identify and filter out sensitive content.
More sophisticated algorithms for understanding the context of both text and images may open room for improvement in content censorship.
- Score: 0.1399948157377307
- License:
- Abstract: The widespread dissemination of hate speech, harassment, harmful and sexual content, and violence across websites and media platforms presents substantial challenges and provokes widespread concern among different sectors of society. Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of such content. Technologies for detecting and censoring media content are a key solution to addressing these challenges. Techniques from natural language processing and computer vision have been widely used to automatically identify and filter out sensitive content such as offensive language, violence, nudity, and addiction in text, images, and videos, enabling platforms to enforce content policies at scale. However, existing methods still have limitations in achieving high detection accuracy with few false positives and false negatives. Therefore, more sophisticated algorithms that understand the context of both text and images may open room for improvement in content censorship and enable more efficient censorship systems. In this paper, we evaluate existing LLM-based content moderation solutions, such as the OpenAI moderation model and Llama-Guard3, and study their ability to detect sensitive content. Additionally, we explore recent LLMs such as GPT, Gemini, and Llama in identifying inappropriate content across media outlets. Various textual and visual datasets, including X tweets, Amazon reviews, news articles, human photos, cartoons, sketches, and violence videos, were used for evaluation and comparison. The results demonstrate that LLMs outperform traditional techniques by achieving higher accuracy and lower false positive and false negative rates. This highlights the potential to integrate LLMs into websites, social media platforms, and video-sharing services for regulatory and content moderation purposes.
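To make the evaluation described in the abstract concrete, here is a minimal sketch of scoring a small labeled text set with OpenAI's moderation endpoint and computing accuracy, false positive rate, and false negative rate, the metrics the paper reports. The model name `omni-moderation-latest`, the toy examples, and the `is_flagged` helper are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch: evaluating an LLM-based moderation endpoint on labeled text.
# Assumptions: the openai Python package (v1.x) is installed, OPENAI_API_KEY is
# set in the environment, and the tiny labeled dataset below stands in for the
# corpora the paper uses (X tweets, Amazon reviews, news articles).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# (text, label) pairs; label 1 = sensitive, 0 = benign. Toy placeholders.
dataset = [
    ("Have a wonderful day!", 0),
    ("I will hurt you if you post that again.", 1),
]

def is_flagged(text: str) -> int:
    """Return 1 if the moderation endpoint flags the text, else 0."""
    result = client.moderations.create(
        model="omni-moderation-latest",  # assumed model name; check current docs
        input=text,
    )
    return int(result.results[0].flagged)

# Confusion counts for accuracy / false positive rate / false negative rate.
tp = fp = tn = fn = 0
for text, label in dataset:
    pred = is_flagged(text)
    tp += int(pred == 1 and label == 1)
    fp += int(pred == 1 and label == 0)
    tn += int(pred == 0 and label == 0)
    fn += int(pred == 0 and label == 1)

accuracy = (tp + tn) / len(dataset)
fpr = fp / max(fp + tn, 1)  # guard against an empty negative class
fnr = fn / max(fn + tp, 1)  # guard against an empty positive class
print(f"accuracy={accuracy:.2f} FPR={fpr:.2f} FNR={fnr:.2f}")
```

A locally hosted classifier such as Llama-Guard3 could be compared under the same protocol by swapping out `is_flagged`; the metric loop is unchanged.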
Related papers
- Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era [50.19334853510935]
Recent strides in instruction-based editing have enabled intuitive interaction with visual content, using natural language as a bridge between user intent and complex editing operations.
We aim to democratize powerful visual editing across various industries, from entertainment to education.
arXiv Detail & Related papers (2024-11-15T05:18:15Z)
- MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili [11.049937698021054]
This study presents MultiHateClip, a novel multilingual dataset created through hate lexicons and human annotation.
It aims to enhance the detection of hateful videos on platforms such as YouTube and Bilibili, including content in both English and Chinese languages.
arXiv Detail & Related papers (2024-07-28T08:19:09Z)
- The Potential of Vision-Language Models for Content Moderation of Children's Videos [1.0589208420411014]
This paper presents an in-depth analysis of how context-specific language prompts affect content moderation performance.
It is important to include more context in content moderation prompts, particularly for cartoon videos.
arXiv Detail & Related papers (2023-12-06T22:29:16Z)
- A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLM-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z)
- An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software [64.367830425115]
Social media platforms are being increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography.
Despite tremendous efforts in developing and deploying content moderation methods, malicious users can evade moderation by embedding text into images.
We propose a metamorphic testing framework for content moderation software.
arXiv Detail & Related papers (2023-08-18T20:33:06Z)
- Validating Multimedia Content Moderation Software via Semantic Fusion [16.322773343799575]
We introduce Semantic Fusion, a general, effective methodology for validating multimedia content moderation software.
We employ DUO to test five commercial content moderation software and two state-of-the-art models against three kinds of toxic content.
The results show that DUO achieves up to 100% error finding rate (EFR) when testing moderation software.
arXiv Detail & Related papers (2023-05-23T02:44:15Z)
- A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot [67.00455874279383]
We propose verbalizing long videos to generate descriptions in natural language, then performing video-understanding tasks on the generated story as opposed to the original video.
Our method, despite being zero-shot, achieves significantly better results than supervised baselines for video understanding.
To alleviate the lack of story-understanding benchmarks, we publicly release the first dataset for persuasion strategy identification, a crucial task in computational social science.
arXiv Detail & Related papers (2023-05-16T19:13:11Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools that simulate and detect new methods of content moderation evasion.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection [115.83992775004043]
Recent advances in deep learning, particularly deep generative models, open the door to producing perceptually convincing images and videos at low cost.
This paper provides a comprehensive review of the current media tampering detection approaches, and discusses the challenges and trends in this field for future research.
arXiv Detail & Related papers (2022-12-12T02:54:08Z)
- Emotion Based Hate Speech Detection using Multimodal Learning [0.0]
This paper proposes the first multimodal deep learning framework to combine the auditory features representing emotion and the semantic features to detect hateful content.
Our results demonstrate that incorporating emotional attributes leads to significant improvement over text-based models in detecting hateful multimedia content.
arXiv Detail & Related papers (2022-02-13T05:39:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.