Towards Safer Social Media Platforms: Scalable and Performant Few-Shot Harmful Content Moderation Using Large Language Models
- URL: http://arxiv.org/abs/2501.13976v1
- Date: Thu, 23 Jan 2025 00:19:14 GMT
- Title: Towards Safer Social Media Platforms: Scalable and Performant Few-Shot Harmful Content Moderation Using Large Language Models
- Authors: Akash Bonagiri, Lucen Li, Rajvardhan Oak, Zeerak Babar, Magdalena Wojcieszak, Anshuman Chhabra
- Abstract summary: Harmful content on social media platforms poses significant risks to users and society.
Current approaches rely on human moderators, supervised classifiers, and large volumes of training data.
We utilize Large Language Models (LLMs) to undertake few-shot dynamic content moderation via in-context learning.
- Score: 9.42299478071576
- Abstract: The prevalence of harmful content on social media platforms poses significant risks to users and society, necessitating more effective and scalable content moderation strategies. Current approaches rely on human moderators, supervised classifiers, and large volumes of training data, and often struggle with scalability, subjectivity, and the dynamic nature of harmful content (e.g., violent content and dangerous challenge trends). To bridge these gaps, we utilize Large Language Models (LLMs) to undertake few-shot dynamic content moderation via in-context learning. Through extensive experiments on multiple LLMs, we demonstrate that our few-shot approaches can outperform existing proprietary baselines (Perspective and OpenAI Moderation) as well as prior state-of-the-art few-shot learning methods in identifying harm. We also incorporate visual information (video thumbnails) and assess whether different multimodal techniques improve model performance. Our results underscore the significant benefits of employing LLM-based methods for scalable and dynamic harmful content moderation online.
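The abstract outlines the approach (few-shot dynamic moderation via in-context learning) but does not specify a prompt template or model. The following is a minimal sketch of that pattern, assuming the OpenAI Python SDK; the model name, label set, and exemplar posts are illustrative placeholders rather than the authors' actual configuration.

```python
# Minimal few-shot in-context moderation sketch (illustrative, not the paper's exact pipeline).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A small set of labeled exemplars supplied in-context (the "few shots").
FEW_SHOT_EXEMPLARS = [
    ("Check out this cute puppy compilation!", "not harmful"),
    ("Try the blackout challenge: hold your breath until you pass out.", "harmful"),
]

def moderate(post_text: str, model: str = "gpt-4o-mini") -> str:
    """Label a post as 'harmful' or 'not harmful' via few-shot in-context learning."""
    examples = "\n\n".join(
        f"Post: {text}\nLabel: {label}" for text, label in FEW_SHOT_EXEMPLARS
    )
    prompt = (
        "You are a content moderation assistant. Label each post as "
        "'harmful' or 'not harmful'.\n\n"
        f"{examples}\n\nPost: {post_text}\nLabel:"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(moderate("New dangerous stunt trend, tag a friend to try it"))
```

The same pattern could, in principle, be extended to the multimodal setting mentioned in the abstract by attaching a video thumbnail to the request, at the cost of a larger per-request payload.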
Related papers
- Cross-Cultural Fashion Design via Interactive Large Language Models and Diffusion Models [0.0]
Fashion content generation is an emerging area at the intersection of artificial intelligence and creative design.
Existing methods struggle with cultural bias, limited scalability, and alignment between textual prompts and generated visuals.
We propose a novel framework that integrates Large Language Models (LLMs) with Latent Diffusion Models (LDMs) to address these challenges.
arXiv Detail & Related papers (2025-01-26T15:57:16Z)
- Re-ranking Using Large Language Models for Mitigating Exposure to Harmful Content on Social Media Platforms [10.421660174482314]
We propose a novel re-ranking approach using Large Language Models (LLMs) in zero-shot and few-shot settings.
Our method dynamically assesses and re-ranks content sequences, effectively mitigating harmful content exposure without requiring extensive labeled data.
arXiv Detail & Related papers (2025-01-23T00:26:32Z)
- Socio-Culturally Aware Evaluation Framework for LLM-Based Content Moderation [10.724258809442958]
We propose a socio-culturally aware evaluation framework for content moderation.
We introduce a scalable method for creating diverse datasets using persona-based generation.
arXiv Detail & Related papers (2024-12-18T07:57:18Z)
- A Persuasion-Based Prompt Learning Approach to Improve Smishing Detection through Data Augmentation [1.4388765025696655]
A number of challenges remain in machine learning-based smishing detection.
Given the sensitive nature of smishing-related data, there is a lack of publicly accessible data that can be used for training and evaluating ML models.
We introduce a novel data augmentation method utilizing a few-shot prompt learning approach.
arXiv Detail & Related papers (2024-10-18T04:20:02Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, showing their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Social Debiasing for Fair Multi-modal LLMs [55.8071045346024]
Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities.
However, these models often inherit severe social biases from their training datasets, leading to unfair predictions based on attributes like race and gender.
This paper addresses the issue of social biases in MLLMs by i) introducing a comprehensive Counterfactual dataset with Multiple Social Concepts (CMSC) and ii) proposing an Anti-Stereotype Debiasing strategy (ASD).
arXiv Detail & Related papers (2024-08-13T02:08:32Z)
- RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks.
This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z)
- SoMeLVLM: A Large Vision Language Model for Social Media Processing [78.47310657638567]
We introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM).
SoMeLVLM is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation.
Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks.
arXiv Detail & Related papers (2024-02-20T14:02:45Z)
- MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs).
We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles.
Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most widely used techniques to evade platform content moderation systems.
This article contributes to countering malicious information by developing multilingual tools to simulate and detect new content moderation evasion methods; an illustrative simulation sketch follows this entry.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
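To make the word-camouflage idea concrete, here is a small sketch that enumerates "twisted" spellings of a moderated keyword via character substitution; the substitution map and keyword are assumptions for illustration, not the multilingual tooling the paper itself develops.

```python
# Illustrative word-camouflage simulation: enumerate leetspeak-style variants
# of a moderated keyword (assumed substitution map, not the paper's actual tool).
import itertools

LEET_MAP = {
    "a": ["a", "4", "@"],
    "e": ["e", "3"],
    "i": ["i", "1", "!"],
    "o": ["o", "0"],
    "s": ["s", "5", "$"],
}

def camouflage_variants(word: str, limit: int = 20) -> list[str]:
    """Return up to `limit` camouflaged spellings of `word`."""
    options = [LEET_MAP.get(ch, [ch]) for ch in word.lower()]
    variants = ("".join(combo) for combo in itertools.product(*options))
    return list(itertools.islice(variants, limit))

if __name__ == "__main__":
    # Variants such as "v1olence" or "v!0l3nce" can be used to stress-test a detector.
    print(camouflage_variants("violence"))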