Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation
- URL: http://arxiv.org/abs/2411.19832v2
- Date: Fri, 06 Dec 2024 13:41:53 GMT
- Title: Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation
- Authors: Dimosthenis Antypas, Indira Sen, Carla Perez-Almendros, Jose Camacho-Collados, Francesco Barbieri
- Abstract summary: We put forward a unified dataset tailored for social media content moderation across six sensitive categories.
These include conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam.
Fine-tuning large language models on this novel dataset yields significant improvements in detection performance compared to open off-the-shelf models.
- Score: 15.355814393928707
- Abstract: The detection of sensitive content in large datasets is crucial for ensuring that shared and analysed data is free from harmful material. However, current moderation tools, such as external APIs, suffer from limitations in customisation, accuracy across diverse sensitive categories, and privacy concerns. Additionally, existing datasets and open-source models focus predominantly on toxic language, leaving gaps in detecting other sensitive categories such as substance abuse or self-harm. In this paper, we put forward a unified dataset tailored for social media content moderation across six sensitive categories: conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam. By collecting and annotating data with consistent retrieval strategies and guidelines, we address the shortcomings of previous focalised research. Our analysis demonstrates that fine-tuning large language models (LLMs) on this novel dataset yields significant improvements in detection performance compared to open off-the-shelf models such as LLaMA, and even proprietary OpenAI models, which underperform by 10-15% overall. The limitation is even more pronounced for popular moderation APIs, which cannot easily be tailored to specific sensitive content categories.
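As an illustration of the fine-tuning setup the abstract describes, the sketch below treats the six categories as a multi-label classification problem with Hugging Face Transformers. The base checkpoint, hyperparameters, and toy data are assumptions, not the authors' actual configuration.

```python
# Hypothetical sketch: multi-label fine-tuning over the paper's six
# sensitive categories. Checkpoint, hyperparameters, and data are
# illustrative placeholders, not the authors' setup.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["conflictual", "profanity", "sexually_explicit",
          "drugs", "self_harm", "spam"]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid + BCE loss
)

train_ds = Dataset.from_dict({  # toy stand-in for the unified dataset
    "text": ["buy cheap followers now!!!", "hope tomorrow never comes"],
    "labels": [[0.0, 0.0, 0.0, 0.0, 0.0, 1.0],   # spam
               [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]],  # self-harm
})

def encode(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

args = TrainingArguments(output_dir="moderation-model",
                         per_device_train_batch_size=16,
                         num_train_epochs=3, learning_rate=2e-5)
Trainer(model=model, args=args, tokenizer=tokenizer,
        train_dataset=train_ds.map(encode, batched=True)).train()
```

At inference time, each sigmoid output above a per-category threshold yields a flag, which is what lets a single model cover all six categories at once.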
Related papers
- Watching the AI Watchdogs: A Fairness and Robustness Analysis of AI Safety Moderation Classifiers [5.35599092568615]
AI Safety Moderation (ASM) classifiers are designed to moderate content on social media platforms.
It is crucial to ensure that these classifiers do not unfairly classify content belonging to users from minority groups.
We thus examine the fairness and robustness of four widely-used, closed-source ASM classifiers.
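Such an audit can be approximated by comparing error rates across user groups; everything in the sketch below (groups, labels, predictions) is fabricated for illustration.

```python
# Hypothetical sketch of a per-group fairness check for a moderation
# classifier: compare false-positive rates across demographic groups.
# The records here are made up; a real audit would use annotated corpora.
from collections import defaultdict

records = [  # (group, true_label, predicted_label); 1 = "flag as unsafe"
    ("group_a", 0, 0), ("group_a", 0, 1), ("group_a", 1, 1),
    ("group_b", 0, 1), ("group_b", 0, 1), ("group_b", 1, 1),
]

fp = defaultdict(int)   # benign posts wrongly flagged, per group
neg = defaultdict(int)  # benign posts, per group
for group, y_true, y_pred in records:
    if y_true == 0:
        neg[group] += 1
        fp[group] += int(y_pred == 1)

rates = {g: fp[g] / neg[g] for g in neg}
print(rates)  # a large gap between groups suggests unfair moderation
print("FPR gap:", max(rates.values()) - min(rates.values()))
```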
arXiv Detail & Related papers (2025-01-23T01:04:00Z) - The Empirical Impact of Data Sanitization on Language Models [1.1359551336076306]
This paper empirically analyzes the effects of data sanitization across several benchmark language-modeling tasks.
Our results suggest that for some tasks such as sentiment analysis or entailment, the impact of redaction is quite low, typically around 1-5%.
For tasks such as comprehension Q&A, performance drops by more than 25% on redacted queries compared to the originals.
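A minimal sketch of such a redaction experiment, assuming crude regex PII patterns and a placeholder model interface rather than the paper's sanitization pipeline:

```python
# Hypothetical sketch of a redaction experiment: strip PII-like spans
# from queries, then compare task accuracy before and after. The
# patterns and the scoring function are illustrative stand-ins.
import re

PII_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"), "[NAME]"),  # crude
]

def redact(text: str) -> str:
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

def accuracy(model, queries, answers):
    return sum(model(q) == a for q, a in zip(queries, answers)) / len(queries)

# `model`, `queries`, `answers` are assumed to exist; the quantity of
# interest is the gap between the two scores:
# drop = accuracy(model, queries, answers) - \
#        accuracy(model, [redact(q) for q in queries], answers)
print(redact("Ping jane@example.com or 555-123-4567 for Jane Smith"))
```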
arXiv Detail & Related papers (2024-11-08T21:22:37Z) - Learning from Neighbors: Category Extrapolation for Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
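The summary does not spell the loss out; a minimal sketch of one plausible reading follows, assuming it simply scales down auxiliary-class logits for samples whose ground truth is an original class.

```python
# Hypothetical reading of a "neighbor-silencing" loss: damp the logits
# of auxiliary (open-set neighbor) classes so they aid representation
# learning without dominating gradients. The formulation is an
# assumption, not the paper's exact definition.
import torch
import torch.nn.functional as F

def neighbor_silencing_loss(logits, targets, num_original, alpha=0.3):
    """logits: (B, num_original + num_aux); targets index original classes."""
    scaled = logits.clone()
    scaled[:, num_original:] = alpha * scaled[:, num_original:]  # silence aux
    return F.cross_entropy(scaled, targets)

logits = torch.randn(4, 10 + 5)        # 10 original + 5 auxiliary classes
targets = torch.randint(0, 10, (4,))   # labels drawn from original classes
print(neighbor_silencing_loss(logits, targets, num_original=10))
```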
arXiv Detail & Related papers (2024-10-21T13:06:21Z) - Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents using Not Safe For Work (NSFW) scores computed from images alone does not exclude all of the harmful content in the accompanying alt-text.
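The failure mode is easy to state in code; the sketch below assumes placeholder image and text scorers in place of real models.

```python
# Hypothetical sketch of the audit's point: filtering image-text pairs
# by an image-only NSFW score still lets harmful alt-text through.
# `image_nsfw_score` and `text_toxicity_score` stand in for real models.
def image_nsfw_score(image_path: str) -> float:
    return 0.1  # placeholder: imagine a CLIP-based NSFW detector here

def text_toxicity_score(text: str) -> float:
    return 0.9 if "slur" in text else 0.0  # placeholder text classifier

pairs = [("cat.jpg", "a cat on a sofa"), ("dog.jpg", "some slur here")]

kept = [(img, alt) for img, alt in pairs if image_nsfw_score(img) < 0.5]
leaked = [alt for _, alt in kept if text_toxicity_score(alt) > 0.5]
print(f"{len(leaked)} harmful alt-texts survived image-only filtering")
```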
arXiv Detail & Related papers (2023-11-06T19:00:05Z) - Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis [0.0]
This paper introduces a hybrid neural and rule-based context-aware system to identify harmful contextual cues in erotic content.
Our model, tested on Polish text, demonstrates a promising accuracy of 84% and a recall of 80%.
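A minimal sketch of what such a hybrid might look like, assuming an invented cue lexicon and override rule rather than the paper's actual coreference-driven Polish-language rules:

```python
# Hypothetical hybrid scorer: a neural probability combined with
# rule-based contextual cues. The cue lexicon and override logic are
# invented for illustration, not the paper's actual rules.
HARM_CUES = ("without consent", "underage", "forced")

def neural_score(text: str) -> float:
    return 0.2  # placeholder for a fine-tuned classifier's probability

def is_harmful(text: str, threshold: float = 0.5) -> bool:
    score = neural_score(text)
    if any(cue in text.lower() for cue in HARM_CUES):
        score = max(score, 0.9)  # rules override a low neural score
    return score >= threshold

print(is_harmful("shared without consent"))  # rules catch what the net missed
```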
arXiv Detail & Related papers (2023-10-22T15:19:04Z) - Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
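One such probe, long-range context dependence, can be approximated without any human labels; the sketch below assumes word shuffling as the perturbation and a stand-in scalar scorer.

```python
# Hypothetical self-supervised probe: compare a model's scalar output
# on a text and on a perturbed version of it. No labels are needed,
# only (dis)agreement between the two passes. `score` stands in for any
# model-derived scalar (e.g., a toxicity or perplexity-based score).
import random

random.seed(0)

def perturb(text: str) -> str:
    words = text.split()
    random.shuffle(words)          # a crude meaning-degrading transform
    return " ".join(words)

def invariance_gap(score, texts):
    gaps = [abs(score(t) - score(perturb(t))) for t in texts]
    return sum(gaps) / len(gaps)   # low gap = long-range context unused

texts = ["the movie was not bad at all", "service was slow but food great"]
toy = lambda t: 1.0 if t.split()[0] == "the" else 0.0  # toy order-sensitive scorer
print(invariance_gap(toy, texts))
```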
arXiv Detail & Related papers (2023-06-23T17:59:09Z) - A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity [84.6421260559093]
This study is the largest set of experiments to validate, quantify, and expose undocumented intuitions about text pretraining.
Our findings indicate there does not exist a one-size-fits-all solution to filtering training data.
arXiv Detail & Related papers (2023-05-22T15:57:53Z) - Unsupervised Anomaly Detection for Auditing Data and Impact of Categorical Encodings [20.37092575427039]
The Vehicle Claims dataset consists of fraudulent insurance claims for automotive repairs.
We tackle the common problem of missing benchmark datasets for anomaly detection.
The dataset is evaluated with both shallow and deep learning methods.
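A minimal sketch of the encoding comparison, assuming fabricated data and scikit-learn's IsolationForest rather than the paper's full benchmark:

```python
# Hypothetical sketch: compare categorical encodings on an unsupervised
# anomaly detector. Data is fabricated; the Vehicle Claims dataset
# itself is not reproduced here. Requires scikit-learn >= 1.2.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

rng = np.random.default_rng(0)
makes = rng.choice(["ford", "vw", "fiat"], size=(200, 1))
cost = rng.normal(500, 100, size=(200, 1))
cost[:5] += 5000  # inject a few anomalous repair costs

for enc in (OneHotEncoder(sparse_output=False), OrdinalEncoder()):
    X = np.hstack([enc.fit_transform(makes), cost])
    scores = IsolationForest(random_state=0).fit(X).score_samples(X)
    # lowest scores = most anomalous rows under this encoding
    print(type(enc).__name__, "flags rows:", np.argsort(scores)[:5])
```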
arXiv Detail & Related papers (2022-10-25T14:33:17Z) - Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
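A minimal sketch of the two strategy families, assuming imbalanced-learn as the implementation (the study itself compares many more variants):

```python
# Sketch of over- vs. undersampling with imbalanced-learn on a
# synthetic imbalanced problem; the library choice is an assumption.
from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05],
                           random_state=0)
print("original:", Counter(y))

X_over, y_over = SMOTE(random_state=0).fit_resample(X, y)
print("oversampled:", Counter(y_over))       # minority synthesized up

X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("undersampled:", Counter(y_under))     # majority sampled down
```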
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
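A minimal artifact probe in this spirit, assuming a toy corpus and single-token statistics:

```python
# Hypothetical artifact probe: measure how strongly single tokens
# correlate with labels. Any token far from the base rate is a
# candidate artifact, since the paper argues such simple correlations
# are spurious for complex tasks. Corpus is fabricated.
from collections import Counter, defaultdict

data = [("no man is an island", 0), ("no news is good news", 0),
        ("what a great film", 1), ("a great time was had", 1)]

token_pos = defaultdict(int)
token_all = Counter()
for text, label in data:
    for tok in set(text.split()):
        token_all[tok] += 1
        token_pos[tok] += label

base_rate = sum(l for _, l in data) / len(data)
for tok, n in token_all.items():
    if n > 1:  # ignore singleton tokens
        skew = token_pos[tok] / n - base_rate
        print(f"{tok!r}: p(label=1|token) - p(label=1) = {skew:+.2f}")
```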
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.