Beyond Toxic: Toxicity Detection Datasets are Not Enough for Brand
Safety
- URL: http://arxiv.org/abs/2303.15110v1
- Date: Mon, 27 Mar 2023 11:29:09 GMT
- Title: Beyond Toxic: Toxicity Detection Datasets are Not Enough for Brand
Safety
- Authors: Elizaveta Korotkova, Isaac Kwan Yin Chung
- Abstract summary: Brand safety aims to protect commercial branding by identifying contexts where advertisements should not appear.
We demonstrate the need for building brand-safety-specific datasets via the application of common toxicity detection datasets and
empirically analyze the effects of weighted sampling strategies in text classification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The rapid growth in user-generated content on social media has resulted in a
significant rise in demand for automated content moderation. Various methods
and frameworks have been proposed for the tasks of hate speech detection and
toxic comment classification. In this work, we combine common datasets to
extend these tasks to brand safety. Brand safety aims to protect commercial
branding by identifying contexts where advertisements should not appear and
covers not only toxicity, but also other potentially harmful content. As these
datasets contain different label sets, we approach the overall problem as a
binary classification task. We demonstrate the need for building
brand-safety-specific datasets via the application of common toxicity detection datasets to
a subset of brand safety and empirically analyze the effects of weighted
sampling strategies in text classification.
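Since the abstract hinges on two concrete steps, collapsing heterogeneous label sets into a single binary target and reweighting the sampler, a minimal sketch may help make them concrete. The snippet below is an illustrative PyTorch setup, not the authors' actual pipeline: the label names, the inverse-frequency weighting scheme, and the toy features are all assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical pooled corpus: each source dataset uses its own label set,
# so any harmful label maps to 1 (brand-unsafe) and everything else to 0.
UNSAFE_LABELS = {"toxic", "hate_speech", "threat", "obscene", "insult"}

def to_binary(label: str) -> int:
    """Collapse a source-specific label into the binary brand-safety task."""
    return int(label in UNSAFE_LABELS)

raw_labels = ["toxic", "none", "hate_speech", "none", "none", "insult"]
labels = torch.tensor([to_binary(l) for l in raw_labels])
features = torch.randn(len(labels), 8)  # stand-in for encoded text

# Weighted sampling: weight each example by the inverse frequency of its
# class so minority-class examples are drawn more often per epoch.
class_counts = torch.bincount(labels, minlength=2).float()
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)

loader = DataLoader(TensorDataset(features, labels),
                    batch_size=2, sampler=sampler)
for batch_features, batch_labels in loader:
    pass  # train a binary classifier here
```

Inverse-frequency weighting is only one possible strategy; the paper's contribution is precisely an empirical analysis of how such weighted sampling choices affect classification.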
Related papers
- ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations [3.708799808977489]
We introduce a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive.
A key feature of this dataset is that it is enriched with auxiliary metadata of socially relevant tags, enhancing the context of each meme.
arXiv Detail & Related papers (2025-08-06T07:46:14Z)
- Enhancing Traffic Accident Classifications: Application of NLP Methods for City Safety [41.76653295869846]
We analyze traffic incidents in Munich to identify patterns and characteristics that distinguish different types of accidents.
The dataset consists of both structured tabular features, such as location, time, and weather conditions, as well as unstructured free-text descriptions detailing the circumstances of each accident.
To assess the reliability of labels, we apply NLP methods, including topic modeling and few-shot learning, which reveal inconsistencies in the labeling process.
arXiv Detail & Related papers (2025-06-11T14:50:49Z)
- Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA [0.0]
This work removes 7,531 toxic image-text pairs from the LLaVA pre-training dataset.
We offer guidelines for implementing robust toxicity detection pipelines.
arXiv Detail & Related papers (2025-05-09T18:01:50Z)
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release.
Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
- Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models [61.56740897898055]
We introduce the Silent Branding Attack, a novel data poisoning method that manipulates text-to-image diffusion models.
We find that when certain visual patterns appear repeatedly in the training data, the model learns to reproduce them naturally in its outputs.
We develop an automated data poisoning algorithm that unobtrusively injects logos into original images, ensuring they blend naturally and remain undetected.
arXiv Detail & Related papers (2025-03-12T17:21:57Z)
- Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation [15.355814393928707]
We put forward a unified dataset tailored for social media content moderation across six sensitive categories.
These include conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam.
Fine-tuning large language models on this novel dataset yields significant improvements in detection performance compared to open off-the-shelf models.
arXiv Detail & Related papers (2024-11-29T16:44:02Z)
- ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information [30.333357539780287]
ToxiCraft is a novel framework for synthesizing datasets of harmful information.
With only a small amount of seed data, our framework can generate a wide variety of synthetic, yet remarkably realistic, examples of toxic information.
arXiv Detail & Related papers (2024-09-23T06:36:57Z)
- ToVo: Toxicity Taxonomy via Voting [25.22398575368979]
We propose a dataset creation mechanism that integrates voting and chain-of-thought processes.
Our methodology ensures diverse classification metrics for each sample.
We utilize the dataset created through our proposed mechanism to train our model.
arXiv Detail & Related papers (2024-06-21T02:35:30Z)
- Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models [53.50543146583101]
Fine-tuning large language models on small datasets can enhance their performance on specific downstream tasks.
Malicious actors can subtly manipulate the structure of almost any task-specific dataset to foster significantly more dangerous model behaviors.
We propose a novel mitigation strategy that mixes in safety data which mimics the task format and prompting style of the user data.
arXiv Detail & Related papers (2024-06-12T18:33:11Z)
- Named Entity Recognition for Monitoring Plant Health Threats in Tweets: a ChouBERT Approach [0.0]
ChouBERT is a pre-trained language model that can identify Tweets concerning observations of plant health issues with generalizability on unseen natural hazards.
This paper tackles the lack of labelled data by further studying ChouBERT's know-how on token-level annotation tasks over small labelled sets.
arXiv Detail & Related papers (2023-10-19T06:54:55Z)
- Improve Text Classification Accuracy with Intent Information [0.38073142980733]
Existing methods do not consider the use of label information, which may weaken the performance of text classification systems in some token-aware scenarios.
We introduce the use of label information as label embedding for the task of text classification and achieve remarkable performance on a benchmark dataset.
arXiv Detail & Related papers (2022-12-15T08:15:32Z)
- Autoregressive Perturbations for Data Poisoning [54.205200221427994]
Data scraping from social media has led to growing concerns regarding unauthorized use of data.
Data poisoning attacks have been proposed as a bulwark against scraping.
We introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset.
arXiv Detail & Related papers (2022-06-08T06:24:51Z)
- Toxicity Detection can be Sensitive to the Conversational Context [64.28043776806213]
We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
arXiv Detail & Related papers (2021-11-19T13:57:26Z)
- Learning to Aggregate and Refine Noisy Labels for Visual Sentiment Analysis [69.48582264712854]
We propose a robust learning method to perform visual sentiment analysis.
Our method relies on an external memory to aggregate and filter noisy labels during training.
We establish a benchmark for visual sentiment analysis with label noise using publicly available datasets.
arXiv Detail & Related papers (2021-09-15T18:18:28Z)
- Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark [53.9819155669618]
This paper presents a large-scale dataset, named as PIDray, which covers various cases in real-world scenarios for prohibited item detection.
With an intensive amount of effort, our dataset contains 12 categories of prohibited items in 47,677 X-ray images with high-quality annotated segmentation masks and bounding boxes.
The proposed method performs favorably against the state-of-the-art methods, especially for detecting the deliberately hidden items.
arXiv Detail & Related papers (2021-08-16T11:14:16Z)
- Incorporating Label Uncertainty in Understanding Adversarial Robustness [17.65850501514483]
We show that error regions induced by state-of-the-art models tend to have much higher label uncertainty compared with randomly-selected subsets.
This observation motivates us to adapt a concentration estimation algorithm to account for label uncertainty.
arXiv Detail & Related papers (2021-07-07T14:26:57Z)
- Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge.
It can learn transferable knowledge from a subset of categories with limited labeled data.
It can capture the uncertainty of never-seen categories with unlabeled data.
arXiv Detail & Related papers (2020-06-15T21:31:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.