Beyond Toxic: Toxicity Detection Datasets are Not Enough for Brand
Safety
- URL: http://arxiv.org/abs/2303.15110v1
- Date: Mon, 27 Mar 2023 11:29:09 GMT
- Title: Beyond Toxic: Toxicity Detection Datasets are Not Enough for Brand
Safety
- Authors: Elizaveta Korotkova, Isaac Kwan Yin Chung
- Abstract summary: Brand safety aims to protect commercial branding by identifying contexts where advertisements should not appear.
We demonstrate the need for building brand-safety-specific datasets via the application of common toxicity detection datasets and
empirically analyze the effects of weighted sampling strategies in text classification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The rapid growth in user-generated content on social media has resulted in a
significant rise in demand for automated content moderation. Various methods
and frameworks have been proposed for the tasks of hate speech detection and
toxic comment classification. In this work, we combine common datasets to
extend these tasks to brand safety. Brand safety aims to protect commercial
branding by identifying contexts where advertisements should not appear and
covers not only toxicity, but also other potentially harmful content. As these
datasets contain different label sets, we approach the overall problem as a
binary classification task. We demonstrate the need for building
brand-safety-specific datasets via the application of common toxicity detection datasets to
a subset of brand safety and empirically analyze the effects of weighted
sampling strategies in text classification.
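Since the abstract hinges on two concrete steps, collapsing heterogeneous label sets into a single binary target and reweighting the sampler, a minimal sketch may help make them concrete. The snippet below is an illustrative PyTorch setup, not the authors' actual pipeline: the label names, the inverse-frequency weighting scheme, and the toy features are all assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical pooled corpus: each source dataset uses its own label set,
# so any harmful label maps to 1 (brand-unsafe) and everything else to 0.
UNSAFE_LABELS = {"toxic", "hate_speech", "threat", "obscene", "insult"}

def to_binary(label: str) -> int:
    """Collapse a source-specific label into the binary brand-safety task."""
    return int(label in UNSAFE_LABELS)

raw_labels = ["toxic", "none", "hate_speech", "none", "none", "insult"]
labels = torch.tensor([to_binary(l) for l in raw_labels])
features = torch.randn(len(labels), 8)  # stand-in for encoded text

# Weighted sampling: weight each example by the inverse frequency of its
# class so minority-class examples are drawn more often per epoch.
class_counts = torch.bincount(labels, minlength=2).float()
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)

loader = DataLoader(TensorDataset(features, labels),
                    batch_size=2, sampler=sampler)
for batch_features, batch_labels in loader:
    pass  # train a binary classifier here
```

Inverse-frequency weighting is only one possible strategy; the paper's contribution is precisely an empirical analysis of how such weighted sampling choices affect classification.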
Related papers
- ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations [3.708799808977489]
We introduce a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive.
A key feature of this dataset is that it is enriched with auxiliary metadata of socially relevant tags, enhancing the context of each meme.
arXiv Detail & Related papers (2025-08-06T07:46:14Z)
- Enhancing Traffic Accident Classifications: Application of NLP Methods for City Safety [41.76653295869846]
We analyze traffic incidents in Munich to identify patterns and characteristics that distinguish different types of accidents.
The dataset consists of both structured tabular features, such as location, time, and weather conditions, as well as unstructured free-text descriptions detailing the circumstances of each accident.
To assess the reliability of labels, we apply NLP methods, including topic modeling and few-shot learning, which reveal inconsistencies in the labeling process.
arXiv Detail & Related papers (2025-06-11T14:50:49Z)
- Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA [0.0]
This work removes 7,531 toxic image-text pairs from the LLaVA pre-training dataset.
We offer guidelines for implementing robust toxicity detection pipelines.
arXiv Detail & Related papers (2025-05-09T18:01:50Z)
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release.
Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
- Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models [61.56740897898055]
We introduce the Silent Branding Attack, a novel data poisoning method that manipulates text-to-image diffusion models.
We find that when certain visual patterns appear repeatedly in the training data, the model learns to reproduce them naturally in its outputs.
We develop an automated data poisoning algorithm that unobtrusively injects logos into original images, ensuring they blend naturally and remain undetected.
arXiv Detail & Related papers (2025-03-12T17:21:57Z)
- Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation [15.355814393928707]
We put forward a unified dataset tailored for social media content moderation across six sensitive categories.
These include conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam.
Fine-tuning large language models on this novel dataset yields significant improvements in detection performance compared to open off-the-shelf models.
arXiv Detail & Related papers (2024-11-29T16:44:02Z)
- ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information [30.333357539780287]
ToxiCraft is a novel framework for synthesizing datasets of harmful information.
With only a small amount of seed data, our framework can generate a wide variety of synthetic, yet remarkably realistic, examples of toxic information.
arXiv Detail & Related papers (2024-09-23T06:36:57Z)
- ToVo: Toxicity Taxonomy via Voting [25.22398575368979]
We propose a dataset creation mechanism that integrates voting and chain-of-thought processes.
Our methodology ensures diverse classification metrics for each sample.
We utilize the dataset created through our proposed mechanism to train our model.
arXiv Detail & Related papers (2024-06-21T02:35:30Z)
- Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models [53.50543146583101]
Fine-tuning large language models on small datasets can enhance their performance on specific downstream tasks.
Malicious actors can subtly manipulate the structure of almost any task-specific dataset to foster significantly more dangerous model behaviors.
We propose a novel mitigation strategy that mixes in safety data which mimics the task format and prompting style of the user data.
arXiv Detail & Related papers (2024-06-12T18:33:11Z)
- Named Entity Recognition for Monitoring Plant Health Threats in Tweets: a ChouBERT Approach [0.0]
ChouBERT is a pre-trained language model that can identify Tweets concerning observations of plant health issues with generalizability on unseen natural hazards.
This paper tackles the lack of labelled data by further studying ChouBERT's know-how on token-level annotation tasks over small labelled sets.
arXiv Detail & Related papers (2023-10-19T06:54:55Z)
- Improve Text Classification Accuracy with Intent Information [0.38073142980733]
Existing methods do not consider the use of label information, which may weaken the performance of text classification systems in some token-aware scenarios.
We introduce the use of label information as label embedding for the task of text classification and achieve remarkable performance on a benchmark dataset.
arXiv Detail & Related papers (2022-12-15T08:15:32Z)
- Autoregressive Perturbations for Data Poisoning [54.205200221427994]
Data scraping from social media has led to growing concerns regarding unauthorized use of data.
Data poisoning attacks have been proposed as a bulwark against scraping.
We introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset.
arXiv Detail & Related papers (2022-06-08T06:24:51Z)
- Toxicity Detection can be Sensitive to the Conversational Context [64.28043776806213]
We construct and publicly release a dataset of 10,000 posts with two kinds of toxicity labels.
We introduce a new task, context sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context is also considered.
arXiv Detail & Related papers (2021-11-19T13:57:26Z)
- Learning to Aggregate and Refine Noisy Labels for Visual Sentiment Analysis [69.48582264712854]
We propose a robust learning method to perform visual sentiment analysis.
Our method relies on an external memory to aggregate and filter noisy labels during training.
We establish a benchmark for visual sentiment analysis with label noise using publicly available datasets.
arXiv Detail & Related papers (2021-09-15T18:18:28Z)
- Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark [53.9819155669618]
This paper presents a large-scale dataset, named as PIDray, which covers various cases in real-world scenarios for prohibited item detection.
With an intensive amount of effort, our dataset contains 12 categories of prohibited items in 47,677 X-ray images with high-quality annotated segmentation masks and bounding boxes.
The proposed method performs favorably against the state-of-the-art methods, especially for detecting the deliberately hidden items.
arXiv Detail & Related papers (2021-08-16T11:14:16Z)
- Incorporating Label Uncertainty in Understanding Adversarial Robustness [17.65850501514483]
We show that error regions induced by state-of-the-art models tend to have much higher label uncertainty compared with randomly-selected subsets.
This observation motivates us to adapt a concentration estimation algorithm to account for label uncertainty.
arXiv Detail & Related papers (2021-07-07T14:26:57Z)
- Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge.
It can learn transferable knowledge from a subset of categories with limited labeled data.
It can capture the uncertainty of never-seen categories with unlabeled data.
arXiv Detail & Related papers (2020-06-15T21:31:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.