ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations
- URL: http://arxiv.org/abs/2508.04166v1
- Date: Wed, 06 Aug 2025 07:46:14 GMT
- Title: ToxicTAGS: Decoding Toxic Memes with Rich Tag Annotations
- Authors: Subhankar Swain, Naquee Rizwan, Nayandeep Deb, Vishwajeet Singh Solanki, Vishwa Gangadhar S, Animesh Mukherjee
- Abstract summary: We introduce a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive. A key feature of this dataset is that it is enriched with auxiliary metadata of socially relevant tags, enhancing the context of each meme.
- Score: 3.708799808977489
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The 2025 Global Risks Report identifies state-based armed conflict and societal polarisation among the most pressing global threats, with social media playing a central role in amplifying toxic discourse. Memes, as a widely used mode of online communication, often serve as vehicles for spreading harmful content. However, limitations in data accessibility and the high cost of dataset curation hinder the development of robust meme moderation systems. To address this challenge, in this work, we introduce a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive. A key feature of this dataset is that it is enriched with auxiliary metadata of socially relevant tags, enhancing the context of each meme. In addition, we propose a tag generation module that produces socially grounded tags, because most in-the-wild memes do not come with tags. Experimental results show that incorporating these tags substantially enhances the performance of state-of-the-art VLMs on detection tasks. Our contributions offer a novel and scalable foundation for improved content moderation in multimodal online environments.
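The abstract does not specify how the generated tags are fed to the detector. Below is a minimal sketch of one plausible approach, prepending tags to a VLM classification prompt; the `classify_meme` stub, the flattened label set, and the prompt wording are illustrative assumptions, not the paper's actual interface.

```python
# Illustrative sketch: augmenting a VLM toxicity prompt with auxiliary tags.
# The model call is stubbed out; any vision-language model could stand in.

LABELS = ["normal", "hateful", "dangerous", "offensive"]  # two-stage scheme flattened

def build_prompt(ocr_text: str, tags: list[str]) -> str:
    """Compose a classification prompt that injects socially grounded tags."""
    tag_block = ", ".join(tags) if tags else "none available"
    return (
        "You are a content moderator. Classify the meme below.\n"
        f"Meme text: {ocr_text}\n"
        f"Associated social tags: {tag_block}\n"
        f"Answer with exactly one of: {', '.join(LABELS)}."
    )

def classify_meme(image_path: str, ocr_text: str, tags: list[str]) -> str:
    prompt = build_prompt(ocr_text, tags)
    # A real system would send (image, prompt) to a VLM here; we stub it.
    print(prompt)
    return "normal"  # placeholder prediction

classify_meme("meme.jpg", "example caption", ["politics", "satire"])
```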
Related papers
- MemeMind: A Large-Scale Multimodal Dataset with Chain-of-Thought Reasoning for Harmful Meme Detection [4.09109557328609]
Harmful memes pose significant challenges for automated detection due to implicit semantics and complex multimodal interactions. MemeMind is a novel dataset featuring scientifically rigorous standards, large scale, diversity, bilingual support (Chinese and English), and detailed Chain-of-Thought (CoT) annotations. We propose an innovative detection framework, MemeGuard, which effectively integrates multimodal information with reasoning process modeling.
arXiv Detail & Related papers (2025-06-15T13:45:30Z)
- ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs [72.8646625127485]
Multimodal implicit toxicity appears not only as formal statements on social platforms but also as prompts that can lead to toxic dialogs. Despite the success of unimodal text and image moderation, toxicity detection for multimodal content, particularly multimodal implicit toxicity, remains underexplored. To advance the detection of multimodal implicit toxicity, we build ShieldVLM, a model which identifies implicit toxicity in multimodal statements, prompts and dialogs via deliberative cross-modal reasoning.
arXiv Detail & Related papers (2025-05-20T07:31:17Z)
- MemeSense: An Adaptive In-Context Framework for Social Commonsense Driven Meme Moderation [3.763944391065958]
We introduce MemeSense, an adaptive in-context learning framework that fuses social commonsense reasoning with visually and semantically related reference examples. MemeSense effectively balances lexical, visual, and ethical considerations, enabling precise yet context-aware meme intervention.
arXiv Detail & Related papers (2025-02-16T19:46:24Z)
- MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention [43.849634264271565]
We present MemeGuard, a comprehensive framework leveraging Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention.
MemeGuard harnesses a specially fine-tuned VLM, VLMeme, for meme interpretation, and a multimodal knowledge selection and ranking mechanism.
We leverage ICMM to test MemeGuard, demonstrating its proficiency in generating relevant and effective responses to toxic memes.
arXiv Detail & Related papers (2024-06-08T04:09:20Z)
- Into the LAION's Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text.
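The finding implies that filtering must consider both modalities, since an innocuous image can carry harmful alt-text. Below is a hedged toy sketch of such a joint filter; `score_image_nsfw` and `score_text_toxicity` are hypothetical stand-ins for real classifiers, and the thresholds are illustrative, not the audit's methodology.

```python
# Toy sketch: keep an (image, alt-text) pair only if BOTH modalities pass.
# The two scorers are hypothetical stubs standing in for real classifiers.

def score_image_nsfw(image_path: str) -> float:
    return 0.1  # stub: a real image NSFW model would go here

def score_text_toxicity(alt_text: str) -> float:
    return 0.8  # stub: a real text toxicity model would go here

def keep_sample(image_path: str, alt_text: str,
                img_thresh: float = 0.5, txt_thresh: float = 0.5) -> bool:
    """Image-only filtering misses harmful alt-text; check both modalities."""
    return (score_image_nsfw(image_path) < img_thresh
            and score_text_toxicity(alt_text) < txt_thresh)

print(keep_sample("img.jpg", "some alt text"))  # False: toxic alt-text is caught
```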
arXiv Detail & Related papers (2023-11-06T19:00:05Z)
- Beyond Toxic: Toxicity Detection Datasets are Not Enough for Brand Safety [0.0]
Brand safety aims to protect commercial branding by identifying contexts where advertisements should not appear.
We demonstrate the need for building brand-safety-specific datasets via the application of common toxicity detection datasets, and empirically analyze the effects of weighted sampling strategies in text classification.
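As a generic illustration of weighted sampling for class imbalance (not the paper's exact strategy), the PyTorch sketch below oversamples the rarer class; the dummy labels and features are assumptions for demonstration only.

```python
# Generic illustration of weighted sampling for an imbalanced text-classification
# dataset: rarer classes are drawn more often during training.
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

labels = torch.tensor([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # 80/20 imbalance
counts = Counter(labels.tolist())
weights = torch.tensor([1.0 / counts[int(y)] for y in labels])

sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
dataset = TensorDataset(torch.randn(10, 8), labels)  # dummy 8-dim features
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

for xb, yb in loader:
    print(yb.tolist())  # minority class now appears in roughly half the draws
```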
arXiv Detail & Related papers (2023-03-27T11:29:09Z)
- Micro-video Tagging via Jointly Modeling Social Influence and Tag Relation [56.23157334014773]
85.7% of micro-videos lack annotation.
Existing methods mostly focus on analyzing video content, neglecting users' social influence and tag relation.
We formulate micro-video tagging as a link prediction problem in a constructed heterogeneous network.
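A toy sketch of that link-prediction formulation follows: videos, users, and tags form a heterogeneous graph, and a candidate video-tag edge is scored by shared neighbours. The graph contents and the 2-hop heuristic are illustrative assumptions; the paper's actual model is far richer.

```python
# Toy sketch of tagging as link prediction in a heterogeneous graph of
# users, videos, and tags (edges below are made up for illustration).
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("user:alice", "video:v1"), ("user:alice", "video:v2"),
    ("user:bob", "video:v2"),
    ("video:v1", "tag:cooking"), ("user:bob", "tag:cooking"),
])

def score_link(video: str, tag: str) -> int:
    """Count 2-hop paths video -> neighbour -> tag as a crude link score."""
    return sum(1 for n in G.neighbors(video) if G.has_edge(n, tag))

# Rank candidate tags for the untagged video v2.
candidates = [n for n in G.nodes if n.startswith("tag:")]
print(sorted(candidates, key=lambda t: score_link("video:v2", t), reverse=True))
```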
arXiv Detail & Related papers (2023-03-15T02:13:34Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
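To make the simulate-and-detect idea concrete, here is a minimal sketch using leetspeak-style character substitutions and their reversal before moderation; the substitution map is an assumption, and the article's multilingual tooling goes far beyond this.

```python
# Minimal sketch of word-camouflage simulation and reversal
# (leetspeak-style substitutions only; illustrative, not the paper's tool).

CAMOUFLAGE = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "$"}
REVERSE = {v: k for k, v in CAMOUFLAGE.items()}

def camouflage(text: str) -> str:
    """Simulate evasion by swapping letters for look-alike symbols."""
    return "".join(CAMOUFLAGE.get(c, c) for c in text.lower())

def normalize(text: str) -> str:
    """Undo the substitutions so a standard moderation model can run."""
    return "".join(REVERSE.get(c, c) for c in text)

evaded = camouflage("toxic message")
print(evaded)             # t0x1c m3$$4g3
print(normalize(evaded))  # toxic message
```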
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation [72.18912216025029]
We present DisinfoMeme to help detect disinformation memes.
The dataset contains memes mined from Reddit covering three current topics: the COVID-19 pandemic, the Black Lives Matter movement, and veganism/vegetarianism.
arXiv Detail & Related papers (2022-05-25T09:54:59Z)
- Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z)
- Understanding and Detecting Dangerous Speech in Social Media [9.904746542801837]
Dangerous language, such as physical threats, is somewhat rare in online environments, yet remains highly important to detect.
We build a labeled dataset for dangerous speech and develop highly effective models to detect dangerous content.
Our best model performs at 59.60% macro F1, significantly outperforming a competitive baseline.
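Macro F1 weights every class equally, which matters when the target class (dangerous speech) is rare. A quick illustration with scikit-learn follows; the labels are dummy values, not the paper's data.

```python
# Quick illustration of macro F1 (equal weight per class, so the rare
# "dangerous" class counts as much as the majority class); dummy labels only.
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # 1 = dangerous (rare)
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(f1_score(y_true, y_pred, average="macro"))  # 0.6875: penalised by rare-class errors
print(f1_score(y_true, y_pred, average="micro"))  # 0.8: dominated by majority class
```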
arXiv Detail & Related papers (2020-05-04T09:42:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.