Benchmark dataset of memes with text transcriptions for automatic
detection of multi-modal misogynistic content
- URL: http://arxiv.org/abs/2106.08409v1
- Date: Tue, 15 Jun 2021 20:01:28 GMT
- Title: Benchmark dataset of memes with text transcriptions for automatic
detection of multi-modal misogynistic content
- Authors: Francesca Gasparini, Giulia Rizzi, Aurora Saibene, Elisabetta Fersini
- Abstract summary: The dataset is composed of 800 memes collected from the most popular social media platforms.
Experts have selected a dataset of 800 memes equally balanced between misogynistic and non-misogynistic ones.
This data can be used to approach the problem of automatic detection of misogynistic content on the Web.
- Score: 0.8261182037130405
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper we present a benchmark dataset generated as part of a project
for automatic identification of misogyny within online content, which focuses
in particular on memes. The benchmark here described is composed of 800 memes
collected from the most popular social media platforms, such as Facebook,
Twitter, Instagram and Reddit, and consulting websites dedicated to collection
and creation of memes. To gather misogynistic memes, specific keywords that
refer to misogynistic content have been used as search criteria, covering
different manifestations of hatred against women, such as body
shaming, stereotyping, objectification and violence. In parallel, memes with no
misogynistic content have been manually downloaded from the same web sources.
Among all the collected memes, three domain experts have selected a dataset of
800 memes equally balanced between misogynistic and non-misogynistic ones. This
dataset has been validated through a crowdsourcing platform, involving 60
subjects for the labelling process, in order to collect three evaluations for
each instance. Two further binary labels have been collected from both the
experts and the crowdsourcing platform, for memes evaluated as misogynistic,
concerning aggressiveness and irony. Finally for each meme, the text has been
manually transcribed. The dataset provided is thus composed of the 800 memes,
the labels given by the experts and those obtained by the crowdsourcing
validation, and the transcribed texts. This data can be used to approach the
problem of automatic detection of misogynistic content on the Web relying on
both textual and visual cues, addressing phenomena that are growing every day
such as cybersexism and technology-facilitated violence.
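The annotation scheme above (three crowdsourced evaluations per meme, alongside expert labels and a manual transcription) can be sketched as follows. This is a minimal illustrative sketch only: the field names and label strings are hypothetical, not the dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class MemeRecord:
    # Hypothetical schema for one dataset instance; real field names may differ.
    meme_id: int
    transcription: str            # manually transcribed overlay text
    expert_label: str             # e.g. "misogynistic" / "non-misogynistic"
    crowd_labels: list = field(default_factory=list)  # three crowd evaluations

def majority_vote(labels):
    """Aggregate the three crowdsourced evaluations by majority vote."""
    return Counter(labels).most_common(1)[0][0]

record = MemeRecord(
    meme_id=1,
    transcription="example overlay text",
    expert_label="misogynistic",
    crowd_labels=["misogynistic", "misogynistic", "non-misogynistic"],
)
print(majority_vote(record.crowd_labels))  # misogynistic
```

With three evaluations per instance a majority always exists for binary labels, which is presumably why the validation collected an odd number of judgments.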
Related papers
- What is Beneath Misogyny: Misogynous Memes Classification and Explanation [20.78432772119578]
We introduce a novel approach to detect, categorize, and explain misogynistic content in memes. The proposed model, MM-Misogyny, processes text and image modalities separately. It not only detects and classifies misogyny, but also provides a granular understanding of how misogyny operates in different domains of life.
arXiv Detail & Related papers (2025-07-30T14:38:53Z) - MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models [50.2355423914562]
We introduce MemeReaCon, a novel benchmark designed to evaluate how Large Vision-Language Models (LVLMs) understand memes in their original context. We collected memes from five different Reddit communities, keeping each meme's image, the post text, and user comments together. Our tests with leading LVLMs show a clear weakness: models either fail to interpret critical information in the contexts, or overly focus on visual details while overlooking the communicative purpose.
arXiv Detail & Related papers (2025-05-23T03:27:23Z) - MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing [53.30190591805432]
We introduce MemeMQA, a multimodal question-answering framework to solicit accurate responses to structured questions.
We also propose ARSENAL, a novel two-stage multimodal framework to address MemeMQA.
arXiv Detail & Related papers (2024-05-18T07:44:41Z) - Exploratory Data Analysis on Code-mixed Misogynistic Comments [0.0]
We present a novel dataset of YouTube comments in code-mixed Hinglish.
These comments have been weakly labelled as 'Misogynistic' and 'Non-misogynistic'.
arXiv Detail & Related papers (2024-03-09T23:21:17Z) - Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes
Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
arXiv Detail & Related papers (2024-01-18T11:24:30Z) - A Template Is All You Meme [83.05919383106715]
We release a knowledge base of memes and information found on www.knowyourmeme.com, composed of more than 54,000 images.
We hypothesize that meme templates can be used to inject models with the context missing from previous approaches.
arXiv Detail & Related papers (2023-11-11T19:38:14Z) - DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally
Spreading Out Disinformation [72.18912216025029]
We present DisinfoMeme to help detect disinformation memes.
The dataset contains memes mined from Reddit covering three current topics: the COVID-19 pandemic, the Black Lives Matter movement, and veganism/vegetarianism.
arXiv Detail & Related papers (2022-05-25T09:54:59Z) - DISARM: Detecting the Victims Targeted by Harmful Memes [49.12165815990115]
DISARM is a framework that uses named entity recognition and person identification to detect harmful memes.
We show that DISARM significantly outperforms ten unimodal and multimodal systems.
It can reduce the relative error rate for harmful target identification by up to 9 points absolute over several strong multimodal rivals.
arXiv Detail & Related papers (2022-05-11T19:14:26Z) - Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes are not really studied, such as those featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z) - TIB-VA at SemEval-2022 Task 5: A Multimodal Architecture for the
Detection and Classification of Misogynous Memes [9.66022279280394]
We present a multimodal architecture that combines textual and visual features in order to detect misogynous meme content.
Our solution obtained the best result in the Task-B where the challenge is to classify whether a given document is misogynous.
arXiv Detail & Related papers (2022-04-13T11:03:21Z) - Detecting Harmful Memes and Their Targets [27.25262711136056]
We present HarMeme, the first benchmark dataset, containing 3,544 memes related to COVID-19.
In the first stage, we labeled a meme as very harmful, partially harmful, or harmless; in the second stage, we further annotated the type of target(s) that each harmful meme points to.
The evaluation results using ten unimodal and multimodal models highlight the importance of using multimodal signals for both tasks.
arXiv Detail & Related papers (2021-09-24T17:11:42Z) - TrollsWithOpinion: A Dataset for Predicting Domain-specific Opinion
Manipulation in Troll Memes [4.513166202592557]
We classify 8,881 IWT or multimodal memes in the English language (TrollsWithOpinion dataset).
These memes have the potential to demean, harass, or bully targeted individuals.
We perform baseline experiments on the annotated dataset, and our result shows that existing state-of-the-art techniques could only reach a weighted-average F1-score of 0.37.
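The weighted-average F1 reported above weights each class's F1 score by its support (the number of true instances of that class). A minimal pure-Python sketch of the metric, with toy labels invented purely for illustration (not data from the paper):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1, weighted by class support."""
    support = Counter(y_true)
    total = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += support[c] * f1
    return total / len(y_true)

# Toy example: three classes with unequal support.
print(round(weighted_f1([0, 0, 1, 1, 1, 2], [0, 1, 1, 1, 0, 2]), 2))  # 0.67
```

This matches scikit-learn's `f1_score(..., average="weighted")`, the convention typically used when reporting a single F1 over imbalanced multi-class annotations such as troll-meme categories.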
arXiv Detail & Related papers (2021-09-08T12:12:13Z) - Memes in the Wild: Assessing the Generalizability of the Hateful Memes
Challenge Dataset [47.65948529524281]
We collect hateful and non-hateful memes from Pinterest to evaluate out-of-sample performance on models pre-trained on the Facebook dataset.
We find that memes in the wild differ in two key aspects: 1) captions must be extracted via OCR, and 2) memes are more diverse than traditional memes, including screenshots of conversations or text on a plain background.
arXiv Detail & Related papers (2021-07-09T09:04:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.