ArMeme: Propagandistic Content in Arabic Memes
- URL: http://arxiv.org/abs/2406.03916v2
- Date: Sun, 06 Oct 2024 08:35:10 GMT
- Title: ArMeme: Propagandistic Content in Arabic Memes
- Authors: Firoj Alam, Abul Hasnat, Fatema Ahmed, Md Arid Hasan, Maram Hasanain,
- Abstract summary: We develop an Arabic memes dataset with manual annotations of propagandistic content.
We provide a comprehensive analysis aiming to develop computational tools for their detection.
- Score: 9.48177009736915
- License:
- Abstract: With the rise of digital communication, memes have become a significant medium for cultural and political expression that is often used to mislead audiences. Identification of such misleading and persuasive multimodal content has become more important among various stakeholders, including social media platforms, policymakers, and the broader society as they often cause harm to individuals, organizations, and/or society. While there has been effort to develop AI-based automatic systems for resource-rich languages (e.g., English), it is relatively little to none for medium to low resource languages. In this study, we focused on developing an Arabic memes dataset with manual annotations of propagandistic content. We annotated ~6K Arabic memes collected from various social media platforms, which is a first resource for Arabic multimodal research. We provide a comprehensive analysis aiming to develop computational tools for their detection. We will make them publicly available for the community.
Related papers
- MultiProSE: A Multi-label Arabic Dataset for Propaganda, Sentiment, and Emotion Detection [0.0]
This dataset comprises 8,000 annotated news articles, which is the largest propaganda dataset to date.
For each task, several baselines have been developed using large language models (LLMs), such as GPT-4o-mini, and pre-trained language models (PLMs)
The dataset, annotation guidelines, and source code are all publicly released to facilitate future research and development in Arabic language models.
arXiv Detail & Related papers (2025-02-12T11:35:20Z) - AIN: The Arabic INclusive Large Multimodal Model [71.29419186696138]
AIN is an English-Arabic bilingual LMM designed to excel in English and Arabic.
AIN demonstrates state-of-the-art Arabic performance, while also possessing strong English-language visual capabilities.
AIN's superior capabilities position it as a significant step toward empowering Arabic speakers with advanced multimodal generative AI tools.
arXiv Detail & Related papers (2025-01-31T18:58:20Z) - RoMemes: A multimodal meme corpus for the Romanian language [39.58317527488534]
We introduce a curated dataset of real memes in the Romanian language, with multiple annotation levels.
Results indicate that further research is needed to improve the processing capabilities of AI tools when faced with Internet memes.
arXiv Detail & Related papers (2024-10-20T20:26:53Z) - Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs [7.217569932870683]
This study explores the intersection between propaganda and hate in memes.
We extend the propagandistic meme dataset with coarse and fine-grained hate labels.
Our finding suggests that there is an association between propaganda and hate in memes.
arXiv Detail & Related papers (2024-09-11T13:04:34Z) - Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with
Wider Topic Analysis [49.1574468325115]
The in-depth study manually analyses 133 ASA papers published in the English language between 2002 and 2020.
The main findings show the different approaches used for ASA: machine learning, lexicon-based and hybrid approaches.
There is a need to develop ASA tools that can be used in industry, as well as in academia, for Arabic text SA.
arXiv Detail & Related papers (2024-03-04T10:37:48Z) - Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition.
Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages.
Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z) - Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes are not really studied, e.g., such featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z) - 3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social
Media Short Videos [72.69052180249598]
We present 3MASSIV, a multilingual, multimodal and multi-aspect, expertly-annotated dataset of diverse short videos extracted from short-video social media platform - Moj.
3MASSIV comprises of 50k short videos (20 seconds average duration) and 100K unlabeled videos in 11 different languages.
We show how the social media content in 3MASSIV is dynamic and temporal in nature, which can be used for semantic understanding tasks and cross-lingual analysis.
arXiv Detail & Related papers (2022-03-28T02:47:01Z) - M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in
Conversations [72.81164101048181]
We propose a dataset for Multimodal Multiparty Hindi Humor (M2H2) recognition in conversations containing 6,191 utterances from 13 episodes of a very popular TV series "Shrimaan Shrimati Phir Se"
Each utterance is annotated with humor/non-humor labels and encompasses acoustic, visual, and textual modalities.
The empirical results on M2H2 dataset demonstrate that multimodal information complements unimodal information for humor recognition.
arXiv Detail & Related papers (2021-08-03T02:54:09Z) - Semi-automatic Generation of Multilingual Datasets for Stance Detection
in Twitter [9.359018642178917]
This paper presents a method to obtain multilingual datasets for stance detection in Twitter.
We leverage user-based information to semi-automatically label large amounts of tweets.
arXiv Detail & Related papers (2021-01-28T13:05:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.