Related papers: ArMeme: Propagandistic Content in Arabic Memes

URL: http://arxiv.org/abs/2406.03916v1
Date: Thu, 6 Jun 2024 09:56:49 GMT
Title: ArMeme: Propagandistic Content in Arabic Memes
Authors: Firoj Alam, Abul Hasnat, Fatema Ahmed, Md Arid Hasan, Maram Hasanain,
Abstract summary: We develop an Arabic memes dataset with manual annotations of propagandistic content. We provide a comprehensive analysis aiming to develop computational tools for their detection.
Score: 9.48177009736915
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With the rise of digital communication, memes have become a significant medium for cultural and political expression that is often used to mislead audiences. Identification of such misleading and persuasive multimodal content has become more important among various stakeholders, including social media platforms, policymakers, and the broader society as they often cause harm to individuals, organizations, and/or society. While there have been efforts to develop AI-based automatic systems for resource-rich languages (e.g., English), there has been relatively little to none for medium- to low-resource languages. In this study, we focused on developing an Arabic memes dataset with manual annotations of propagandistic content. We annotated ~6K Arabic memes collected from various social media platforms, which is a first resource for Arabic multimodal research. We provide a comprehensive analysis aiming to develop computational tools for their detection. We will make them publicly available for the community.

Related papers

MultiProSE: A Multi-label Arabic Dataset for Propaganda, Sentiment, and Emotion Detection [0.0]
This dataset comprises 8,000 annotated news articles, which is the largest propaganda dataset to date. For each task, several baselines have been developed using large language models (LLMs), such as GPT-4o-mini, and pre-trained language models (PLMs). The dataset, annotation guidelines, and source code are all publicly released to facilitate future research and development in Arabic language models.
arXiv Detail & Related papers (2025-02-12T11:35:20Z)
AIN: The Arabic INclusive Large Multimodal Model [71.29419186696138]
AIN is an English-Arabic bilingual LMM designed to excel in English and Arabic. AIN demonstrates state-of-the-art Arabic performance, while also possessing strong English-language visual capabilities. AIN's superior capabilities position it as a significant step toward empowering Arabic speakers with advanced multimodal generative AI tools.
arXiv Detail & Related papers (2025-01-31T18:58:20Z)
RoMemes: A multimodal meme corpus for the Romanian language [39.58317527488534]
We introduce a curated dataset of real memes in the Romanian language, with multiple annotation levels. Results indicate that further research is needed to improve the processing capabilities of AI tools when faced with Internet memes.
arXiv Detail & Related papers (2024-10-20T20:26:53Z)
Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs [7.217569932870683]
This study explores the intersection between propaganda and hate in memes. We extend the propagandistic meme dataset with coarse and fine-grained hate labels. Our finding suggests that there is an association between propaganda and hate in memes.
arXiv Detail & Related papers (2024-09-11T13:04:34Z)
Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with Wider Topic Analysis [49.1574468325115]
The in-depth study manually analyses 133 ASA papers published in the English language between 2002 and 2020. The main findings show the different approaches used for ASA: machine learning, lexicon-based and hybrid approaches. There is a need to develop ASA tools that can be used in industry, as well as in academia, for Arabic text SA.
arXiv Detail & Related papers (2024-03-04T10:37:48Z)
Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition. Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages. Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
AceGPT, Localizing Large Language Models in Arabic [73.39989503874634]
The paper proposes a comprehensive solution that includes pre-training with Arabic texts, Supervised Fine-Tuning (SFT) utilizing native Arabic instructions, and GPT-4 responses in Arabic. The goal is to cultivate culturally cognizant and value-aligned Arabic LLMs capable of accommodating the diverse, application-specific needs of Arabic-speaking communities.
arXiv Detail & Related papers (2023-09-21T13:20:13Z)
Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes. One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism. Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z)
3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos [72.69052180249598]
We present 3MASSIV, a multilingual, multimodal and multi-aspect, expertly-annotated dataset of diverse short videos extracted from the short-video social media platform Moj. 3MASSIV comprises 50k short videos (20 seconds average duration) and 100K unlabeled videos in 11 different languages. We show how the social media content in 3MASSIV is dynamic and temporal in nature, which can be used for semantic understanding tasks and cross-lingual analysis.
arXiv Detail & Related papers (2022-03-28T02:47:01Z)
M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations [72.81164101048181]
We propose a dataset for Multimodal Multiparty Hindi Humor (M2H2) recognition in conversations containing 6,191 utterances from 13 episodes of a very popular TV series "Shrimaan Shrimati Phir Se". Each utterance is annotated with humor/non-humor labels and encompasses acoustic, visual, and textual modalities. The empirical results on the M2H2 dataset demonstrate that multimodal information complements unimodal information for humor recognition.
arXiv Detail & Related papers (2021-08-03T02:54:09Z)
Semi-automatic Generation of Multilingual Datasets for Stance Detection in Twitter [9.359018642178917]
This paper presents a method to obtain multilingual datasets for stance detection in Twitter. We leverage user-based information to semi-automatically label large amounts of tweets.
arXiv Detail & Related papers (2021-01-28T13:05:09Z)
ARAACOM: ARAbic Algerian Corpus for Opinion Mining [0.0]
Opinion mining on the web is becoming an increasingly attractive task. In this paper, we propose an approach for opinion mining in Arabic Algerian newspapers.
arXiv Detail & Related papers (2020-01-22T13:45:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.