Related papers: Robust Harmful Meme Detection under Missing Modalities via Shared Representation Learning

Robust Harmful Meme Detection under Missing Modalities via Shared Representation Learning

URL: http://arxiv.org/abs/2602.01101v1
Date: Sun, 01 Feb 2026 08:52:15 GMT
Title: Robust Harmful Meme Detection under Missing Modalities via Shared Representation Learning
Authors: Felix Breiteneder, Mohammad Belal, Muhammad Saad Saeed, Shahed Masoudian, Usman Naseem, Kulshrestha Juhi, Markus Schedl, Shah Nawaz,
Abstract summary: We present the first-of-its-kind work to investigate the behavior of harmful meme detection methods in the presence of modal-incomplete data.<n>Specifically, we propose a new baseline method that learns a shared representation for multiple modalities by projecting them independently.<n> Experimental results on two benchmark datasets demonstrate that our method outperforms existing approaches when text is missing.
Score: 19.283838516077196
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Internet memes are powerful tools for communication, capable of spreading political, psychological, and sociocultural ideas. However, they can be harmful and can be used to disseminate hate toward targeted individuals or groups. Although previous studies have focused on designing new detection methods, these often rely on modal-complete data, such as text and images. In real-world settings, however, modalities like text may be missing due to issues like poor OCR quality, making existing methods sensitive to missing information and leading to performance deterioration. To address this gap, in this paper, we present the first-of-its-kind work to comprehensively investigate the behavior of harmful meme detection methods in the presence of modal-incomplete data. Specifically, we propose a new baseline method that learns a shared representation for multiple modalities by projecting them independently. These shared representations can then be leveraged when data is modal-incomplete. Experimental results on two benchmark datasets demonstrate that our method outperforms existing approaches when text is missing. Moreover, these results suggest that our method allows for better integration of visual features, reducing dependence on text and improving robustness in scenarios where textual information is missing. Our work represents a significant step forward in enabling the real-world application of harmful meme detection, particularly in situations where a modality is absent.

Related papers

Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations [56.816929931908824]
We pioneer the detection of semantically-coordinated manipulations in multimodal data.<n>We propose a Retrieval-Augmented Manipulation Detection and Grounding (RamDG) framework.<n>Our framework significantly outperforms existing methods, achieving 2.06% higher detection accuracy on SAMM compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-09-16T04:18:48Z)
MemeMind: A Large-Scale Multimodal Dataset with Chain-of-Thought Reasoning for Harmful Meme Detection [4.09109557328609]
Harmful memes pose significant challenges for automated detection due to implicit semantics and complex multimodal interactions.<n>MemeMind is a novel dataset featuring scientifically rigorous standards, large scale, diversity, bilingual support (Chinese and English), and detailed Chain-of-Thought (CoT) annotations.<n>We propose an innovative detection framework, MemeGuard, which effectively integrates multimodal information with reasoning process modeling.
arXiv Detail & Related papers (2025-06-15T13:45:30Z)
Towards a Robust Framework for Multimodal Hate Detection: A Study on Video vs. Image-based Content [7.5253808885104325]
Social media platforms enable the propagation of hateful content across different modalities.<n>Recent approaches have shown promise in handling individual modalities, but their effectiveness across different modality combinations remains unexplored.<n>This paper presents a systematic analysis of fusion-based approaches for multimodal hate detection, focusing on their performance across video and image-based content.
arXiv Detail & Related papers (2025-02-11T00:07:40Z)
A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies. Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance. Under the simple synthesis strategies, it outperforms existing methods by a large margin. Furthermore, it also achieves the state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore the constrastive learning in the domain of misinformation identification. Our model shows the superior performance of non-matched image-text pair detection when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z)
CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning [14.637303913878435]
We present a coherence-based contrastive learning model named CoCo to detect the possible MGT under low-resource scenario. To exploit the linguistic feature, we encode coherence information in form of graph into text representation. Experiment results on two public datasets and two self-constructed datasets prove our approach outperforms the state-of-art methods significantly.
arXiv Detail & Related papers (2022-12-20T15:26:19Z)
On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification. We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned. Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions. To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles. In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
Analysis of Social Media Data using Multimodal Deep Learning for Disaster Response [6.8889797054846795]
We propose to use both text and image modalities of social media data to learn a joint representation using state-of-the-art deep learning techniques. Experiments on real-world disaster datasets show that the proposed multimodal architecture yields better performance than models trained using a single modality.
arXiv Detail & Related papers (2020-04-14T19:36:11Z)
Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and texts as input. In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities. We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
arXiv Detail & Related papers (2020-04-10T06:31:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.