A Context-aware Attention and Graph Neural Network-based Multimodal Framework for Misogyny Detection
- URL: http://arxiv.org/abs/2508.09175v1
- Date: Thu, 07 Aug 2025 06:41:17 GMT
- Title: A Context-aware Attention and Graph Neural Network-based Multimodal Framework for Misogyny Detection
- Authors: Mohammad Zia Ur Rehman, Sufyaan Zahoor, Areeb Manzoor, Musharaf Maqbool, Nagendra Kumar
- Abstract summary: A substantial portion of offensive content on social media is directed towards women. We propose a novel framework for the detection of misogynistic and sexist content. The framework comprises three modules: the Multimodal Attention module (MANM), the Graph-based Feature Reconstruction Module (GFRM), and the Content-specific Features Learning Module (CFLM).
- Score: 0.11650821883155184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A substantial portion of offensive content on social media is directed towards women. Because approaches for general offensive content detection struggle to detect misogynistic content, solutions tailored to offensive content against women are needed. To this end, we propose a novel multimodal framework for the detection of misogynistic and sexist content. The framework comprises three modules: the Multimodal Attention module (MANM), the Graph-based Feature Reconstruction Module (GFRM), and the Content-specific Features Learning Module (CFLM). The MANM employs adaptive gating-based multimodal context-aware attention, enabling the model to focus on relevant visual and textual information and to generate contextually relevant features. The GFRM utilizes graphs to refine features within individual modalities, while the CFLM focuses on learning text- and image-specific features such as toxicity features and caption features. Additionally, we curate a set of misogynous lexicons to compute a misogyny-specific lexicon score from the text. We apply test-time augmentation in feature space to better generalize predictions on diverse inputs. The proposed approach has been evaluated on two multimodal datasets, MAMI and MMHS150K, with 11,000 and 13,494 samples, respectively, and demonstrates average improvements of 10.17% and 8.88% in macro-F1 over existing methods on the MAMI and MMHS150K datasets, respectively.
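The abstract describes its mechanisms only at a high level. As a rough illustration of the two most concrete ideas, adaptive gating-based cross-modal attention and test-time augmentation in feature space, here is a minimal PyTorch sketch; the module structure, dimensions, and noise-based augmentation are editorial assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedMultimodalAttention(nn.Module):
    """Minimal sketch of adaptive gating-based multimodal context-aware
    attention: text tokens attend over image regions, and a learned gate
    controls how much cross-modal context flows into each token.
    Structure and dimensions are assumptions, not the paper's MANM."""

    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text: (B, T, dim) token features; image: (B, R, dim) region features
        context, _ = self.cross_attn(query=text, key=image, value=image)
        g = torch.sigmoid(self.gate(torch.cat([text, context], dim=-1)))
        return g * context + (1.0 - g) * text  # adaptive gated fusion

def feature_space_tta(features: torch.Tensor, classifier: nn.Module,
                      n_aug: int = 8, noise_std: float = 0.05) -> torch.Tensor:
    """One plausible reading of test-time augmentation in feature space:
    perturb the fused features with small Gaussian noise and average the
    resulting predictions."""
    logits = [classifier(features + noise_std * torch.randn_like(features))
              for _ in range(n_aug)]
    return torch.stack(logits).mean(dim=0)
```

The gate lets the model fall back to the unimodal text representation when the image adds no useful context, which matches the abstract's claim of focusing on relevant visual and textual information.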
Related papers
- MEMEWEAVER: Inter-Meme Graph Reasoning for Sexism and Misogyny Detection [15.449921882814428]
We present MemeWeaver, an end-to-end trainable framework for detecting sexism and misogyny through a novel inter-meme graph reasoning mechanism. We show that our approach consistently outperforms state-of-the-art baselines on the MAMI and EXIST benchmarks. Further analyses reveal that the learned graph structure captures semantically meaningful patterns, offering valuable insights into the relational nature of online hate.
arXiv Detail & Related papers (2026-01-13T16:06:41Z)
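The inter-meme graph reasoning above is not detailed in the summary; the following is a generic sketch of one way to reason across memes in a batch (a soft similarity graph with a GCN-style update), offered only as an illustration of the idea, not MemeWeaver's actual mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterMemeGraphLayer(nn.Module):
    """Illustrative inter-meme reasoning: connect memes in a batch by
    embedding similarity and propagate features over that soft graph.
    A generic GCN-style update, not MemeWeaver's architecture."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, memes: torch.Tensor) -> torch.Tensor:
        # memes: (N, dim), one fused embedding per meme
        sim = memes @ memes.t() / memes.size(-1) ** 0.5  # pairwise scores
        adj = F.softmax(sim, dim=-1)                     # soft adjacency
        return F.relu(self.proj(adj @ memes)) + memes    # propagate + residual
```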
- Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis [9.561100210295699]
Multimodal Aspect-Based Sentiment Analysis (MABSA) seeks to extract fine-grained information from image-text pairs. DASCO is a fine-grained scope-oriented framework that enhances aspect-level sentiment reasoning by leveraging dependency parsing trees. Experiments on two benchmark datasets demonstrate that DASCO achieves state-of-the-art performance in MABSA.
arXiv Detail & Related papers (2025-04-15T16:05:09Z)
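DASCO's use of dependency parsing trees to scope aspect-level sentiment suggests a pipeline like the sketch below, which uses spaCy (an assumption; the paper's parser and scoping rules may differ) to pull out the subtree around an aspect term.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def aspect_scope(sentence: str, aspect: str) -> list[str]:
    """Rough proxy for a dependency-based aspect scope: the subtree
    rooted at the aspect token's syntactic head."""
    doc = nlp(sentence)
    for token in doc:
        if token.text.lower() == aspect.lower():
            return [t.text for t in token.head.subtree]
    return [t.text for t in doc]  # fallback: the whole sentence

# e.g. aspect_scope("The battery life is great but the screen is dim", "battery")
# returns the tokens around "battery" rather than the full sentence
```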
- Towards Understanding Graphical Perception in Large Multimodal Models [80.44471730672801]
We leverage the theory of graphical perception to develop an evaluation framework for analyzing gaps in LMMs' perception abilities in charts. We apply our framework to evaluate and diagnose the perception capabilities of state-of-the-art LMMs at three levels (chart, visual element, and pixel).
arXiv Detail & Related papers (2025-03-13T20:13:39Z)
- Towards a Robust Framework for Multimodal Hate Detection: A Study on Video vs. Image-based Content [7.5253808885104325]
Social media platforms enable the propagation of hateful content across different modalities. Recent approaches have shown promise in handling individual modalities, but their effectiveness across different modality combinations remains unexplored. This paper presents a systematic analysis of fusion-based approaches for multimodal hate detection, focusing on their performance across video and image-based content.
arXiv Detail & Related papers (2025-02-11T00:07:40Z)
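For readers unfamiliar with the terminology, this toy sketch shows the two standard families such fusion analyses contrast: early fusion (combine features, then classify) and late fusion (classify per modality, then combine). The dimensions and heads are illustrative assumptions, not the paper's models.

```python
import torch
import torch.nn as nn

class FusionHeads(nn.Module):
    """Sketch of the two fusion families typically compared in
    multimodal hate detection studies."""

    def __init__(self, dim: int = 512, n_classes: int = 2):
        super().__init__()
        self.early = nn.Linear(2 * dim, n_classes)   # fuse features, then classify
        self.text_head = nn.Linear(dim, n_classes)   # late fusion: classify each
        self.image_head = nn.Linear(dim, n_classes)  # modality separately

    def early_fusion(self, text, image):
        return self.early(torch.cat([text, image], dim=-1))

    def late_fusion(self, text, image):
        return (self.text_head(text) + self.image_head(image)) / 2  # average logits
```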
- M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop Chain-of-Thought [21.06134139986278]
We introduce a Multimodal Multi-hop CoT (M3Hop-CoT) framework for misogynous meme identification.
M3Hop-CoT employs a three-step multimodal prompting principle to induce emotions, target awareness, and contextual knowledge for meme analysis.
We evaluate the model's generalizability on various benchmark meme datasets.
arXiv Detail & Related papers (2024-10-11T19:50:53Z)
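The three-step prompting principle (emotions, target awareness, contextual knowledge) maps naturally onto a prompt chain. The skeleton below is a hypothetical illustration: `query_vlm` is a stand-in for any vision-language model call, and the hop wording paraphrases the summary, not the paper's actual prompts.

```python
# Illustrative multi-hop chain-of-thought skeleton, not M3Hop-CoT's code.
HOPS = [
    "Hop 1: What emotions does this meme evoke?",
    "Hop 2: Who or what group does the meme target?",
    "Hop 3: What contextual or cultural knowledge is needed to read it?",
]

def classify_meme(image_path: str, meme_text: str, query_vlm) -> str:
    context = f"Meme text: {meme_text}"
    for hop in HOPS:
        answer = query_vlm(image_path, f"{context}\n{hop}")
        context += f"\n{hop}\nAnswer: {answer}"  # each hop feeds the next
    return query_vlm(image_path, context + "\nGiven the above reasoning, "
                     "is this meme misogynous? Answer yes or no.")
```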
- Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark [53.61633384281524]
PolyMATH is a benchmark aimed at evaluating the general cognitive reasoning abilities of MLLMs.
The best scores achieved on PolyMATH are 41%, 36%, and 27%, obtained by Claude-3.5 Sonnet, GPT-4o, and Gemini-1.5 Pro, respectively.
A further fine-grained error analysis reveals that these models struggle to understand spatial relations and perform drawn-out, high-level reasoning.
arXiv Detail & Related papers (2024-10-06T20:35:41Z)
- GCM-Net: Graph-enhanced Cross-Modal Infusion with a Metaheuristic-Driven Network for Video Sentiment and Emotion Analysis [2.012311338995539]
This paper presents a novel framework that leverages multi-modal contextual information from utterances and applies metaheuristic algorithms to utterance-level sentiment and emotion prediction.
To show the effectiveness of our approach, we have conducted extensive evaluations on three prominent multimodal benchmark datasets.
arXiv Detail & Related papers (2024-10-02T10:07:48Z)
- FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction [66.98008357232428]
We propose FineMatch, a new aspect-based fine-grained text and image matching benchmark.
FineMatch focuses on text and image mismatch detection and correction.
We show that models trained on FineMatch demonstrate enhanced proficiency in detecting fine-grained text and image mismatches.
arXiv Detail & Related papers (2024-04-23T03:42:14Z)
- MVAM: Multi-View Attention Method for Fine-grained Image-Text Matching [65.87255122130188]
We propose a Multi-view Attention Method (MVAM) for image-text matching. We also incorporate an objective to explicitly encourage attention heads to focus on distinct aspects of the input data. Our method allows models to encode images and text from different perspectives and focus on more critical details, leading to better matching performance.
arXiv Detail & Related papers (2024-02-27T06:11:54Z)
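MVAM's two ingredients, multiple attention "views" and an explicit objective pushing heads toward distinct aspects, can be sketched as below; the learned-query pooling and cosine-redundancy penalty are editorial assumptions rather than MVAM's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewAttention(nn.Module):
    """Sketch of multi-view encoding with a diversity penalty that
    pushes the views toward distinct aspects of the input."""

    def __init__(self, dim: int = 512, views: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(views, dim))  # one query per view

    def forward(self, tokens: torch.Tensor):
        # tokens: (B, T, dim) -> one pooled vector per view: (B, views, dim)
        attn = F.softmax(tokens @ self.queries.t() / tokens.size(-1) ** 0.5, dim=1)
        views = attn.transpose(1, 2) @ tokens            # (B, views, dim)
        v = F.normalize(views, dim=-1)
        overlap = v @ v.transpose(1, 2)                  # cosine between views
        eye = torch.eye(v.size(1), device=v.device)
        diversity_loss = ((overlap - eye) ** 2).mean()   # penalize redundant views
        return views, diversity_loss
```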
- Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection [4.997673761305336]
This paper proposes a context-aware attention framework for multimodal hateful content detection.
We evaluate the proposed approach on two benchmark hateful meme datasets, viz. MUTE (Bengali code-mixed) and MultiOFF (English).
arXiv Detail & Related papers (2024-02-15T06:34:15Z)
- A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal sentiment analysis.
The PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information.
The EPE module models the boundary pairing of the analysis target from the perspective of an energy-based model.
arXiv Detail & Related papers (2023-12-13T12:00:46Z)
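The summary's "boundary pairing ... from the perspective of an energy-based model" admits a simple generic reading: assign an energy to every candidate (start, end) pair and select the lowest-energy valid span. The sketch below illustrates only that reading; DQPSA's actual EPE module is likely more involved.

```python
import torch
import torch.nn as nn

class BoundaryEnergy(nn.Module):
    """Generic energy-based span pairing: score every (start, end)
    boundary pair and pick the lowest-energy valid span."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.energy = nn.Bilinear(dim, dim, 1)

    def forward(self, tokens: torch.Tensor):
        # tokens: (T, dim); build a (T, T) energy table over start/end pairs
        T, d = tokens.shape
        starts = tokens.unsqueeze(1).expand(T, T, d)
        ends = tokens.unsqueeze(0).expand(T, T, d)
        E = self.energy(starts.reshape(-1, d), ends.reshape(-1, d)).view(T, T)
        valid = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                      device=tokens.device))  # start <= end
        E = E.masked_fill(~valid, float("inf"))
        idx = torch.argmin(E)          # lowest-energy boundary pair
        return divmod(idx.item(), T)   # (start, end) indices
```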
- Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges: internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements internal-information screening and external-information exploitation.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.