MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
- URL: http://arxiv.org/abs/2405.11215v1
- Date: Sat, 18 May 2024 07:44:41 GMT
- Title: MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
- Authors: Siddhant Agarwal, Shivam Sharma, Preslav Nakov, Tanmoy Chakraborty
- Abstract summary: We introduce MemeMQA, a multimodal question-answering framework to solicit accurate responses to structured questions.
We also propose ARSENAL, a novel two-stage multimodal framework to address MemeMQA.
- Score: 53.30190591805432
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Memes have evolved as a prevalent medium for diverse communication, ranging from humour to propaganda. With the rising popularity of image-focused content, there is a growing need to explore its potential harm from different aspects. Previous studies have analyzed memes in closed settings - detecting harm, applying semantic labels, and offering natural language explanations. To extend this research, we introduce MemeMQA, a multimodal question-answering framework aiming to solicit accurate responses to structured questions while providing coherent explanations. We curate MemeMQACorpus, a new dataset featuring 1,880 questions related to 1,122 memes with corresponding answer-explanation pairs. We further propose ARSENAL, a novel two-stage multimodal framework that leverages the reasoning capabilities of LLMs to address MemeMQA. We benchmark MemeMQA using competitive baselines and demonstrate its superiority - ~18% enhanced answer prediction accuracy and distinct text generation lead across various metrics measuring lexical and semantic alignment over the best baseline. We analyze ARSENAL's robustness through diversification of question-set, confounder-based evaluation regarding MemeMQA's generalizability, and modality-specific assessment, enhancing our understanding of meme interpretation in the multimodal communication landscape.
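To make the two-stage idea above concrete, here is a minimal sketch of an answer-then-explain pipeline: a first stage picks an answer to the structured question, and a second stage generates a rationale conditioned on that answer. The data fields, the function names (predict_answer, generate_rationale), the prompt template, and the stub model calls are all illustrative assumptions, not details taken from the ARSENAL paper.

```python
from dataclasses import dataclass

@dataclass
class MemeQAExample:
    image_path: str      # path to the meme image
    ocr_text: str        # text overlaid on the meme
    question: str        # structured question about the meme
    options: list[str]   # candidate answers

def predict_answer(example: MemeQAExample) -> str:
    """Stage 1 (hypothetical): rank the candidate answers with a
    multimodal model; a trivial stand-in keeps the sketch runnable."""
    # A real system would score (image, ocr_text, question, option) tuples
    # with a vision-language model and return the top-ranked option.
    return example.options[0]

def generate_rationale(example: MemeQAExample, answer: str) -> str:
    """Stage 2 (hypothetical): prompt an LLM to explain the chosen answer.
    A fixed template stands in for the model call."""
    prompt = (
        f"Meme text: {example.ocr_text}\n"
        f"Question: {example.question}\n"
        f"Answer: {answer}\n"
        "Explain briefly why this answer is correct."
    )
    return f"[LLM explanation for prompt: {prompt[:60]}...]"

def answer_with_rationale(example: MemeQAExample) -> tuple[str, str]:
    answer = predict_answer(example)
    return answer, generate_rationale(example, answer)

if __name__ == "__main__":
    ex = MemeQAExample(
        image_path="meme.png",
        ocr_text="When the deadline is tomorrow",
        question="What emotion does the meme target?",
        options=["anxiety", "joy"],
    )
    print(answer_with_rationale(ex))
```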
Related papers
- Asking Multimodal Clarifying Questions in Mixed-Initiative
Conversational Search [89.1772985740272]
In mixed-initiative conversational search systems, clarifying questions are used to help users who struggle to express their intentions in a single query.
We hypothesize that in scenarios where multimodal information is pertinent, the clarification process can be improved by using non-textual information.
We collect a dataset named Melon that contains over 4k multimodal clarifying questions, enriched with over 14k images.
Several analyses are conducted to understand the importance of multimodal contents during the query clarification phase.
arXiv Detail & Related papers (2024-02-12T16:04:01Z) - WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual
World Knowledge [73.76722241704488]
We propose a plug-in framework named WisdoM to leverage contextual world knowledge induced from large vision-language models (LVLMs) for enhanced multimodal sentiment analysis.
We show that our approach yields substantial improvements over several state-of-the-art methods.
arXiv Detail & Related papers (2024-01-12T16:08:07Z) - MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched
Contextualization [31.209594252045566]
We propose a novel task, MEMEX: given a meme and a related document, the aim is to mine the context that succinctly explains the background of the meme.
To benchmark MCC, we propose MIME, a multimodal neural framework that uses commonsense-enriched meme representations and a layered approach to capture the cross-modal semantic dependencies between the meme and the context.
arXiv Detail & Related papers (2023-05-25T10:19:35Z) - Answering Questions by Meta-Reasoning over Multiple Chains of Thought [53.55653437903948]
We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought.
MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer.
arXiv Detail & Related papers (2023-04-25T17:27:37Z) - Enhancing Multimodal Entity and Relation Extraction with Variational
Information Bottleneck [12.957002659910456]
We study multimodal named entity recognition (MNER) and multimodal relation extraction (MRE).
The core of MNER and MRE lies in incorporating evident visual information to enhance textual semantics.
We propose a novel method for MNER and MRE based on Multi-Modal representation learning with an Information Bottleneck (MMIB).
arXiv Detail & Related papers (2023-04-05T09:32:25Z) - What do you MEME? Generating Explanations for Visual Semantic Role
Labelling in Memes [42.357272117919464]
We introduce a novel task - EXCLAIM, generating explanations for visual semantic role labeling in memes.
To this end, we curate ExHVV, a novel dataset that offers natural language explanations of connotative roles for three types of entities.
We also posit LUMEN, a novel multimodal, multi-task learning framework that endeavors to address EXCLAIM optimally.
arXiv Detail & Related papers (2022-12-01T18:21:36Z) - Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z) - A Multimodal Framework for the Detection of Hateful Memes [16.7604156703965]
We aim to develop a framework for the detection of hateful memes.
We show the effectiveness of upsampling contrastive examples to encourage multimodality, as well as of ensemble learning.
Our best approach comprises an ensemble of UNITER-based models and achieves an AUROC score of 80.53, placing us 4th on phase 2 of the 2020 Hateful Memes Challenge organized by Facebook.
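The summary above reports an ensemble of UNITER-based models but not how member predictions are combined; the sketch below only illustrates a generic approach, averaging per-meme "hateful" probabilities and scoring with AUROC via scikit-learn. The dummy probabilities, labels, and the averaging rule are assumptions for illustration, not the authors' configuration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-model probabilities for 6 memes from 3 ensemble members
# (rows: models, columns: memes); real values would come from UNITER-style
# classifiers run over image-text pairs.
member_probs = np.array([
    [0.91, 0.12, 0.77, 0.35, 0.64, 0.08],
    [0.88, 0.20, 0.69, 0.41, 0.58, 0.15],
    [0.93, 0.10, 0.81, 0.30, 0.71, 0.05],
])
labels = np.array([1, 0, 1, 0, 1, 0])  # 1 = hateful, 0 = not hateful

# Combine members by simple probability averaging (an assumed rule).
ensemble_probs = member_probs.mean(axis=0)
print("AUROC:", roc_auc_score(labels, ensemble_probs))
```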
arXiv Detail & Related papers (2020-12-23T18:37:11Z) - A Multimodal Memes Classification: A Survey and Open Research Issues [4.504833177846264]
Many memes uploaded to social media platforms each day need automatic censoring to curb misinformation and hate.
This study provides a comprehensive survey of meme classification, focusing on Visual-Linguistic (VL) multimodal problems and cutting-edge solutions.
arXiv Detail & Related papers (2020-09-17T16:13:21Z) - Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term
Importance Estimation and Neural Query Rewriting [56.268862325167575]
We tackle conversational passage retrieval (ConvPR) with query reformulation integrated into a multi-stage ad-hoc IR system.
We propose two conversational query reformulation (CQR) methods: (1) term importance estimation and (2) neural query rewriting.
For the former, we expand conversational queries using important terms extracted from the conversational context with frequency-based signals.
For the latter, we reformulate conversational queries into natural, standalone, human-understandable queries with a pretrained sequence-to-sequence model.
arXiv Detail & Related papers (2020-05-05T14:30:20Z)
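As a rough illustration of the first reformulation method above (frequency-based term importance estimation), the sketch below expands the current conversational query with the most frequent content terms from earlier turns; the neural rewriting method would instead use a pretrained sequence-to-sequence model. The stop-word list, the number of expansion terms, and the helper name expand_query are illustrative assumptions rather than the paper's implementation.

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "it", "what", "how"}

def expand_query(history: list[str], query: str, k: int = 3) -> str:
    """Append the k most frequent non-stop-word terms from the conversation
    history to the current query (a simplified frequency-based signal)."""
    tokens = re.findall(r"[a-z]+", " ".join(history).lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and t not in query.lower())
    expansion = [term for term, _ in counts.most_common(k)]
    # A neural alternative would rewrite `query` into a standalone,
    # human-understandable question with a pretrained seq2seq model.
    return query + " " + " ".join(expansion)

history = [
    "Tell me about the Hateful Memes Challenge.",
    "Which multimodal models performed best on the challenge?",
]
print(expand_query(history, "How were they evaluated?"))
```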