What do you MEME? Generating Explanations for Visual Semantic Role
Labelling in Memes
- URL: http://arxiv.org/abs/2212.00715v2
- Date: Tue, 20 Dec 2022 14:29:43 GMT
- Title: What do you MEME? Generating Explanations for Visual Semantic Role
Labelling in Memes
- Authors: Shivam Sharma, Siddhant Agarwal, Tharun Suresh, Preslav Nakov, Md.
Shad Akhtar, Tanmoy Chakraborty
- Abstract summary: We introduce a novel task - EXCLAIM, generating explanations for visual semantic role labeling in memes.
To this end, we curate ExHVV, a novel dataset that offers natural language explanations of connotative roles for three types of entities.
We also posit LUMEN, a novel multimodal, multi-task learning framework that endeavors to address EXCLAIM optimally.
- Score: 42.357272117919464
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Memes are powerful means for effective communication on social media. Their
effortless amalgamation of viral visuals and compelling messages can have
far-reaching implications with proper marketing. Previous research on memes has
primarily focused on characterizing their affective spectrum and detecting
whether the meme's message insinuates any intended harm, such as hate, offense,
racism, etc. However, memes often use abstraction, which can be elusive. Here,
we introduce a novel task - EXCLAIM, generating explanations for visual
semantic role labeling in memes. To this end, we curate ExHVV, a novel dataset
that offers natural language explanations of connotative roles for three types
of entities - heroes, villains, and victims, encompassing 4,680 entities
present in 3K memes. We also benchmark ExHVV with several strong unimodal and
multimodal baselines. Moreover, we posit LUMEN, a novel multimodal, multi-task
learning framework that endeavors to address EXCLAIM optimally by jointly
learning to predict the correct semantic roles and correspondingly to generate
suitable natural language explanations. LUMEN distinctly outperforms the best
baseline across 18 standard natural language generation evaluation metrics. Our
systematic evaluation and analyses demonstrate that characteristic multimodal
cues required for adjudicating semantic roles are also helpful for generating
suitable explanations.
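To illustrate the joint-learning idea described above, the sketch below shows a minimal multi-task setup in PyTorch that shares one fused multimodal representation between a connotative-role classifier (hero, villain, victim) and a toy explanation decoder. This is an illustrative assumption of how such joint training could look, not the authors' LUMEN implementation; module names, feature dimensions, and the fusion strategy are placeholders.
```python
# Hedged sketch (not the authors' LUMEN code): a shared fused representation
# feeds (i) a semantic-role classifier and (ii) a toy explanation generator.
import torch
import torch.nn as nn

class JointRoleExplainer(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, hid=512, vocab=30522, n_roles=3):
        super().__init__()
        # Fuse image and text features into a shared hidden state.
        self.fuse = nn.Sequential(nn.Linear(img_dim + txt_dim, hid), nn.ReLU())
        self.role_head = nn.Linear(hid, n_roles)            # hero / villain / victim
        self.decoder = nn.GRU(hid, hid, batch_first=True)   # toy explanation decoder
        self.lm_head = nn.Linear(hid, vocab)

    def forward(self, img_feat, txt_feat, max_len=16):
        h = self.fuse(torch.cat([img_feat, txt_feat], dim=-1))
        role_logits = self.role_head(h)
        # Unroll the decoder from the fused state to produce token logits.
        dec_out, _ = self.decoder(h.unsqueeze(1).repeat(1, max_len, 1))
        return role_logits, self.lm_head(dec_out)

model = JointRoleExplainer()
img, txt = torch.randn(4, 512), torch.randn(4, 768)   # stand-in visual/text features
roles = torch.randint(0, 3, (4,))                      # gold connotative roles
expl = torch.randint(0, 30522, (4, 16))                # gold explanation token ids

role_logits, token_logits = model(img, txt)
loss = nn.functional.cross_entropy(role_logits, roles) + \
       nn.functional.cross_entropy(token_logits.reshape(-1, 30522), expl.reshape(-1))
loss.backward()  # both objectives share gradients through the fused encoder
```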
Related papers
- XMeCap: Meme Caption Generation with Sub-Image Adaptability [53.2509590113364]
Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines.
We introduce the XMeCap framework, which adopts supervised fine-tuning and reinforcement learning.
XMeCap achieves an average evaluation score of 75.85 for single-image memes and 66.32 for multi-image memes, outperforming the best baseline by 3.71% and 4.82%, respectively.
arXiv Detail & Related papers (2024-07-24T10:51:46Z)
- MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing [53.30190591805432]
We introduce MemeMQA, a multimodal question-answering framework to solicit accurate responses to structured questions.
We also propose ARSENAL, a novel two-stage multimodal framework to address MemeMQA.
arXiv Detail & Related papers (2024-05-18T07:44:41Z)
- Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme (see the CLIP scoring sketch after this list).
arXiv Detail & Related papers (2024-01-18T11:24:30Z)
- PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models [7.388466146105024]
We propose PromptMTopic, a novel multimodal prompt-based model to learn topics from both text and visual modalities.
Our model effectively extracts and clusters topics learned from memes, considering the semantic interaction between the text and visual modalities.
Our work contributes to the understanding of the topics and themes of memes, a crucial form of communication in today's society.
arXiv Detail & Related papers (2023-12-11T03:36:50Z)
- Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models [17.617187709968242]
Existing harmful meme detection approaches only recognize superficial harm-indicative signals in an end-to-end classification manner.
We propose a novel generative framework to learn reasonable thoughts from Large Language Models for better multimodal fusion.
Our proposed approach outperforms state-of-the-art methods on the harmful meme detection task.
arXiv Detail & Related papers (2023-12-09T01:59:11Z)
- MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched Contextualization [31.209594252045566]
We propose a novel task, MEMEX: given a meme and a related document, the aim is to mine the context that succinctly explains the background of the meme.
To benchmark MCC, we propose MIME, a multimodal neural framework that uses common sense-enriched meme representations and a layered approach to capture the cross-modal semantic dependencies between the meme and the context.
arXiv Detail & Related papers (2023-05-25T10:19:35Z)
- Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim? [39.55435707149863]
We aim to understand whether the meme glorifies, vilifies, or victimizes each entity it refers to.
Our proposed model achieves an improvement of 4% over the best baseline and 1% over the best competing stand-alone submission.
arXiv Detail & Related papers (2023-01-26T16:55:15Z)
- Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z)
- Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer.
We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model.
We achieve state-of-the-art performance on two multimodal Twitter datasets.
arXiv Detail & Related papers (2021-08-03T18:02:38Z)
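As referenced in the MultiBully-Ex entry above, the following is a minimal sketch of how an off-the-shelf CLIP model can score candidate textual explanations against a meme image. It assumes the openai/clip-vit-base-patch32 checkpoint via Hugging Face transformers; the image path and candidate explanations are illustrative placeholders, and this is not the MultiBully-Ex authors' pipeline.
```python
# Hedged sketch: rank candidate explanations for a meme image with CLIP.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("meme.png")  # placeholder path to a meme image
candidates = [
    "the meme mocks the person shown in the image",
    "the meme praises the person shown in the image",
]
inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# logits_per_image holds image-text similarities; softmax ranks the candidates.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(candidates, probs.squeeze().tolist())))
```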