Multimodal and Explainable Internet Meme Classification
- URL: http://arxiv.org/abs/2212.05612v3
- Date: Fri, 7 Apr 2023 00:57:16 GMT
- Title: Multimodal and Explainable Internet Meme Classification
- Authors: Abhinav Kumar Thakur, Filip Ilievski, Hông-Ân Sandlin, Zhivar Sourati, Luca Luceri, Riccardo Tommasini and Alain Mermoud
- Abstract summary: We design and implement a modular and explainable architecture for Internet meme understanding.
We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification.
We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme.
- Score: 3.4690152926833315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the current context where online platforms have been effectively
weaponized in a variety of geo-political events and social issues, Internet
memes make fair content moderation at scale even more difficult. Existing work
on meme classification and tracking has focused on black-box methods that do
not explicitly consider the semantics of the memes or the context of their
creation. In this paper, we pursue a modular and explainable architecture for
Internet meme understanding. We design and implement multimodal classification
methods that perform example- and prototype-based reasoning over training
cases, while leveraging both textual and visual SOTA models to represent the
individual cases. We study the relevance of our modular and explainable models
in detecting harmful memes on two existing tasks: Hate Speech Detection and
Misogyny Classification. We compare the performance between example- and
prototype-based methods, and between text, vision, and multimodal models,
across different categories of harmfulness (e.g., stereotype and
objectification). We devise a user-friendly interface that facilitates the
comparative analysis of examples retrieved by all of our models for any given
meme, informing the community about the strengths and limitations of these
explainable methods.
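As a rough illustration of the example-based multimodal reasoning described in the abstract, the sketch below retrieves the nearest training memes over concatenated image and text embeddings and uses them both to predict a label and to explain the prediction. The encoders, concatenation-based fusion, cosine distance, and k=5 are illustrative assumptions for this sketch, not the authors' exact configuration; a prototype-based variant would compare the query to learned prototypes (e.g., cluster centroids) instead of individual training cases.

```python
# Minimal sketch of example-based (k-nearest-neighbour) multimodal meme
# classification. Random vectors stand in for real embeddings; in practice
# these might come from a CLIP image encoder and a sentence-transformer
# text encoder (an assumption, not the paper's exact setup).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical pre-computed embeddings for the training memes.
n_train, img_dim, txt_dim = 200, 512, 384
train_img = rng.normal(size=(n_train, img_dim))
train_txt = rng.normal(size=(n_train, txt_dim))
train_labels = rng.integers(0, 2, size=n_train)  # e.g., 0 = benign, 1 = harmful

def encode_meme(image_emb, text_emb):
    """Fuse visual and textual representations by concatenation (one plausible
    fusion choice; the paper compares text-only, vision-only, and multimodal)."""
    return np.concatenate([image_emb, text_emb])

train_features = np.stack(
    [encode_meme(i, t) for i, t in zip(train_img, train_txt)]
)

# Example-based reasoning: the label is predicted from the k most similar
# training memes, which can also be shown to the user as the explanation.
knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
knn.fit(train_features, train_labels)

def classify_with_examples(image_emb, text_emb, k=5):
    query = encode_meme(image_emb, text_emb).reshape(1, -1)
    distances, indices = knn.kneighbors(query, n_neighbors=k)
    prediction = knn.predict(query)[0]
    # The retrieved indices point back to concrete training memes that
    # justify the decision, which is what makes the method inspectable.
    return prediction, indices[0], distances[0]

pred, neighbour_ids, neighbour_dists = classify_with_examples(
    rng.normal(size=img_dim), rng.normal(size=txt_dim)
)
```

In a comparison interface like the one described above, the retrieved neighbours (and their labels) would be rendered alongside the query meme so users can judge whether the supporting evidence is sensible.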
Related papers
- Towards Explainable Harmful Meme Detection through Multimodal Debate
between Large Language Models [18.181154544563416]
Social media is flooded with Internet memes, necessitating a clear grasp and effective identification of harmful ones.
Existing harmful meme detection methods do not present readable explanations that unveil such implicit meaning to support their detection decisions.
We propose an explainable approach to detect harmful memes, achieved through reasoning over conflicting rationales from both harmless and harmful positions.
arXiv Detail & Related papers (2024-01-24T08:37:16Z) - Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes
Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
arXiv Detail & Related papers (2024-01-18T11:24:30Z) - PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using
Large Language Models [7.388466146105024]
We propose PromptMTopic, a novel multimodal prompt-based model to learn topics from both text and visual modalities.
Our model effectively extracts and clusters topics learned from memes, considering the semantic interaction between the text and visual modalities.
Our work contributes to the understanding of the topics and themes of memes, a crucial form of communication in today's society.
arXiv Detail & Related papers (2023-12-11T03:36:50Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - MultiViz: An Analysis Benchmark for Visualizing and Understanding
Multimodal Models [103.9987158554515]
MultiViz is a method for analyzing the behavior of multimodal models by scaffolding the problem of interpretability into 4 stages.
We show that the complementary stages in MultiViz together enable users to simulate model predictions, assign interpretable concepts to features, perform error analysis on model misclassifications, and use insights from error analysis to debug models.
arXiv Detail & Related papers (2022-06-30T18:42:06Z) - On Explaining Multimodal Hateful Meme Detection Models [4.509263496823139]
It is unclear whether these models are able to capture derogatory or slur references across modalities.
We found that the image modality contributes more to the hateful meme classification task.
Our error analysis shows that the visual-linguistic models have acquired biases, which resulted in false-positive predictions.
arXiv Detail & Related papers (2022-04-04T15:35:41Z) - Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context
Images via Online Resources [70.68526820807402]
A real image is re-purposed to support other narratives by misrepresenting its context and/or elements.
Our goal is an inspectable method that automates this time-consuming and reasoning-intensive process by fact-checking the image-context pairing.
Our work offers the first step and benchmark for open-domain, content-based, multi-modal fact-checking.
arXiv Detail & Related papers (2021-11-30T19:36:20Z) - Caption Enriched Samples for Improving Hateful Memes Detection [78.5136090997431]
The Hateful Memes Challenge demonstrates the difficulty of determining whether a meme is hateful or not.
Neither unimodal language models nor multimodal vision-language models reach human-level performance.
arXiv Detail & Related papers (2021-09-22T10:57:51Z) - A Multimodal Framework for the Detection of Hateful Memes [16.7604156703965]
We aim to develop a framework for the detection of hateful memes.
We show the effectiveness of upsampling of contrastive examples to encourage multimodality and ensemble learning.
Our best approach comprises an ensemble of UNITER-based models and achieves an AUROC score of 80.53, placing us 4th on phase 2 of the 2020 Hateful Memes Challenge organized by Facebook.
arXiv Detail & Related papers (2020-12-23T18:37:11Z) - How Far are We from Effective Context Modeling? An Exploratory Study on
Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding approach to semantic parsing and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.