MATK: The Meme Analytical Tool Kit
- URL: http://arxiv.org/abs/2312.06094v1
- Date: Mon, 11 Dec 2023 03:36:59 GMT
- Title: MATK: The Meme Analytical Tool Kit
- Authors: Ming Shan Hee, Aditi Kumaresan, Nguyen Khoi Hoang, Nirmalendu Prakash,
Rui Cao, Roy Ka-Wei Lee
- Abstract summary: We introduce the Meme Analytical Tool Kit (MATK), an open-source toolkit specifically designed to support existing meme datasets and cutting-edge multimodal models.
MATK aims to assist researchers and engineers in training and reproducing these multimodal models for meme classification tasks, while also providing analysis techniques to gain insights into their strengths and weaknesses.
- Score: 12.278828922709353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of social media platforms has brought about a new digital culture
called memes. Memes, which combine visuals and text, can strongly influence
public opinions on social and cultural issues. As a result, people have become
interested in categorizing memes, leading to the development of various
datasets and multimodal models that show promising results in this field.
However, there is currently a lack of a single library that allows for the
reproduction, evaluation, and comparison of these models using fair benchmarks
and settings. To fill this gap, we introduce the Meme Analytical Tool Kit
(MATK), an open-source toolkit specifically designed to support existing meme
datasets and cutting-edge multimodal models. MATK aims to assist researchers
and engineers in training and reproducing these multimodal models for meme
classification tasks, while also providing analysis techniques to gain insights
into their strengths and weaknesses. To access MATK, please visit
https://github.com/Social-AI-Studio/MATK.
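The abstract does not describe MATK's programming interface, so the following is only an illustrative sketch of the kind of multimodal meme classifier such a toolkit trains and reproduces: CLIP image and text features fused by a linear classification head. The model choice, file names, and labels are assumptions made for illustration, not MATK's actual API.

```python
# Illustrative sketch only: a generic multimodal meme classifier of the kind
# MATK is designed to train and reproduce. This is NOT MATK's actual API;
# the CLIP backbone, file names, and labels are assumptions.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

class MemeClassifier(nn.Module):
    """Fuses CLIP image and text embeddings and classifies the meme."""
    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        dim = self.clip.config.projection_dim  # 512 for this checkpoint
        self.head = nn.Linear(2 * dim, num_labels)

    def forward(self, pixel_values, input_ids, attention_mask):
        img = self.clip.get_image_features(pixel_values=pixel_values)
        txt = self.clip.get_text_features(input_ids=input_ids,
                                          attention_mask=attention_mask)
        return self.head(torch.cat([img, txt], dim=-1))  # simple late fusion

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = MemeClassifier(num_labels=2)

# A meme sample is an image plus its overlaid caption (hypothetical file/text).
image = Image.open("meme.png").convert("RGB")
inputs = processor(text=["when the deadline is tomorrow"], images=image,
                   return_tensors="pt", padding=True, truncation=True)
logits = model(inputs["pixel_values"], inputs["input_ids"],
               inputs["attention_mask"])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0]))  # 0 = benign, 1 = harmful
loss.backward()
```

In practice, a toolkit like MATK would wrap the dataset loading, training loop, and evaluation behind a shared configuration so that different multimodal models can be compared under the same benchmark settings.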
Related papers
- InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning [58.7966588457529]
InfiMM-WebMath-40B is a high-quality dataset of interleaved image-text documents.
It comprises 24 million web pages, 85 million associated image URLs, and 40 billion text tokens, all meticulously extracted and filtered from CommonCrawl.
Our evaluations on text-only benchmarks show that, despite utilizing only 40 billion tokens, our dataset significantly enhances the performance of our 1.3B model.
Our models set a new state-of-the-art among open-source models on multi-modal math benchmarks such as MathVerse and We-Math.
arXiv Detail & Related papers (2024-09-19T08:41:21Z)
- Knowledge-Aware Reasoning over Multimodal Semi-structured Tables [85.24395216111462]
This study investigates whether current AI models can perform knowledge-aware reasoning on multimodal structured data.
We introduce MMTabQA, a new dataset designed for this purpose.
Our experiments highlight substantial challenges for current AI models in effectively integrating and interpreting multiple text and image inputs.
arXiv Detail & Related papers (2024-08-25T15:17:43Z)
- Decoding Memes: A Comparative Study of Machine Learning Models for Template Identification [0.0]
"meme template" is a layout or format that is used to create memes.
Despite extensive research on meme virality, the task of automatically identifying meme templates remains a challenge.
This paper presents a comprehensive comparison and evaluation of existing meme template identification methods.
arXiv Detail & Related papers (2024-08-15T12:52:06Z)
- VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models [89.63342806812413]
We present VLMEvalKit, a PyTorch-based open-source toolkit for evaluating large multi-modality models.
VLMEvalKit implements over 70 different large multi-modality models, including both proprietary APIs and open-source models.
We host the OpenVLM Leaderboard to track the progress of multi-modality learning research.
arXiv Detail & Related papers (2024-07-16T13:06:15Z)
- Explainable Multimodal Sentiment Analysis on Bengali Memes [0.0]
Understanding and interpreting the sentiment underlying memes has become crucial in the age of information.
This study employed a multimodal approach using ResNet50 and BanglishBERT and achieved a weighted F1-score of 0.71.
arXiv Detail & Related papers (2023-12-20T17:15:10Z)
- PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models [7.388466146105024]
We propose PromptMTopic, a novel multimodal prompt-based model to learn topics from both text and visual modalities.
Our model effectively extracts and clusters topics learned from memes, considering the semantic interaction between the text and visual modalities.
Our work contributes to the understanding of the topics and themes of memes, a crucial form of communication in today's society.
arXiv Detail & Related papers (2023-12-11T03:36:50Z)
- mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model [73.38800189095173]
This work focuses on strengthening the multi-modal diagram analysis ability of Multimodal LLMs.
By parsing LaTeX source files of high-quality papers, we carefully build a multi-modal diagram understanding dataset, M-Paper.
M-Paper is the first dataset to support joint comprehension of multiple scientific diagrams, including figures and tables in the form of images or LaTeX code.
arXiv Detail & Related papers (2023-11-30T04:43:26Z)
- A Template Is All You Meme [76.03172165923058]
We create a knowledge base composed of more than 5,200 meme templates, information about them, and 54,000 examples of template instances.
To investigate the semantic signal of meme templates, we show that we can match memes in datasets to base templates contained in our knowledge base with a distance-based lookup (a minimal sketch of such a lookup appears after this list).
Our examination of meme templates results in state-of-the-art performance for every dataset we consider, paving the way for analysis grounded in templateness.
arXiv Detail & Related papers (2023-11-11T19:38:14Z)
- SemiMemes: A Semi-supervised Learning Approach for Multimodal Memes Analysis [0.0]
SemiMemes is a novel training method that combines an auto-encoder with a classification task to make use of abundant unlabeled data.
This research proposes a multimodal semi-supervised learning approach that outperforms other state-of-the-art multimodal semi-supervised and supervised learning models.
arXiv Detail & Related papers (2023-03-31T11:22:03Z)
- Cluster-based Deep Ensemble Learning for Emotion Classification in Internet Memes [18.86848589288164]
We propose a novel model, cluster-based deep ensemble learning (CDEL), for emotion classification in memes.
CDEL is a hybrid model that leverages the benefits of a deep learning model in combination with a clustering algorithm.
We evaluate the performance of CDEL on a benchmark dataset for emotion classification.
arXiv Detail & Related papers (2023-02-16T15:01:07Z)
- Detecting and Understanding Harmful Memes: A Survey [48.135415967633676]
We offer a comprehensive survey with a focus on harmful memes.
One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism.
Another observation is that memes can propagate globally through repackaging in different languages and that they can also be multilingual.
arXiv Detail & Related papers (2022-05-09T13:43:27Z)
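The "A Template Is All You Meme" entry above mentions matching memes to base templates in a knowledge base with a distance-based lookup. As a hedged illustration only (that paper's actual features and distance metric are not given here), the sketch below uses CLIP image embeddings with cosine similarity; all file paths are hypothetical.

```python
# Minimal sketch of a distance-based template lookup, as described in the
# "A Template Is All You Meme" entry above. CLIP embeddings and cosine
# similarity are assumptions chosen for illustration; file paths are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    """Return L2-normalised CLIP image embeddings for a list of image paths."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)

# Knowledge base of template images and incoming memes (hypothetical paths).
template_paths = ["templates/drake.png", "templates/distracted_boyfriend.png"]
meme_paths = ["memes/meme_001.png", "memes/meme_002.png"]

template_emb = embed(template_paths)
meme_emb = embed(meme_paths)

# Cosine similarity is the dot product of normalised vectors; the most similar
# template is the lookup result for each meme.
similarity = meme_emb @ template_emb.T
best = similarity.argmax(dim=-1)
for path, idx in zip(meme_paths, best.tolist()):
    print(f"{path} -> {template_paths[idx]}")
```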