BLUE at Memotion 2.0 2022: You have my Image, my Text and my Transformer
- URL: http://arxiv.org/abs/2202.07543v1
- Date: Tue, 15 Feb 2022 16:25:02 GMT
- Title: BLUE at Memotion 2.0 2022: You have my Image, my Text and my Transformer
- Authors: Ana-Maria Bucur, Adrian Cosma and Ioan-Bogdan Iordache
- Abstract summary: We present team BLUE's solution for the second edition of the MEMOTION competition.
We showcase two approaches for meme classification: a text-only method using BERT and a multi-modal multi-task transformer network.
We obtain first place in task A, second place in task B and third place in task C.
- Score: 12.622643370707333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Memes are prevalent on the internet and continue to grow and evolve alongside
our culture. An automatic understanding of memes propagating on the internet
can shed light on the general sentiment and cultural attitudes of people. In
this work, we present team BLUE's solution for the second edition of the
MEMOTION competition. We showcase two approaches for meme classification (i.e.
sentiment, humour, offensive, sarcasm and motivation levels): a text-only
method using BERT, and a Multi-Modal-Multi-Task transformer network that
operates on both the meme image and its caption to output the final scores. In
both approaches, we leverage state-of-the-art pretrained models for text (BERT,
Sentence Transformer) and image processing (EfficientNetV4, CLIP). Through our
efforts, we obtain first place in task A, second place in task B and third
place in task C. In addition, our team obtained the highest average score for
all three tasks.
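The fusion approach described in the abstract can be pictured with a short sketch. The snippet below is a minimal, illustrative PyTorch / Hugging Face sketch of a multi-modal, multi-task classifier in that spirit: pretrained BERT and CLIP vision encoders feed a small fusion transformer with one classification head per Memotion task. The fusion width, layer counts, task names and label counts are assumptions for illustration, not the authors' exact configuration.

```python
# Illustrative sketch of a multi-modal, multi-task meme classifier.
# Not the authors' exact architecture: encoder choices, fusion width,
# task names and label counts are assumptions.
import torch
import torch.nn as nn
from transformers import BertModel, CLIPVisionModel

class MultiModalMultiTaskClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Pretrained encoders for the caption text and the meme image.
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.image_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
        # Project both modalities to a shared width before fusion.
        self.text_proj = nn.Linear(self.text_encoder.config.hidden_size, 512)
        self.image_proj = nn.Linear(self.image_encoder.config.hidden_size, 512)
        fusion_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(fusion_layer, num_layers=2)
        # One linear head per Memotion task (assumed label counts).
        self.heads = nn.ModuleDict({
            "sentiment": nn.Linear(512, 3),
            "humour": nn.Linear(512, 4),
            "sarcasm": nn.Linear(512, 4),
            "offensive": nn.Linear(512, 4),
            "motivation": nn.Linear(512, 2),
        })

    def forward(self, input_ids, attention_mask, pixel_values):
        text = self.text_encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        image = self.image_encoder(pixel_values=pixel_values).last_hidden_state
        # Concatenate projected text and image tokens and fuse them jointly.
        tokens = torch.cat([self.text_proj(text), self.image_proj(image)], dim=1)
        fused = self.fusion(tokens).mean(dim=1)  # pooled joint representation
        return {task: head(fused) for task, head in self.heads.items()}
```

A text-only variant in the same spirit would simply drop the image branch and classify from the pooled BERT representation.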
Related papers
- XMeCap: Meme Caption Generation with Sub-Image Adaptability [53.2509590113364]
Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines.
We introduce the XMeCap framework, which adopts supervised fine-tuning and reinforcement learning.
XMeCap achieves an average evaluation score of 75.85 for single-image memes and 66.32 for multi-image memes, outperforming the best baseline by 3.71% and 4.82%, respectively.
arXiv Detail & Related papers (2024-07-24T10:51:46Z)
- BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes [17.09830912625338]
We introduce a caption generation step to assess the modality gap and the impact of additional semantic information from images.
Our best model utilizes GPT-4 generated captions alongside meme text to fine-tune RoBERTa as the text encoder and CLIP as the image encoder.
arXiv Detail & Related papers (2024-04-03T19:17:43Z)
- An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance [53.974497865647336]
We take a first step towards translating images to make them culturally relevant.
We build three pipelines comprising state-of-the-art generative models to do the task.
We conduct a human evaluation of translated images to assess for cultural relevance and meaning preservation.
arXiv Detail & Related papers (2024-04-01T17:08:50Z)
- Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations [48.82168723932981]
We introduce MultiBully-Ex, the first benchmark dataset for multimodal explanation from code-mixed cyberbullying memes.
A Contrastive Language-Image Pretraining (CLIP) approach has been proposed for visual and textual explanation of a meme.
arXiv Detail & Related papers (2024-01-18T11:24:30Z)
- Mapping Memes to Words for Multimodal Hateful Meme Classification [26.101116761577796]
Some memes take a malicious turn, promoting hateful content and perpetuating discrimination.
We propose a novel approach named ISSUES for multimodal hateful meme classification.
Our method achieves state-of-the-art results on the Hateful Memes Challenge and HarMeme datasets.
arXiv Detail & Related papers (2023-10-12T14:38:52Z)
- Learning to Imagine: Visually-Augmented Natural Language Generation [73.65760028876943]
We propose a method to make pre-trained language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration.
We use a diffusion model to synthesize high-quality images conditioned on the input texts.
We conduct synthesis for each sentence rather than generate only one image for an entire paragraph.
arXiv Detail & Related papers (2023-05-26T13:59:45Z)
- MemeFier: Dual-stage Modality Fusion for Image Meme Classification [8.794414326545697]
New forms of digital content such as image memes have given rise to spread of hate using multimodal means.
We propose MemeFier, a deep learning-based architecture for fine-grained classification of Internet image memes.
arXiv Detail & Related papers (2023-04-06T07:36:52Z)
- NYCU-TWO at Memotion 3: Good Foundation, Good Teacher, then you have Good Meme Analysis [4.361904115604854]
This paper presents a robust solution to the Memotion 3.0 Shared Task.
The goal of this task is to classify the emotion and the corresponding intensity expressed by memes.
Understanding the multi-modal features of the given memes will be the key to solving the task.
arXiv Detail & Related papers (2023-02-13T03:25:37Z)
- UFO: A UniFied TransfOrmer for Vision-Language Representation Learning [54.82482779792115]
We propose a single UniFied transfOrmer (UFO) capable of processing either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question) for vision-language (VL) representation learning.
Existing approaches typically design an individual network for each modality and/or a specific fusion network for multimodal tasks.
arXiv Detail & Related papers (2021-11-19T03:23:10Z)
- Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset [47.65948529524281]
We collect hateful and non-hateful memes from Pinterest to evaluate out-of-sample performance on models pre-trained on the Facebook dataset.
We find that memes in the wild differ in two key aspects: 1) Captions must be extracted via OCR, and 2) Memes are more diverse than traditional memes, including screenshots of conversations or text on a plain background.
arXiv Detail & Related papers (2021-07-09T09:04:05Z)
- IITK at SemEval-2020 Task 8: Unimodal and Bimodal Sentiment Analysis of Internet Memes [2.2385755093672044]
We present our approaches for the Memotion Analysis problem as posed in SemEval-2020 Task 8.
The goal of this task is to classify memes based on their emotional content and sentiment.
Our results show that a text-only approach, a simple Feed Forward Neural Network (FFNN) with Word2vec embeddings as input, outperforms all the others (a minimal sketch of such a baseline appears after this list).
arXiv Detail & Related papers (2020-07-21T14:06:26Z)
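To make the text-only baseline from the IITK entry above concrete, here is a minimal, illustrative sketch: averaged Word2vec embeddings of a meme caption feed a small feed-forward network that predicts a sentiment class. The toy captions, label scheme, dimensions and hyperparameters are assumptions for illustration, not the IITK implementation.

```python
# Illustrative text-only baseline: averaged Word2vec caption embeddings
# fed to a small feed-forward classifier. Data, labels and hyperparameters
# below are toy assumptions, not the IITK setup.
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec

captions = [["when", "the", "code", "finally", "compiles"],
            ["me", "waiting", "for", "the", "weekend"]]
labels = torch.tensor([2, 1])  # assumed scheme: 0=negative, 1=neutral, 2=positive

# Train (or load) a Word2vec model on the tokenised captions.
w2v = Word2Vec(sentences=captions, vector_size=100, min_count=1, epochs=20)

def caption_vector(tokens):
    """Average the word vectors of a caption; zeros if no token is known."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

features = torch.tensor(np.stack([caption_vector(c) for c in captions]),
                        dtype=torch.float32)

ffnn = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 3))
optimizer = torch.optim.Adam(ffnn.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(50):  # tiny training loop for illustration
    optimizer.zero_grad()
    loss = loss_fn(ffnn(features), labels)
    loss.backward()
    optimizer.step()
```

In practice the Word2vec model would be trained on (or loaded for) the full caption corpus and the output classes would follow the Memotion label scheme.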
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.