NYCU-TWO at Memotion 3: Good Foundation, Good Teacher, then you have Good Meme Analysis
- URL: http://arxiv.org/abs/2302.06078v2
- Date: Tue, 14 Feb 2023 03:53:02 GMT
- Title: NYCU-TWO at Memotion 3: Good Foundation, Good Teacher, then you have Good Meme Analysis
- Authors: Yu-Chien Tang, Kuang-Da Wang, Ting-Yun Ou, Wen-Chih Peng
- Abstract summary: This paper presents a robust solution to the Memotion 3.0 Shared Task.
The goal of this task is to classify the emotion and the corresponding intensity expressed by memes.
Understanding the multi-modal features of the given memes will be the key to solving the task.
- Score: 4.361904115604854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a robust solution to the Memotion 3.0 Shared Task. The
goal of this task is to classify the emotion and the corresponding intensity
expressed by memes, which are usually in the form of images with short captions
on social media. Understanding the multi-modal features of the given memes will
be the key to solving the task. In this work, we use CLIP to extract aligned
image-text features and propose a novel meme sentiment analysis framework,
consisting of a Cooperative Teaching Model (CTM) for Task A and a Cascaded
Emotion Classifier (CEC) for Tasks B&C. CTM is based on the idea of knowledge
distillation, and can better predict the sentiment of a given meme in Task A;
CEC can leverage the emotion intensity suggestion from the prediction of Task C
to classify the emotion more precisely in Task B. Experiments show that we
achieved the 2nd place ranking for both Task A and Task B and the 4th place
ranking for Task C, with weighted F1-scores of 0.342, 0.784, and 0.535
respectively. The results show the robustness and effectiveness of our
framework. Our code is released on GitHub.
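As a rough illustration of the two components described in the abstract, here is a minimal, hypothetical PyTorch sketch assuming pre-extracted CLIP image-text features; the layer sizes, label counts, loss weighting, and the exact way Task C logits feed the Task B head are illustrative assumptions, not the authors' released implementation.

```python
# Minimal hypothetical sketch (not the authors' released code):
#  - CTM: a student sentiment head trained with a knowledge-distillation loss
#    against a teacher head (Task A).
#  - CEC: Task C intensity logits are fed into the Task B emotion head.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM = 512 * 2                              # concatenated CLIP image+text features (assumed)
N_SENTIMENT, N_EMOTION, N_INTENSITY = 3, 4, 4   # illustrative label counts


class SentimentHead(nn.Module):
    """Task A head; CTM would train a student copy against a teacher copy."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, N_SENTIMENT))

    def forward(self, feats):
        return self.mlp(feats)


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: cross-entropy on labels plus KL divergence
    to the teacher's temperature-softened distribution."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return alpha * ce + (1 - alpha) * kd


class CascadedEmotionClassifier(nn.Module):
    """CEC sketch: predict intensity (Task C) first, then condition the
    emotion head (Task B) on the features plus the intensity prediction."""
    def __init__(self):
        super().__init__()
        self.intensity_head = nn.Linear(FEAT_DIM, N_INTENSITY)
        self.emotion_head = nn.Linear(FEAT_DIM + N_INTENSITY, N_EMOTION)

    def forward(self, feats):
        intensity_logits = self.intensity_head(feats)
        emotion_logits = self.emotion_head(
            torch.cat([feats, intensity_logits.softmax(dim=-1)], dim=-1))
        return emotion_logits, intensity_logits


# Toy usage: random tensors stand in for pre-extracted CLIP features.
feats = torch.randn(8, FEAT_DIM)
student, teacher = SentimentHead(), SentimentHead()
loss_a = distillation_loss(student(feats), teacher(feats).detach(),
                           torch.randint(0, N_SENTIMENT, (8,)))
emotion_logits, intensity_logits = CascadedEmotionClassifier()(feats)
```

The distillation term mirrors the standard teacher-student setup that CTM builds on, while the cascaded head simply conditions the Task B emotion prediction on the softened Task C intensity logits, matching the "intensity suggestion" idea in the abstract.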
Related papers
- A Unified Framework for 3D Scene Understanding [50.6762892022386]
UniSeg3D is a unified 3D segmentation framework that achieves panoptic, semantic, instance, interactive, referring, and open-vocabulary semantic segmentation tasks within a single model.
It facilitates inter-task knowledge sharing and promotes comprehensive 3D scene understanding.
Experiments on three benchmarks, including the ScanNet20, ScanRefer, and ScanNet200, demonstrate that the UniSeg3D consistently outperforms current SOTA methods.
arXiv Detail & Related papers (2024-07-03T16:50:07Z)
- Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification [54.96876797812238]
We present a novel CrOss-moDal nEighbor Representation (CODER) based on the distance structure between images and their neighbor texts.
The key to constructing a high-quality CODER lies in creating a vast amount of high-quality and diverse texts to match with images.
Experiment results across various datasets and models confirm CODER's effectiveness.
arXiv Detail & Related papers (2024-04-27T02:04:36Z)
- BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes [17.09830912625338]
We introduce a caption generation step to assess the modality gap and the impact of additional semantic information from images.
Our best model utilizes GPT-4 generated captions alongside meme text to fine-tune RoBERTa as the text encoder and CLIP as the image encoder.
arXiv Detail & Related papers (2024-04-03T19:17:43Z)
- Overview of Memotion 3: Sentiment and Emotion Analysis of Codemixed Hinglish Memes [36.34201719103715]
We present the overview of the Memotion 3 shared task, as part of the DeFactify 2 workshop at AAAI-23.
The task released an annotated dataset of Hindi-English code-mixed memes based on their Sentiment (Task A), Emotion (Task B), and Emotion intensity (Task C).
Over 50 teams registered for the shared task and 5 made final submissions to the test set of the Memotion 3 dataset.
arXiv Detail & Related papers (2023-09-12T18:47:29Z)
- Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models [52.3032592038514]
We propose a class-aware text prompt to enrich generated prompts with label-related image information.
We achieve an average improvement of 4.03% on new classes and 3.19% on harmonic-mean over eleven classification benchmarks.
arXiv Detail & Related papers (2023-03-30T06:02:40Z)
- BLUE at Memotion 2.0 2022: You have my Image, my Text and my Transformer [12.622643370707333]
We present team BLUE's solution for the second edition of the MEMOTION competition.
We showcase two approaches for meme classification, including a text-only method based on BERT.
We obtain first place in task A, second place in task B and third place in task C.
arXiv Detail & Related papers (2022-02-15T16:25:02Z)
- Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework [83.82026345508334]
We propose OFA, a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.).
OFA achieves new state-of-the-arts on a series of multimodal tasks, including image captioning (COCO test CIDEr: 149.6), text-to-image generation (COCO test FID: 10.5), VQA (test-std encoder acc.: 80.02), SNLI-VE (test acc.: 90.
arXiv Detail & Related papers (2022-02-07T10:38:21Z)
- UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis [1.2233362977312945]
We describe the system developed by our team for SemEval-2020 Task 8: Memotion Analysis.
We introduce a novel system to analyze these posts, a multimodal multi-task learning architecture that combines ALBERT for text encoding with VGG-16 for image representation.
Our approach achieves good performance on each of the three subtasks of the current competition, ranking 11th for Subtask A (0.3453 macro F1-score), 1st for Subtask B (0.5183 macro F1-score), and 3rd for Subtask C (0.3171 macro F1-score).
arXiv Detail & Related papers (2020-09-06T17:17:41Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- DSC IIT-ISM at SemEval-2020 Task 8: Bi-Fusion Techniques for Deep Meme Emotion Analysis [5.259920715958942]
This paper presents our work on the Memotion Analysis shared task of SemEval 2020.
We propose a system which uses different bimodal fusion techniques to leverage the inter-modal dependency for sentiment and humor classification tasks.
arXiv Detail & Related papers (2020-07-28T17:23:35Z)
- DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning [122.51237307910878]
We develop methods for few-shot image classification from a new perspective of optimal matching between image regions.
We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations.
To generate the important weights of elements in the formulation, we design a cross-reference mechanism.
arXiv Detail & Related papers (2020-03-15T08:13:16Z)
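As a toy illustration of the Earth Mover's Distance used by the DeepEMD entry above, the following sketch computes an EMD between two sets of local image features by solving a small linear program; the uniform node weights, cosine ground cost, and solver choice are illustrative assumptions rather than the DeepEMD authors' formulation, which learns the element weights via a cross-reference mechanism and differentiates through the matching.

```python
# Hypothetical toy EMD between two sets of local image features (random
# placeholders), solved as a small linear program with SciPy; this is a
# generic EMD, not the DeepEMD authors' implementation.
import numpy as np
from scipy.optimize import linprog

def emd(feats_a, feats_b):
    """EMD between two uniformly weighted sets of feature vectors,
    using cosine distance as the ground cost."""
    a = np.full(len(feats_a), 1.0 / len(feats_a))   # uniform node weights
    b = np.full(len(feats_b), 1.0 / len(feats_b))
    fa = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    fb = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    cost = 1.0 - fa @ fb.T                          # (m, n) cosine distances

    m, n = cost.shape
    # Flow variables f_ij >= 0; row sums equal a, column sums equal b.
    A_eq, b_eq = [], []
    for i in range(m):
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row); b_eq.append(a[i])
    for j in range(n):
        col = np.zeros(m * n); col[j::n] = 1.0
        A_eq.append(col); b_eq.append(b[j])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

# Two images represented as 5x5 grids of 64-dimensional local features.
print(emd(np.random.randn(25, 64), np.random.randn(25, 64)))
```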
This list is automatically generated from the titles and abstracts of the papers on this site.