IITK at SemEval-2020 Task 8: Unimodal and Bimodal Sentiment Analysis of
Internet Memes
- URL: http://arxiv.org/abs/2007.10822v1
- Date: Tue, 21 Jul 2020 14:06:26 GMT
- Title: IITK at SemEval-2020 Task 8: Unimodal and Bimodal Sentiment Analysis of
Internet Memes
- Authors: Vishal Keswani, Sakshi Singh, Suryansh Agarwal, Ashutosh Modi
- Abstract summary: We present our approaches for the Memotion Analysis problem as posed in SemEval-2020 Task 8.
The goal of this task is to classify memes based on their emotional content and sentiment.
Our results show that a text-only approach, a simple Feed Forward Neural Network (FFNN) with Word2vec embeddings as input, performs better than all the others.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media is abundant in visual and textual information presented together
or in isolation. Memes are the most popular form, belonging to the former
class. In this paper, we present our approaches for the Memotion Analysis
problem as posed in SemEval-2020 Task 8. The goal of this task is to classify
memes based on their emotional content and sentiment. We leverage techniques
from Natural Language Processing (NLP) and Computer Vision (CV) towards the
sentiment classification of internet memes (Subtask A). We consider Bimodal
(text and image) as well as Unimodal (text-only) techniques in our study
ranging from the Naïve Bayes classifier to Transformer-based approaches. Our
results show that a text-only approach, a simple Feed Forward Neural Network
(FFNN) with Word2vec embeddings as input, performs better than all the others.
We stand first in the Sentiment analysis task with a relative improvement of
63% over the baseline macro-F1 score. Our work is relevant to any task
concerned with the combination of different modalities.
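To make the winning unimodal pipeline concrete, below is a minimal sketch of a Word2vec-plus-FFNN text classifier in the spirit of the approach the abstract describes. It is not the authors' released code: the pretrained embedding source ("word2vec-google-news-300"), layer sizes, and dropout rate are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): average pre-trained Word2vec vectors of the
# meme's text and feed them to a small feed-forward network for 3-way sentiment output.
import gensim.downloader as api
import numpy as np
import torch
import torch.nn as nn

w2v = api.load("word2vec-google-news-300")  # 300-d pre-trained Word2vec vectors (assumed source)

def embed(text: str) -> torch.Tensor:
    """Average the Word2vec vectors of in-vocabulary tokens (zeros if none match)."""
    vecs = [w2v[t] for t in text.lower().split() if t in w2v]
    return torch.tensor(np.mean(vecs, axis=0) if vecs else np.zeros(300), dtype=torch.float32)

class SentimentFFNN(nn.Module):
    def __init__(self, emb_dim: int = 300, hidden: int = 128, classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, hidden), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hidden, classes),  # positive / neutral / negative logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = SentimentFFNN()
logits = model(embed("when the exam gets cancelled").unsqueeze(0))
pred = logits.argmax(dim=-1)  # predicted sentiment class index
```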
Related papers
- VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning [66.23296689828152]
We leverage the capabilities of Vision-and-Large-Language Models to enhance in-context emotion classification.
In the first stage, we propose prompting VLLMs to generate descriptions in natural language of the subject's apparent emotion.
In the second stage, the descriptions are used as contextual information and, along with the image input, are used to train a transformer-based architecture.
arXiv Detail & Related papers (2024-04-10T15:09:15Z)
- Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts [63.84720380390935]
There exist two typical types, i.e., the fusion-encoder type and the dual-encoder type, depending on whether a heavy fusion module is used.
We propose an effective yet straightforward scheme named PTUnifier to unify the two types.
We first unify the input format by introducing visual and textual prompts, which serve as a feature bank that stores the most representative images/texts.
arXiv Detail & Related papers (2023-02-17T15:43:42Z)
- Universal Multimodal Representation for Language Understanding [110.98786673598015]
This work presents new methods to employ visual information as auxiliary signals for general NLP tasks.
For each sentence, we first retrieve a flexible number of images from a light topic-image lookup table extracted over the existing sentence-image pairs.
Then, the text and images are encoded by a Transformer encoder and convolutional neural network, respectively.
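As an illustration of the dual-encoding step described above (text through a Transformer encoder, image through a CNN), here is a minimal sketch; the specific backbones (BERT, ResNet-18) and the simple concatenation of the two features are assumptions, not the paper's exact setup.

```python
# Minimal sketch: Transformer encoder for the sentence, CNN for the retrieved image,
# concatenated into one joint representation for a downstream head.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
text_enc = AutoModel.from_pretrained("bert-base-uncased")  # Transformer text encoder (assumed backbone)
img_enc = resnet18(weights=None)
img_enc.fc = nn.Identity()                                 # expose the 512-d pooled image feature

def encode(text: str, image: torch.Tensor) -> torch.Tensor:
    """Return a fused [1, 768 + 512] representation of one text-image pair."""
    batch = tok(text, return_tensors="pt")
    h_text = text_enc(**batch).last_hidden_state[:, 0]      # [CLS] vector, shape [1, 768]
    h_img = img_enc(image)                                   # shape [1, 512]
    return torch.cat([h_text, h_img], dim=-1)

fused = encode("a cat wearing sunglasses", torch.randn(1, 3, 224, 224))
```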
arXiv Detail & Related papers (2023-01-09T13:54:11Z)
- UniTE: Unified Translation Evaluation [63.58868113074476]
UniTE is the first unified framework able to handle all three translation evaluation tasks (source-only, reference-only, and source-reference-combined evaluation).
We validate our framework on the WMT 2019 Metrics and WMT 2020 Quality Estimation benchmarks.
arXiv Detail & Related papers (2022-04-28T08:35:26Z)
- Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training [139.4566371416662]
Vision-Language Pre-training aims to learn multi-modal representations from image-text pairs.
CNNs have limitations in visual relation learning because the local receptive field is weak at modeling long-range dependencies.
arXiv Detail & Related papers (2021-06-25T08:04:25Z)
- Exercise? I thought you said 'Extra Fries': Leveraging Sentence Demarcations and Multi-hop Attention for Meme Affect Analysis [18.23523076710257]
We propose a multi-hop attention-based deep neural network framework, called MHA-MEME.
Its prime objective is to leverage the spatial-domain correspondence between the visual modality (an image) and various textual segments to extract fine-grained feature representations for classification.
We evaluate MHA-MEME on the 'Memotion Analysis' dataset for all three sub-tasks - sentiment classification, affect classification, and affect class quantification.
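A minimal sketch of a multi-hop attention block in the spirit of the description above, where an image feature repeatedly attends over per-segment text features and the query is refined at each hop; the dimensions, hop count, and scoring function are assumptions rather than the MHA-MEME reference implementation.

```python
# Minimal multi-hop attention sketch: the image feature queries the text segments
# several times, refining the fused representation after each hop.
import torch
import torch.nn as nn

class MultiHopAttention(nn.Module):
    def __init__(self, dim: int = 256, hops: int = 3):
        super().__init__()
        self.hops = hops
        self.attn = nn.Linear(2 * dim, 1)      # scores a (query, segment) pair
        self.update = nn.Linear(2 * dim, dim)  # refines the query after each hop

    def forward(self, img_feat: torch.Tensor, seg_feats: torch.Tensor) -> torch.Tensor:
        # img_feat: [B, D]; seg_feats: [B, S, D], one row per textual segment
        query = img_feat
        for _ in range(self.hops):
            q = query.unsqueeze(1).expand_as(seg_feats)            # [B, S, D]
            scores = self.attn(torch.cat([q, seg_feats], dim=-1))  # [B, S, 1]
            weights = scores.softmax(dim=1)
            context = (weights * seg_feats).sum(dim=1)             # [B, D]
            query = torch.tanh(self.update(torch.cat([query, context], dim=-1)))
        return query  # fused representation to feed a classifier head

fused = MultiHopAttention()(torch.randn(2, 256), torch.randn(2, 5, 256))
```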
arXiv Detail & Related papers (2021-03-23T08:21:37Z)
- gundapusunil at SemEval-2020 Task 8: Multimodal Memotion Analysis [7.538482310185133]
We present a multi-modal sentiment analysis system using deep neural networks combining Computer Vision and Natural Language Processing.
Our aim differs from the usual sentiment analysis goal of predicting whether a text expresses positive or negative sentiment.
Our system was developed using a CNN and an LSTM and outperformed the baseline.
arXiv Detail & Related papers (2020-10-09T09:53:14Z)
- SemEval-2020 Task 8: Memotion Analysis -- The Visuo-Lingual Metaphor! [20.55903557920223]
The objective of this proposal is to bring the attention of the research community towards the automatic processing of Internet memes.
The Memotion Analysis task released approximately 10K annotated memes with human-annotated labels, namely sentiment (positive, negative, neutral), type of emotion (sarcastic, funny, offensive, motivation), and the corresponding intensity.
The challenge consisted of three subtasks: sentiment (positive, negative, and neutral) analysis of memes, overall emotion (humour, sarcasm, offensive, and motivational) classification of memes, and classifying the intensity of meme emotion.
arXiv Detail & Related papers (2020-08-09T18:17:33Z)
- DSC IIT-ISM at SemEval-2020 Task 8: Bi-Fusion Techniques for Deep Meme Emotion Analysis [5.259920715958942]
This paper presents our work on the Memotion Analysis shared task of SemEval 2020.
We propose a system that uses different bimodal fusion techniques to leverage the inter-modal dependency for the sentiment and humor classification tasks.
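As one example of the kind of bimodal fusion the summary mentions, here is a minimal sketch of a gated fusion unit; it is an illustrative stand-in rather than one of the paper's exact fusion variants, and all dimensions are assumptions.

```python
# Minimal gated-fusion sketch: a learned gate decides, per dimension, how much of the
# text versus image representation flows into the fused feature.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, text_dim: int = 768, img_dim: int = 512, dim: int = 256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, dim)
        self.img_proj = nn.Linear(img_dim, dim)
        self.gate = nn.Linear(text_dim + img_dim, dim)

    def forward(self, h_text: torch.Tensor, h_img: torch.Tensor) -> torch.Tensor:
        t = torch.tanh(self.text_proj(h_text))
        v = torch.tanh(self.img_proj(h_img))
        z = torch.sigmoid(self.gate(torch.cat([h_text, h_img], dim=-1)))  # modality gate
        return z * t + (1 - z) * v  # convex combination of the two modalities

fused = GatedFusion()(torch.randn(4, 768), torch.randn(4, 512))
```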
arXiv Detail & Related papers (2020-07-28T17:23:35Z)
- YNU-HPCC at SemEval-2020 Task 8: Using a Parallel-Channel Model for Memotion Analysis [11.801902984731129]
This paper proposes a parallel-channel model to process the textual and visual information in memes.
In the shared task of identifying and categorizing memes, we preprocess the dataset according to the language behaviors on social media.
We then adapt and fine-tune Bidirectional Encoder Representations from Transformers (BERT) for the text, and use two types of convolutional neural network (CNN) models to extract features from the images.
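A minimal sketch of the text channel only (fine-tuning BERT for three-way meme sentiment with the Hugging Face transformers API); the picture channel and the paper's preprocessing are not reproduced, and the checkpoint name and label count are assumptions.

```python
# Minimal text-channel sketch: fine-tune a BERT sequence classifier on meme text.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

batch = tok(["me pretending to study", "mondays are a scam"],
            padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1, 0])   # e.g., neutral, negative (assumed label scheme)
out = model(**batch, labels=labels)
out.loss.backward()             # gradients for one fine-tuning step
```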
arXiv Detail & Related papers (2020-07-28T03:20:31Z)
- EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle [71.47160118286226]
We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images.
Motivated by Frege's Context Principle from psychology, our approach combines three interpretations of context for emotion recognition.
We report an Average Precision (AP) score of 35.48 across 26 classes, an improvement of 7-8 points over prior methods.
arXiv Detail & Related papers (2020-03-14T19:55:21Z)