UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a
Multi-Task Learning Architecture for Memotion Analysis
- URL: http://arxiv.org/abs/2009.02779v2
- Date: Tue, 10 Nov 2020 17:46:34 GMT
- Title: UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a
Multi-Task Learning Architecture for Memotion Analysis
- Authors: George-Alexandru Vlad, George-Eduard Zaharia, Dumitru-Clementin
Cercel, Costin-Gabriel Chiru, Stefan Trausan-Matu
- Abstract summary: We describe the system developed by our team for SemEval-2020 Task 8: Memotion Analysis.
We introduce a novel system to analyze these posts, a multimodal multi-task learning architecture that combines ALBERT for text encoding with VGG-16 for image representation.
Our approach achieves good performance on each of the three subtasks of the current competition, ranking 11th for Subtask A (0.3453 macro F1-score), 1st for Subtask B (0.5183 macro F1-score), and 3rd for Subtask C (0.3171 macro F1-score).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Users in the online environment devise many ways of expressing their
thoughts, opinions, or sense of amusement, and Internet memes were created
specifically for these situations. Their main purpose is to transmit ideas
through combinations of images and text that elicit a particular reaction in
the viewer, depending on the message the meme is meant to convey. These posts
can relate to a wide range of situations or events, adding a humorous angle to
almost any circumstance. In this paper, we describe the system developed by our
team for SemEval-2020 Task 8: Memotion Analysis. More specifically, we
introduce a novel system for analyzing these posts: a multimodal multi-task
learning architecture that combines ALBERT for text encoding with VGG-16 for
image representation. In this manner, we show that the information behind these
memes can be properly revealed. Our approach achieves good performance on each
of the three subtasks of the competition, ranking 11th for Subtask A (0.3453
macro F1-score), 1st for Subtask B (0.5183 macro F1-score), and 3rd for
Subtask C (0.3171 macro F1-score), while exceeding the official baseline
results by wide margins.
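
The abstract describes the architecture only at a high level, so the following PyTorch snippet is a minimal sketch of how such a joint model could be wired together: ALBERT encodes the caption, VGG-16 (with its final classifier layer removed) encodes the image, the two feature vectors are concatenated, and one linear head per subtask sits on top. The concatenation-based fusion and the head sizes are illustrative assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16
from transformers import AlbertModel

class MemotionNet(nn.Module):
    """Illustrative joint text+image multi-task model; details are assumed."""

    def __init__(self, n_task_a=3, n_task_b=4, n_task_c=4):
        super().__init__()
        # Text channel: pretrained ALBERT; the pooled output is 768-d.
        self.text_encoder = AlbertModel.from_pretrained("albert-base-v2")
        # Image channel: pretrained VGG-16 with the final 1000-way ImageNet
        # classifier layer dropped, leaving 4096-d features.
        cnn = vgg16(weights="IMAGENET1K_V1")
        cnn.classifier = nn.Sequential(*list(cnn.classifier.children())[:-1])
        self.image_encoder = cnn
        fused_dim = self.text_encoder.config.hidden_size + 4096
        # One head per subtask; output sizes here are placeholders.
        self.head_a = nn.Linear(fused_dim, n_task_a)  # Subtask A: sentiment
        self.head_b = nn.Linear(fused_dim, n_task_b)  # Subtask B: humor types
        self.head_c = nn.Linear(fused_dim, n_task_c)  # Subtask C: intensity

    def forward(self, input_ids, attention_mask, images):
        text = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output                     # (batch, 768)
        image = self.image_encoder(images)  # (batch, 4096)
        joint = torch.cat([text, image], dim=-1)
        return self.head_a(joint), self.head_b(joint), self.head_c(joint)
```

In a multi-task regime, the per-subtask losses would be summed (possibly with task weights) and backpropagated through the shared encoders, which is what lets the subtasks inform one another.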
Related papers
- A Unified Framework for 3D Scene Understanding (arXiv, 2024-07-03)
UniSeg3D is a unified 3D segmentation framework that achieves panoptic, semantic, instance, interactive, referring, and open-vocabulary semantic segmentation tasks within a single model.
It facilitates inter-task knowledge sharing and promotes comprehensive 3D scene understanding.
Experiments on three benchmarks (ScanNet20, ScanRefer, and ScanNet200) demonstrate that UniSeg3D consistently outperforms current SOTA methods.
- MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models (arXiv, 2024-03-31)
This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations.
We propose a novel Multimodal Emotion Recognition and Multimodal Emotion Cause Extraction framework that integrates text, audio, and visual modalities.
- NYCU-TWO at Memotion 3: Good Foundation, Good Teacher, then you have Good Meme Analysis (arXiv, 2023-02-13)
This paper presents a robust solution to the Memotion 3.0 Shared Task.
The goal of this task is to classify the emotion and the corresponding intensity expressed by memes.
Understanding the multi-modal features of the given memes will be the key to solving the task.
- Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark (arXiv, 2022-11-22)
We provide a new multi-task benchmark for evaluating text-to-image models.
We compare the most common open-source (Stable Diffusion) and commercial (DALL-E 2) models.
Twenty computer science AI graduate students evaluated the two models on three tasks, at three difficulty levels, across ten prompts each.
- VIMA: General Robot Manipulation with Multimodal Prompts (arXiv, 2022-10-06)
We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts.
We develop a new simulation benchmark that consists of thousands of procedurally-generated tabletop tasks.
We design a transformer-based robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively.
- Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (arXiv, 2022-02-07)
We propose OFA, a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.).
OFA achieves new state-of-the-art results on a series of multimodal tasks, including image captioning (COCO test CIDEr: 149.6), text-to-image generation (COCO test FID: 10.5), VQA (test-std encoder acc.: 80.02), SNLI-VE (test acc.: 90.
- A Shared Representation for Photorealistic Driving Simulators (arXiv, 2021-12-09)
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine-grained adversarial reasoning.
- SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning (arXiv, 2021-05-31)
This paper introduces the SemEval-2021 shared task 4: Reading Comprehension of Abstract Meaning (ReCAM).
Given a passage and the corresponding question, a participating system is expected to choose the correct answer from five candidates of abstract concepts.
Subtask 1 aims to evaluate how well a system can model concepts that cannot be directly perceived in the physical world.
Subtask 2 focuses on models' ability to comprehend nonspecific concepts located high in a hypernym hierarchy.
Subtask 3 aims to provide some insights into models' generalizability over the two types of abstractness.
- DSC IIT-ISM at SemEval-2020 Task 8: Bi-Fusion Techniques for Deep Meme Emotion Analysis (arXiv, 2020-07-28)
This paper presents our work on the Memotion Analysis shared task of SemEval 2020.
We propose a system that uses different bimodal fusion techniques to leverage the inter-modal dependency for the sentiment and humor classification tasks.
- YNU-HPCC at SemEval-2020 Task 8: Using a Parallel-Channel Model for Memotion Analysis (arXiv, 2020-07-28)
This paper proposes a parallel-channel model to process the textual and visual information in memes.
In the shared task of identifying and categorizing memes, we preprocess the dataset according to the language behaviors on social media.
We then adapt and fine-tune Bidirectional Encoder Representations from Transformers (BERT), and use two types of convolutional neural network (CNN) models to extract features from the images.
- IITK at SemEval-2020 Task 8: Unimodal and Bimodal Sentiment Analysis of Internet Memes (arXiv, 2020-07-21)
We present our approaches for the Memotion Analysis problem as posed in SemEval-2020 Task 8.
The goal of this task is to classify memes based on their emotional content and sentiment.
Our results show that a text-only approach, a simple Feed Forward Neural Network (FFNN) with Word2vec embeddings as input, performs better than all the others (see the sketch after this list).
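
The IITK entry above singles out a text-only baseline that is simple enough to sketch in full. The snippet below is a hypothetical reconstruction under assumed details: 300-d Word2vec vectors mean-pooled over the caption, one hidden layer, and a 3-way sentiment output; the paper's exact configuration may differ.

```python
import torch
import torch.nn as nn

class TextOnlyFFNN(nn.Module):
    """Sketch of a text-only baseline: averaged Word2vec vectors into an FFNN."""

    def __init__(self, embed_dim=300, hidden_dim=128, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, n_classes),
        )

    def forward(self, token_vectors):
        # token_vectors: (batch, seq_len, embed_dim) of pretrained Word2vec
        # vectors; mean pooling collapses each caption to one fixed-size vector.
        return self.net(token_vectors.mean(dim=1))
```

Loading the embeddings themselves (for example with gensim's KeyedVectors) is omitted; any pretrained 300-d Word2vec model would slot in.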
This list is automatically generated from the titles and abstracts of the papers on this site.