Codec at SemEval-2022 Task 5: Multi-Modal Multi-Transformer Misogynous
Meme Classification Framework
- URL: http://arxiv.org/abs/2206.07190v1
- Date: Tue, 14 Jun 2022 22:37:25 GMT
- Title: Codec at SemEval-2022 Task 5: Multi-Modal Multi-Transformer Misogynous
Meme Classification Framework
- Authors: Ahmed Mahran, Carlo Alessandro Borella, Konstantinos Perifanos
- Abstract summary: We describe our work towards building a generic framework for both multi-modal embedding and multi-label binary classification tasks.
We are participating in Task 5 (Multimedia Automatic Misogyny Identification) of the SemEval 2022 competition.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper we describe our work towards building a generic framework
for both multi-modal embedding and multi-label binary classification tasks, while
participating in Task 5 (Multimedia Automatic Misogyny Identification) of the
SemEval 2022 competition.
Since pretraining deep models from scratch is a resource- and data-hungry task,
our approach is based on three main strategies. We combine different
state-of-the-art architectures to capture a wide spectrum of semantic signals
from the multi-modal input. We employ a multi-task learning scheme so that
multiple datasets from the same knowledge domain can be used to improve the
model's performance. We also use multiple objectives to regularize and
fine-tune the different system components.
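As a rough illustration of the three strategies above, the sketch below combines embeddings from separate (stand-in) text and image encoders, attaches one binary head per MAMI label, and sums weighted per-task BCE objectives. It is a minimal sketch under assumed dimensions and label names, not the authors' released implementation; in practice the projections would sit on top of pretrained transformers (e.g. BERT- or CLIP-style encoders).

```python
# Minimal sketch (not the authors' released code) of the framework the abstract
# describes: several encoders produce per-modality embeddings, the embeddings
# are fused, and multiple binary heads are trained jointly with one BCE
# objective per (sub)task. Encoder choices, dimensions, and loss weights are
# illustrative assumptions; label names follow the MAMI sub-tasks.
import torch
import torch.nn as nn


class MultiModalMultiTaskClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, fused_dim=512,
                 task_names=("misogynous", "shaming", "stereotype",
                             "objectification", "violence")):
        super().__init__()
        # Stand-ins for projections on top of pretrained text/image transformers.
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.image_proj = nn.Linear(image_dim, fused_dim)
        # Simple fusion: concatenate projected modalities and mix with an MLP.
        self.fusion = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim), nn.ReLU(), nn.Dropout(0.1))
        # One binary head per label -> multi-label binary classification.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(fused_dim, 1) for name in task_names})

    def forward(self, text_emb, image_emb):
        fused = self.fusion(torch.cat(
            [self.text_proj(text_emb), self.image_proj(image_emb)], dim=-1))
        # One logit per task, shape (batch,) each.
        return {name: head(fused).squeeze(-1) for name, head in self.heads.items()}


def multi_task_loss(logits, targets, weights=None):
    """Sum of per-task BCE losses; `weights` lets auxiliary datasets/objectives
    contribute with smaller coefficients (an assumption, not the paper's values)."""
    bce = nn.BCEWithLogitsLoss()
    total = 0.0
    for name, logit in logits.items():
        w = 1.0 if weights is None else weights.get(name, 1.0)
        total = total + w * bce(logit, targets[name])
    return total


if __name__ == "__main__":
    model = MultiModalMultiTaskClassifier()
    text_emb, image_emb = torch.randn(4, 768), torch.randn(4, 512)
    logits = model(text_emb, image_emb)
    targets = {k: torch.randint(0, 2, (4,)).float() for k in logits}
    loss = multi_task_loss(logits, targets)
    loss.backward()
    print(loss.item())
```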
Related papers
- M&M: Multimodal-Multitask Model Integrating Audiovisual Cues in Cognitive Load Assessment [0.0]
This paper introduces the M&M model, a novel multimodal-multitask learning framework, applied to the AVCAffe dataset for cognitive load assessment.
M&M uniquely integrates audiovisual cues through a dual-pathway architecture, featuring specialized streams for audio and video inputs.
A key innovation lies in its cross-modality multihead attention mechanism, fusing the different modalities for synchronized multitasking (a sketch of this kind of cross-modal attention follows the list).
arXiv Detail & Related papers (2024-03-14T14:49:40Z)
- Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment [11.897888221717245]
This paper proposes a novel CLIP-guided contrastive-learning-based architecture to perform multi-modal feature alignment.
Our model is simple to implement without using task-specific external knowledge, and thus can easily migrate to other multi-modal tasks.
arXiv Detail & Related papers (2024-03-11T01:07:36Z)
- Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences.
Emu2 exhibits strong multimodal in-context learning abilities, with emergent capability to solve tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models [72.8156832931841]
Generalist models are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model.
We release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction.
arXiv Detail & Related papers (2022-12-08T17:07:09Z)
- Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos [58.93586436289648]
We propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis.
Our model outperforms existing approaches on unaligned multimodal sequences and has strong performance on aligned multimodal sequences.
arXiv Detail & Related papers (2022-06-16T07:47:57Z)
- Revisit Multimodal Meta-Learning through the Lens of Multi-Task Learning [33.19179706038397]
Multimodal meta-learning is a recent problem that extends conventional few-shot meta-learning by generalizing its setup to diverse multimodal task distributions.
Previous work claims that a single meta-learner trained on a multimodal distribution can sometimes outperform multiple specialized meta-learners trained on individual unimodal distributions.
Our work makes two contributions to multimodal meta-learning. First, we propose a method to quantify knowledge transfer between tasks of different modes at a micro-level.
Second, inspired by hard parameter sharing in multi-task learning and a new interpretation of related work, we propose a new multimodal meta-learner.
arXiv Detail & Related papers (2021-10-27T06:23:45Z)
- Unsupervised Multimodal Language Representations using Convolutional Autoencoders [5.464072883537924]
We propose extracting unsupervised Multimodal Language representations that are universal and can be applied to different tasks.
We map the word-level aligned multimodal sequences to 2-D matrices and then use Convolutional Autoencoders to learn embeddings by combining multiple datasets.
It is also shown that our method is extremely lightweight and can be easily generalized to other tasks and unseen data with a small performance drop and almost the same number of parameters (a sketch of the 2-D mapping idea follows the list).
arXiv Detail & Related papers (2021-10-06T18:28:07Z)
- Team Neuro at SemEval-2020 Task 8: Multi-Modal Fine Grain Emotion Classification of Memes using Multitask Learning [7.145975932644256]
We describe the system that we used for the memotion analysis challenge, which is Task 8 of SemEval-2020.
This challenge had three subtasks where affect-based sentiment classification of the memes was required along with intensities.
The system we proposed combines the three tasks into a single one by representing them as a multi-label hierarchical classification problem.
arXiv Detail & Related papers (2020-05-21T21:29:44Z)
- Deep Multimodal Neural Architecture Search [178.35131768344246]
We devise a generalized deep multimodal neural architecture search (MMnas) framework for various multimodal learning tasks.
Given multimodal input, we first define a set of primitive operations, and then construct a deep encoder-decoder based unified backbone.
On top of the unified backbone, we attach task-specific heads to tackle different multimodal learning tasks.
arXiv Detail & Related papers (2020-04-25T07:00:32Z)
- MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning [82.62433731378455]
We show that tasks with high affinity at a certain scale are not guaranteed to retain this behaviour at other scales.
We propose a novel architecture, namely MTI-Net, that builds upon this finding.
arXiv Detail & Related papers (2020-01-19T21:02:36Z)
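Several entries above (e.g. M&M and MCMulT) revolve around cross-modal attention, where tokens of one modality attend over tokens of another before task-specific heads are applied. Below is a minimal, illustrative PyTorch sketch of such a cross-modality multi-head attention block; the symmetric two-way design, the audio/video naming, and the dimensions are assumptions for illustration, not the exact architecture of any cited paper.

```python
# Illustrative cross-modality multi-head attention: each modality queries the
# other, so both streams are enriched with the other's context.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.audio_to_video = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.video_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, audio_seq, video_seq):
        # Audio tokens attend over video tokens, and vice versa (residual + norm).
        a_ctx, _ = self.audio_to_video(audio_seq, video_seq, video_seq)
        v_ctx, _ = self.video_to_audio(video_seq, audio_seq, audio_seq)
        return self.norm_a(audio_seq + a_ctx), self.norm_v(video_seq + v_ctx)


if __name__ == "__main__":
    fuse = CrossModalAttention()
    audio = torch.randn(2, 50, 256)   # (batch, audio frames, dim)
    video = torch.randn(2, 16, 256)   # (batch, video frames, dim)
    a_fused, v_fused = fuse(audio, video)
    print(a_fused.shape, v_fused.shape)
```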
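The convolutional-autoencoder entry above maps word-level aligned multimodal sequences to 2-D matrices and compresses them with a convolutional autoencoder. The sketch below illustrates that idea under assumed feature sizes and an assumed network shape; it is not that paper's configuration.

```python
# Illustrative sketch: stack word-level aligned text/audio/visual features into a
# (timesteps x features) matrix and learn an embedding with a conv autoencoder.
import torch
import torch.nn as nn


class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Treat the (timesteps x features) matrix as a 1-channel "image".
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 8, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)              # compressed multimodal representation
        return self.decoder(z), z


if __name__ == "__main__":
    # 20 aligned word-level timesteps; text+audio+visual features concatenated to 64 dims.
    text, audio, visual = torch.randn(8, 20, 32), torch.randn(8, 20, 16), torch.randn(8, 20, 16)
    matrix = torch.cat([text, audio, visual], dim=-1).unsqueeze(1)  # (batch, 1, 20, 64)
    model = ConvAutoencoder()
    recon, z = model(matrix)
    loss = nn.functional.mse_loss(recon, matrix)  # unsupervised reconstruction objective
    print(recon.shape, z.shape, loss.item())
```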
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.