AMS_ADRN at SemEval-2022 Task 5: A Suitable Image-text Multimodal Joint
Modeling Method for Multi-task Misogyny Identification
- URL: http://arxiv.org/abs/2202.09099v1
- Date: Fri, 18 Feb 2022 09:41:37 GMT
- Title: AMS_ADRN at SemEval-2022 Task 5: A Suitable Image-text Multimodal Joint
Modeling Method for Multi-task Misogyny Identification
- Authors: Da Li, Ming Yi, Yukai He
- Abstract summary: Women are influential online, especially in image-based social media such as Twitter and Instagram.
In this paper, we describe the system developed by our team for SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification.
- Score: 3.5382535469099436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Women are influential online, especially in image-based social media such as
Twitter and Instagram. However, much of the content in these networks contains gender
discrimination and aggressive information, which magnifies gender stereotypes and
gender inequality. Filtering illegal content such as gender discrimination is
therefore essential to maintaining a healthy social network environment.
In this paper, we describe the system developed by our team for SemEval-2022
Task 5: Multimedia Automatic Misogyny Identification. More specifically, we
introduce two novel systems to analyze these posts: a multimodal multi-task
learning architecture that combines BERTweet for text encoding with ResNet-18
for image representation, and a single-flow transformer structure that
combines text embeddings from BERT's embedding layer with image embeddings from several
different backbones such as EfficientNet and ResNet. In this manner, we show that
the information behind these posts can be properly revealed. Our approach achieves
good performance on both subtasks of the competition,
ranking 15th for Subtask A (0.746 macro F1-score) and 11th for Subtask B (0.706
macro F1-score), exceeding the official baseline results by wide margins.
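The first system described above can be pictured as a late-fusion multi-task network. The sketch below is an illustrative reconstruction, not the authors' released code: the class name, fusion MLP, hidden sizes, and head layout (one binary head for Subtask A, one multi-label head for Subtask B's four categories) are assumptions; the abstract only names the BERTweet and ResNet-18 backbones and the multi-task setup.

```python
# Illustrative sketch of the first system (late fusion + multi-task heads).
# Backbones (BERTweet, ResNet-18) come from the abstract; everything else
# (fusion MLP, head sizes, 4 fine-grained labels for Subtask B) is assumed.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from transformers import AutoModel

class MultimodalMultiTaskNet(nn.Module):  # hypothetical name
    def __init__(self, hidden_dim: int = 512):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained("vinai/bertweet-base")
        backbone = resnet18(weights="IMAGENET1K_V1")
        backbone.fc = nn.Identity()              # expose the 512-d pooled features
        self.image_encoder = backbone
        fused_dim = self.text_encoder.config.hidden_size + 512  # 768 + 512
        self.fusion = nn.Sequential(nn.Linear(fused_dim, hidden_dim), nn.ReLU())
        self.head_a = nn.Linear(hidden_dim, 1)   # Subtask A: binary misogyny
        self.head_b = nn.Linear(hidden_dim, 4)   # Subtask B: multi-label categories

    def forward(self, input_ids, attention_mask, pixel_values):
        text = self.text_encoder(input_ids=input_ids,
                                 attention_mask=attention_mask).last_hidden_state[:, 0]
        image = self.image_encoder(pixel_values)              # (B, 512)
        fused = self.fusion(torch.cat([text, image], dim=-1))
        return self.head_a(fused), self.head_b(fused)         # train with BCE losses
```

Training both heads on one shared fused representation is what makes the setup multi-task: the Subtask A and Subtask B losses (e.g., a weighted sum of binary cross-entropies) backpropagate through the same encoders.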
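The second system, the single-flow transformer, instead concatenates projected text and image embeddings into one token sequence handled by a shared encoder. Again a minimal sketch under stated assumptions: the projection sizes, number of image tokens, encoder depth, and five-way output are illustrative guesses, since the abstract specifies only BERT embeddings plus EfficientNet/ResNet image features in a single stream.

```python
# Illustrative sketch of the second system (single-flow transformer).
# The abstract specifies BERT embeddings plus EfficientNet/ResNet image
# features in one stream; dimensions and depth below are assumptions
# (1280 matches EfficientNet-B0's pooled feature size).
import torch
import torch.nn as nn

class SingleFlowTransformer(nn.Module):  # hypothetical name
    def __init__(self, text_dim=768, image_dim=1280, d_model=768,
                 n_layers=4, n_heads=8, num_labels=5):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, d_model)
        self.image_proj = nn.Linear(image_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.classifier = nn.Linear(d_model, num_labels)

    def forward(self, text_embeds, image_embeds):
        # text_embeds:  (B, T, text_dim) from BERT's embedding layer
        # image_embeds: (B, K, image_dim) feature tokens from a CNN backbone
        tokens = torch.cat([self.text_proj(text_embeds),
                            self.image_proj(image_embeds)], dim=1)
        fused = self.encoder(tokens)   # text and image tokens attend to each other
        return self.classifier(fused.mean(dim=1))
```

The design difference from the first sketch is where fusion happens: here attention mixes text and image tokens at every encoder layer, rather than concatenating two pooled vectors once at the end.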
Related papers
- Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-07-24T13:39:51Z)
- Instruct-Imagen: Image Generation with Multi-modal Instruction [90.04481955523514]
instruct-imagen is a model that tackles heterogeneous image generation tasks and generalizes across unseen tasks.
We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.
Human evaluation on various image generation datasets reveals that instruct-imagen matches or surpasses prior task-specific models in-domain.
arXiv Detail & Related papers (2024-01-03T19:31:58Z)
- KOSMOS-2.5: A Multimodal Literate Model [136.96172068766285]
We present KOSMOS-2.5, a multimodal literate model for machine reading of text-intensive images.
KOSMOS-2.5 excels in two distinct yet complementary transcription tasks.
We fine-tune KOSMOS-2.5 for document understanding tasks, resulting in a document understanding generalist named KOSMOS-2.5-CHAT.
arXiv Detail & Related papers (2023-09-20T15:50:08Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- CL-UZH at SemEval-2023 Task 10: Sexism Detection through Incremental Fine-Tuning and Multi-Task Learning with Label Descriptions [0.0]
The goal of the SemEval shared task "Towards Explainable Detection of Online Sexism" (EDOS 2023) is to detect sexism in English social media posts.
We present our submitted systems for all three subtasks, based on a multi-task model that has been fine-tuned on a range of related tasks.
We implement multi-task learning by formulating each task as binary pairwise text classification, where the dataset and label descriptions are given along with the input text.
arXiv Detail & Related papers (2023-06-06T17:59:49Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Codec at SemEval-2022 Task 5: Multi-Modal Multi-Transformer Misogynous Meme Classification Framework [0.0]
We describe our work towards building a generic framework for both multi-modal embedding and multi-label binary classification tasks.
We participate in Task 5 (Multimedia Automatic Misogyny Identification) of the SemEval 2022 competition.
arXiv Detail & Related papers (2022-06-14T22:37:25Z)
- UPB at SemEval-2022 Task 5: Enhancing UNITER with Image Sentiment and Graph Convolutional Networks for Multimedia Automatic Misogyny Identification [0.3437656066916039]
We describe our classification systems submitted to the SemEval-2022 Task 5: MAMI - Multimedia Automatic Misogyny Identification.
Our best model reaches an F1-score of 71.4% in Sub-task A and 67.3% in Sub-task B, positioning our team in the upper third of the leaderboard.
arXiv Detail & Related papers (2022-05-29T21:12:36Z)
- TIB-VA at SemEval-2022 Task 5: A Multimodal Architecture for the Detection and Classification of Misogynous Memes [9.66022279280394]
We present a multimodal architecture that combines textual and visual features in order to detect misogynous meme content.
Our solution obtained the best result in Task-B, where the challenge is to classify whether a given document is misogynous.
arXiv Detail & Related papers (2022-04-13T11:03:21Z)
- RubCSG at SemEval-2022 Task 5: Ensemble learning for identifying misogynous MEMEs [12.979213013465882]
This work presents an ensemble system based on various uni-modal and bi-modal model architectures developed for the SemEval 2022 Task 5: MAMI-Multimedia Automatic Misogyny Identification.
arXiv Detail & Related papers (2022-04-08T09:27:28Z)
- UFO: A UniFied TransfOrmer for Vision-Language Representation Learning [54.82482779792115]
Existing approaches typically design an individual network for each modality and/or a specific fusion network for multimodal tasks.
We propose a single UniFied transfOrmer (UFO) capable of processing either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question) for vision-language (VL) representation learning.
arXiv Detail & Related papers (2021-11-19T03:23:10Z)