NLP-CUET@DravidianLangTech-EACL2021: Investigating Visual and Textual
Features to Identify Trolls from Multimodal Social Media Memes
- URL: http://arxiv.org/abs/2103.00466v1
- Date: Sun, 28 Feb 2021 11:36:50 GMT
- Title: NLP-CUET@DravidianLangTech-EACL2021: Investigating Visual and Textual
Features to Identify Trolls from Multimodal Social Media Memes
- Authors: Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque
- Abstract summary: A shared task is organized to develop models that can identify trolls from multimodal social media memes.
This work presents a computational model that we have developed as part of our participation in the task.
We investigated the visual and textual features using CNN, VGG16, Inception, Multilingual-BERT, XLM-RoBERTa and XLNet models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the past few years, the meme has become a new way of communication on the
Internet. As memes are images with embedded text, they can quickly spread
hate, offence and violence. Classifying memes is very challenging because of
their multimodal nature and region-specific interpretation. A shared task is
organized to develop models that can identify trolls from multimodal social
media memes. This work presents a computational model that we have developed as
part of our participation in the task. The training data comes in two forms: an
image with embedded Tamil code-mixed text and an associated caption given in
English. We investigated the visual and textual features using CNN, VGG16,
Inception, Multilingual-BERT, XLM-RoBERTa and XLNet models. Multimodal features
are extracted by combining image (CNN, ResNet50, Inception) and text (long
short-term memory network) features via an early fusion approach. Results indicate
that the textual approach with XLNet achieved the highest weighted $f_1$-score
of $0.58$, which enabled our model to secure the $3^{rd}$ rank in this task.
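The early-fusion approach described in the abstract reduces to a simple recipe: extract a visual feature vector from the meme image, extract a textual feature vector from the caption, concatenate the two, and classify. Below is a minimal sketch of such a model, assuming Keras/TensorFlow; the input sizes, vocabulary size, and layer widths are illustrative placeholders rather than the authors' exact configuration (the paper also reports ResNet50 and Inception backbones for the image branch and XLNet for the text-only approach).

```python
# Minimal early-fusion sketch (illustrative sizes, not the authors' exact setup).
from tensorflow.keras import layers, Model

# Visual branch: a small CNN over the meme image
# (ResNet50 or Inception could be substituted as the backbone).
img_in = layers.Input(shape=(224, 224, 3), name="image")
x = layers.Conv2D(32, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Textual branch: an LSTM over the tokenized English caption
# (vocabulary size and sequence length are placeholders).
txt_in = layers.Input(shape=(64,), name="caption_tokens")
t = layers.Embedding(input_dim=20000, output_dim=128)(txt_in)
t = layers.LSTM(64)(t)

# Early fusion: concatenate the two feature vectors before the classifier.
fused = layers.concatenate([x, t])
fused = layers.Dense(64, activation="relu")(fused)
troll_prob = layers.Dense(1, activation="sigmoid", name="troll_prob")(fused)

model = Model(inputs=[img_in, txt_in], outputs=troll_prob)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

The shared-task metric is the weighted $f_1$-score, which can be computed over the predicted labels with, for example, sklearn.metrics.f1_score(y_true, y_pred, average="weighted").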
Related papers
- XMeCap: Meme Caption Generation with Sub-Image Adaptability [53.2509590113364]
Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines.
We introduce the XMeCap framework, which adopts supervised fine-tuning and reinforcement learning.
XMeCap achieves an average evaluation score of 75.85 for single-image memes and 66.32 for multi-image memes, outperforming the best baseline by 3.71% and 4.82%, respectively.
arXiv Detail & Related papers (2024-07-24T10:51:46Z) - OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text [112.60163342249682]
We introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset.
Our dataset is 15 times larger in scale while maintaining good data quality.
We hope this could provide a solid data foundation for future multimodal model research.
arXiv Detail & Related papers (2024-06-12T17:01:04Z) - Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models.
Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
arXiv Detail & Related papers (2023-05-26T19:22:03Z) - Hate-CLIPper: Multimodal Hateful Meme Classification based on
Cross-modal Interaction of CLIP Features [5.443781798915199]
Hateful memes are a growing menace on social media.
Detecting hateful memes requires careful consideration of both visual and textual information.
We propose the Hate-CLIPper architecture, which explicitly models the cross-modal interactions between the image and text representations.
arXiv Detail & Related papers (2022-10-12T04:34:54Z) - On Advances in Text Generation from Images Beyond Captioning: A Case
Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality, CLIP image representations and scaling of language models, do not consistently improve multimodal self-rationalization of tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z) - Hate Me Not: Detecting Hate Inducing Memes in Code Switched Languages [1.376408511310322]
In countries like India, where multiple languages are spoken, these abhorrent posts are written in an unusual blend of code-switched languages.
This hate speech is depicted with the help of images to form "Memes" which create a long-lasting impact on the human mind.
We take up the task of hate and offense detection from multimodal data, i.e. images (Memes) that contain text in code-switched languages.
arXiv Detail & Related papers (2022-04-24T21:03:57Z) - Do Images really do the Talking? Analysing the significance of Images in
Tamil Troll meme classification [0.16863755729554888]
We try to explore the significance of visual features of images in classifying memes.
We try to categorize the memes as troll and non-troll based on the images and the text on them.
arXiv Detail & Related papers (2021-08-09T09:04:42Z) - Exploiting BERT For Multimodal Target SentimentClassification Through
Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer.
We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model.
We achieve state-of-the-art performance on two multimodal Twitter datasets.
arXiv Detail & Related papers (2021-08-03T18:02:38Z) - Detecting Hate Speech in Multi-modal Memes [14.036769355498546]
We focus on hate speech detection in multi-modal memes wherein memes pose an interesting multi-modal fusion problem.
We aim to solve the Facebook Meme Challenge (Kiela et al., 2020), which poses a binary classification problem of predicting whether a meme is hateful or not.
arXiv Detail & Related papers (2020-12-29T18:30:00Z) - YNU-HPCC at SemEval-2020 Task 8: Using a Parallel-Channel Model for
Memotion Analysis [11.801902984731129]
This paper proposes a parallel-channel model to process the textual and visual information in memes.
In the shared task of identifying and categorizing memes, we preprocess the dataset according to the language behaviors on social media.
We then adapt and fine-tune Bidirectional Encoder Representations from Transformers (BERT), and use two types of convolutional neural network (CNN) models to extract features from the pictures.
arXiv Detail & Related papers (2020-07-28T03:20:31Z) - Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene
Text [93.08109196909763]
We propose a novel VQA approach, Multi-Modal Graph Neural Network (MM-GNN).
It first represents an image as a graph consisting of three sub-graphs, depicting visual, semantic, and numeric modalities respectively.
It then introduces three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities.
arXiv Detail & Related papers (2020-03-31T05:56:59Z)