Multimodal Feature Extraction for Memes Sentiment Classification
- URL: http://arxiv.org/abs/2207.03317v1
- Date: Thu, 7 Jul 2022 14:21:52 GMT
- Title: Multimodal Feature Extraction for Memes Sentiment Classification
- Authors: Sofiane Ouaari, Tsegaye Misikir Tashu, Tomas Horvath
- Abstract summary: We propose feature extraction for multimodal meme classification using Deep Learning approaches.
A meme is usually a photo or video with text, shared by the younger generation on social media platforms, that expresses a culturally relevant idea.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we propose feature extraction for multimodal meme
classification using Deep Learning approaches. A meme is usually a photo or
video with text, shared by the younger generation on social media platforms,
that expresses a culturally relevant idea. Since memes are an efficient way to
express emotions and feelings, a classifier that can identify the sentiment
behind a meme is important. To make the learning process more
efficient, reduce the likelihood of overfitting, and improve the
generalizability of the model, one needs a good approach for joint feature
extraction from all modalities. In this work, we propose to use different
multimodal neural network approaches for multimodal feature extraction and use
the extracted features to train a classifier to identify the sentiment in a
meme.
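As a rough illustration of the joint feature extraction described above, the sketch below pairs a small image encoder with a text encoder and trains a sentiment classifier on the concatenated features. It is a minimal PyTorch sketch under assumed settings; the module names, dimensions, and simple concatenation fusion are placeholders, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class MultimodalMemeClassifier(nn.Module):
    """Hypothetical joint feature extractor: image branch + text branch -> fused features -> sentiment head."""
    def __init__(self, text_vocab_size=30000, embed_dim=128, fused_dim=256, num_classes=3):
        super().__init__()
        # Image branch: a small CNN producing a fixed-size feature vector.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, fused_dim),
        )
        # Text branch: token embeddings mean-pooled over the caption.
        self.text_embedding = nn.Embedding(text_vocab_size, embed_dim)
        self.text_proj = nn.Linear(embed_dim, fused_dim)
        # Classifier over the concatenated joint representation.
        self.classifier = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim), nn.ReLU(),
            nn.Linear(fused_dim, num_classes),
        )

    def forward(self, image, token_ids):
        img_feat = self.image_encoder(image)                                    # (B, fused_dim)
        txt_feat = self.text_proj(self.text_embedding(token_ids).mean(dim=1))   # (B, fused_dim)
        joint = torch.cat([img_feat, txt_feat], dim=-1)                         # joint multimodal features
        return self.classifier(joint)

# Example usage on random data.
model = MultimodalMemeClassifier()
logits = model(torch.randn(4, 3, 224, 224), torch.randint(0, 30000, (4, 32)))
print(logits.shape)  # torch.Size([4, 3])
```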
Related papers
- XMeCap: Meme Caption Generation with Sub-Image Adaptability [53.2509590113364]
Humor, deeply rooted in societal meanings and cultural details, poses a unique challenge for machines.
We introduce the XMeCap framework, which adopts supervised fine-tuning and reinforcement learning.
XMeCap achieves an average evaluation score of 75.85 for single-image memes and 66.32 for multi-image memes, outperforming the best baseline by 3.71% and 4.82%, respectively.
arXiv Detail & Related papers (2024-07-24T10:51:46Z) - TextAug: Test time Text Augmentation for Multimodal Person
Re-identification [8.557492202759711]
The bottleneck for multimodal deep learning is the need for a large volume of multimodal training examples.
Data augmentation techniques such as cropping, flipping, rotation, etc. are often employed in the image domain to improve the generalization of deep learning models.
In this study, we investigate the effectiveness of two computer vision data augmentation techniques: cutout and cutmix, for text augmentation in multi-modal person re-identification.
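As a loose illustration of how these image-style augmentations might be carried over to text, the sketch below applies cutout (masking a contiguous token span) and cutmix (splicing a span from one caption into another). The function names, masking fractions, and [MASK] placeholder are assumptions for illustration, not the TextAug implementation.

```python
import random

def text_cutout(tokens, drop_frac=0.2, mask_token="[MASK]"):
    """Cutout adapted to text (illustrative): mask a random contiguous span of tokens."""
    n = max(1, int(len(tokens) * drop_frac))
    start = random.randrange(0, max(1, len(tokens) - n + 1))
    return tokens[:start] + [mask_token] * n + tokens[start + n:]

def text_cutmix(tokens_a, tokens_b, mix_frac=0.3):
    """CutMix adapted to text (illustrative): splice a span from one caption into another."""
    n = max(1, int(len(tokens_a) * mix_frac))
    start_a = random.randrange(0, max(1, len(tokens_a) - n + 1))
    start_b = random.randrange(0, max(1, len(tokens_b) - n + 1))
    return tokens_a[:start_a] + tokens_b[start_b:start_b + n] + tokens_a[start_a + n:]

print(text_cutout("a person wearing a red jacket and jeans".split()))
print(text_cutmix("a person wearing a red jacket".split(), "a man with a blue backpack".split()))
```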
arXiv Detail & Related papers (2023-12-04T03:38:04Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
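A minimal sketch of that projection idea, assuming each modality arrives as a fixed-size feature vector: every modality gets its own linear map into a shared space, and whichever modalities are present are fused there. The module name, dimensions, and mean fusion are illustrative assumptions, not the paper's module.

```python
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Illustrative module: project features of different dimensionality into one shared space."""
    def __init__(self, input_dims, shared_dim=256):
        super().__init__()
        self.projections = nn.ModuleDict(
            {name: nn.Linear(dim, shared_dim) for name, dim in input_dims.items()}
        )

    def forward(self, features):
        # Project whichever modalities are present, then average them in the common space.
        projected = [self.projections[name](feat) for name, feat in features.items()]
        return torch.stack(projected, dim=0).mean(dim=0)

projector = ModalityProjector({"video": 1024, "audio": 128})
fused = projector({"video": torch.randn(2, 1024), "audio": torch.randn(2, 128)})
print(fused.shape)  # torch.Size([2, 256])
```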
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - EMERSK -- Explainable Multimodal Emotion Recognition with Situational
Knowledge [0.0]
We present Explainable Multimodal Emotion Recognition with Situational Knowledge (EMERSK)
EMERSK is a general system for human emotion recognition and explanation using visual information.
Our system can handle multiple modalities, including facial expressions, posture, and gait in a flexible and modular manner.
arXiv Detail & Related papers (2023-06-14T17:52:37Z) - SemiMemes: A Semi-supervised Learning Approach for Multimodal Memes
Analysis [0.0]
SemiMemes is a novel training method that combines an auto-encoder with a classification task to make use of abundant unlabeled data.
This research proposes a multimodal semi-supervised learning approach that outperforms other multimodal semi-supervised learning and supervised learning state-of-the-art models.
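One way to combine an auto-encoder with a classification task is sketched below under assumed feature sizes: a reconstruction loss exploits the unlabeled memes while cross-entropy is computed on the labeled subset, and the two terms are mixed with a weight. This is an illustrative objective, not the SemiMemes training recipe.

```python
import torch
import torch.nn as nn

# Illustrative semi-supervised objective: reconstruction on unlabeled memes,
# cross-entropy on the labeled subset, combined with a weighting factor.
encoder = nn.Linear(512, 64)
decoder = nn.Linear(64, 512)
classifier = nn.Linear(64, 3)
recon_loss, cls_loss = nn.MSELoss(), nn.CrossEntropyLoss()

def semi_supervised_loss(x_unlabeled, x_labeled, y_labeled, alpha=0.5):
    # Auto-encoder branch exploits the unlabeled data.
    z_u = encoder(x_unlabeled)
    loss_recon = recon_loss(decoder(z_u), x_unlabeled)
    # Classification branch uses the (smaller) labeled subset.
    loss_cls = cls_loss(classifier(encoder(x_labeled)), y_labeled)
    return alpha * loss_recon + (1 - alpha) * loss_cls

loss = semi_supervised_loss(torch.randn(16, 512), torch.randn(4, 512), torch.randint(0, 3, (4,)))
print(loss.item())
```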
arXiv Detail & Related papers (2023-03-31T11:22:03Z) - What do you MEME? Generating Explanations for Visual Semantic Role
Labelling in Memes [42.357272117919464]
We introduce EXCLAIM, a novel task of generating explanations for visual semantic role labeling in memes.
To this end, we curate ExHVV, a novel dataset that offers natural language explanations of connotative roles for three types of entities.
We also posit LUMEN, a novel multimodal, multi-task learning framework that endeavors to address EXCLAIM optimally.
arXiv Detail & Related papers (2022-12-01T18:21:36Z) - Hate-CLIPper: Multimodal Hateful Meme Classification based on
Cross-modal Interaction of CLIP Features [5.443781798915199]
Hateful memes are a growing menace on social media.
Detecting hateful memes requires careful consideration of both visual and textual information.
We propose the Hate-CLIPper architecture, which explicitly models the cross-modal interactions between the image and text representations.
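One way to model cross-modal interactions explicitly, sketched below, is to take an outer product of projected image and text embeddings and classify the resulting interaction matrix. The 512-dimensional CLIP-sized inputs, projection size, and classifier head are assumptions for illustration rather than the exact Hate-CLIPper architecture.

```python
import torch
import torch.nn as nn

class CrossModalInteraction(nn.Module):
    """Illustrative sketch: model image-text interactions with an outer product
    of projected embeddings (a feature interaction matrix), then classify."""
    def __init__(self, clip_dim=512, proj_dim=64, num_classes=2):
        super().__init__()
        self.img_proj = nn.Linear(clip_dim, proj_dim)
        self.txt_proj = nn.Linear(clip_dim, proj_dim)
        self.classifier = nn.Linear(proj_dim * proj_dim, num_classes)

    def forward(self, img_emb, txt_emb):
        p_i = self.img_proj(img_emb)                        # (B, proj_dim)
        p_t = self.txt_proj(txt_emb)                        # (B, proj_dim)
        interaction = torch.einsum("bi,bj->bij", p_i, p_t)  # pairwise feature interactions
        return self.classifier(interaction.flatten(1))

model = CrossModalInteraction()
logits = model(torch.randn(2, 512), torch.randn(2, 512))
print(logits.shape)  # torch.Size([2, 2])
```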
arXiv Detail & Related papers (2022-10-12T04:34:54Z) - Multi-Modal Few-Shot Object Detection with Meta-Learning-Based
Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z) - Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework.
To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules.
This allows not only for robust training with noisy video data, but also to scale up the size of the capsule network compared to traditional routing methods.
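A minimal sketch of routing by self-attention over capsule vectors, assuming capsules are already given as fixed-size vectors: each capsule is scored against the others and the routed output is an attention-weighted combination. Names and dimensions are illustrative; the actual capsule network is considerably richer.

```python
import torch
import torch.nn as nn

class SelfAttentionRouting(nn.Module):
    """Illustrative routing-by-self-attention: score capsules against each other
    and keep a weighted combination of the most relevant ones."""
    def __init__(self, capsule_dim=16):
        super().__init__()
        self.query = nn.Linear(capsule_dim, capsule_dim)
        self.key = nn.Linear(capsule_dim, capsule_dim)

    def forward(self, capsules):                  # capsules: (B, N, capsule_dim)
        scores = self.query(capsules) @ self.key(capsules).transpose(1, 2)
        weights = scores.softmax(dim=-1)          # (B, N, N) routing coefficients
        return weights @ capsules                 # routed capsules

router = SelfAttentionRouting()
routed = router(torch.randn(2, 10, 16))
print(routed.shape)  # torch.Size([2, 10, 16])
```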
arXiv Detail & Related papers (2021-12-01T19:01:26Z) - Distilling Audio-Visual Knowledge by Compositional Contrastive Learning [51.20935362463473]
We learn a compositional embedding that closes the cross-modal semantic gap.
We establish a new, comprehensive multi-modal distillation benchmark on three video datasets.
arXiv Detail & Related papers (2021-04-22T09:31:20Z) - Attentive CutMix: An Enhanced Data Augmentation Approach for Deep
Learning Based Image Classification [58.20132466198622]
We propose Attentive CutMix, a naturally enhanced augmentation strategy based on CutMix.
In each training iteration, we choose the most descriptive regions based on the intermediate attention maps from a feature extractor.
Our proposed method is simple yet effective, easy to implement and can boost the baseline significantly.
arXiv Detail & Related papers (2020-03-29T15:01:05Z)
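A rough sketch of the Attentive CutMix idea under assumed shapes: given an attention map from a pretrained feature extractor over a grid of patches, the most attended patches of one image are pasted onto another. In the actual method the labels are also mixed in proportion to the pasted area; that step is omitted here, and the grid size and top-k value are illustrative.

```python
import torch

def attentive_cutmix(source, target, attention_map, grid=7, top_k=6):
    """Illustrative Attentive CutMix step: paste the source image's most
    attended grid patches (per a feature-extractor attention map) onto the target."""
    b, c, h, w = source.shape
    ph, pw = h // grid, w // grid
    # Pick the top-k grid cells by attention score.
    scores = attention_map.flatten(1)                       # (B, grid*grid)
    top_cells = scores.topk(top_k, dim=1).indices
    mixed = target.clone()
    for i in range(b):
        for cell in top_cells[i].tolist():
            r, col = divmod(cell, grid)
            mixed[i, :, r*ph:(r+1)*ph, col*pw:(col+1)*pw] = \
                source[i, :, r*ph:(r+1)*ph, col*pw:(col+1)*pw]
    return mixed

imgs = torch.randn(2, 3, 224, 224)
attn = torch.rand(2, 7, 7)  # placeholder for an attention map from a pretrained feature extractor
mixed = attentive_cutmix(imgs, imgs.flip(0), attn)
print(mixed.shape)  # torch.Size([2, 3, 224, 224])
```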
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.