SemiMemes: A Semi-supervised Learning Approach for Multimodal Memes Analysis
- URL: http://arxiv.org/abs/2304.00020v2
- Date: Tue, 16 May 2023 07:19:23 GMT
- Title: SemiMemes: A Semi-supervised Learning Approach for Multimodal Memes Analysis
- Authors: Pham Thai Hoang Tung, Nguyen Tan Viet, Ngo Tien Anh, Phan Duy Hung
- Abstract summary: SemiMemes is a novel training method that combines an auto-encoder and a classification task to make use of abundant unlabeled data.
This research proposes a multimodal semi-supervised learning approach that outperforms other state-of-the-art multimodal semi-supervised and supervised models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prevalence of memes on social media has created the need to analyze
their underlying meanings and sentiment so that harmful content can be censored.
Machine-learning-based meme censoring systems call for a semi-supervised learning
solution that takes advantage of the large number of unlabeled memes available on
the internet and makes the annotation process less demanding. Moreover, the
approach needs to utilize multimodal data, as memes' meanings usually come from
both images and text. This research proposes a multimodal semi-supervised learning
approach that outperforms other state-of-the-art multimodal semi-supervised and
supervised models on two datasets: Multimedia Automatic Misogyny Identification
and Hateful Memes. Building on the insights gained from Contrastive Language-Image
Pre-training, an effective multimodal learning technique, this research introduces
SemiMemes, a novel training method that combines an auto-encoder and a
classification task to make use of the abundant unlabeled data.
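The abstract's core idea — an auto-encoder trained on plentiful unlabeled memes alongside a classifier trained on the labeled subset — can be illustrated with a minimal PyTorch sketch. This is only an illustrative example assuming precomputed CLIP-style image and text features, concatenation-based fusion, a small bottleneck auto-encoder, and a fixed loss weight; it is not the authors' exact SemiMemes architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoEncoderClassifier(nn.Module):
    """Illustrative model (not the paper's exact design): a bottleneck
    auto-encoder over fused image/text features plus a classification
    head on the latent code."""

    def __init__(self, img_dim=512, txt_dim=512, latent_dim=128, num_classes=1):
        super().__init__()
        fused_dim = img_dim + txt_dim
        self.encoder = nn.Sequential(
            nn.Linear(fused_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, fused_dim),
        )
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, img_feat, txt_feat):
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        z = self.encoder(fused)
        return self.decoder(z), self.classifier(z), fused


def training_step(model, labeled_batch, unlabeled_batch, recon_weight=0.5):
    """One semi-supervised step: classification loss on labeled memes,
    reconstruction loss on both labeled and (plentiful) unlabeled memes."""
    img_l, txt_l, y = labeled_batch      # labeled features + binary labels
    img_u, txt_u = unlabeled_batch       # unlabeled features only

    recon_l, logits, fused_l = model(img_l, txt_l)
    recon_u, _, fused_u = model(img_u, txt_u)

    cls_loss = F.binary_cross_entropy_with_logits(logits.squeeze(-1), y.float())
    recon_loss = F.mse_loss(recon_l, fused_l) + F.mse_loss(recon_u, fused_u)
    return cls_loss + recon_weight * recon_loss
```

In a real run, labeled batches would come from the annotated portion of a dataset such as Multimedia Automatic Misogyny Identification or Hateful Memes, unlabeled batches from the remaining memes, and the reconstruction weight would be tuned on a validation split.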
Related papers
- Exploiting Minority Pseudo-Labels for Semi-Supervised Semantic Segmentation in Autonomous Driving [2.638145329894673]
We propose a professional training module to enhance minority class learning and a general training module to learn more comprehensive semantic information.
In experiments, our framework demonstrates superior performance compared to state-of-the-art methods on benchmark datasets.
arXiv Detail & Related papers (2024-09-19T11:47:25Z) - Detecting and Grounding Multi-Modal Media Manipulation and Beyond [93.08116982163804]
We highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM4).
DGM4 aims to not only detect the authenticity of multi-modal media, but also ground the manipulated content.
We propose a novel HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER) to fully capture the fine-grained interaction between different modalities.
arXiv Detail & Related papers (2023-09-25T15:05:46Z) - Unimodal Intermediate Training for Multimodal Meme Sentiment
Classification [0.0]
We present a novel variant of supervised intermediate training that uses relatively abundant sentiment-labelled unimodal data.
Our results show a statistically significant performance improvement from the incorporation of unimodal text data.
We show that the training set of labelled memes can be reduced by 40% without reducing the performance of the downstream model.
arXiv Detail & Related papers (2023-08-01T13:14:10Z) - Multi-Modal Representation Learning with Text-Driven Soft Masks [48.19806080407593]
We propose a visual-linguistic representation learning approach within a self-supervised learning framework.
We generate diverse features for the image-text matching (ITM) task via soft-masking the regions in an image.
We identify the regions relevant to each word by computing word-conditional visual attention using a multi-modal encoder.
arXiv Detail & Related papers (2023-04-03T05:07:49Z) - Vision Learners Meet Web Image-Text Pairs [32.36188289972377]
In this work, we consider self-supervised pre-training on noisy, web-sourced image-text paired data.
We compare a range of methods, including single-modal ones that use masked training objectives and multi-modal ones that use image-text contrastive training (a minimal sketch of such a contrastive objective appears after this list).
We present a new visual representation pre-training method, MUlti-modal Generator (MUG), that learns from scalable web-sourced image-text data.
arXiv Detail & Related papers (2023-01-17T18:53:24Z) - Multimodal Masked Autoencoders Learn Transferable Representations [127.35955819874063]
We propose a simple and scalable network architecture, the Multimodal Masked Autoencoder (M3AE).
M3AE learns a unified encoder for both vision and language data via masked token prediction.
We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.
arXiv Detail & Related papers (2022-05-27T19:09:42Z) - Meta-Learning and Self-Supervised Pretraining for Real World Image
Translation [5.469808405577674]
We explore the image-to-image translation problem in order to formulate a novel multi-task few-shot image generation benchmark.
We present several baselines for the few-shot problem and discuss trade-offs between different approaches.
arXiv Detail & Related papers (2021-12-22T14:48:22Z) - MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal
Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction (a generic sketch of this idea appears after this list).
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z) - Multimodal Clustering Networks for Self-supervised Learning from
Unlabeled Videos [69.61522804742427]
This paper proposes a self-supervised training framework that learns a common multimodal embedding space.
We extend the concept of instance-level contrastive learning with a multimodal clustering step to capture semantic similarities across modalities.
The resulting embedding space enables retrieval of samples across all modalities, even from unseen datasets and different domains.
arXiv Detail & Related papers (2021-04-26T15:55:01Z) - Provable Meta-Learning of Linear Representations [114.656572506859]
We provide fast, sample-efficient algorithms to address the dual challenges of learning a common set of features from multiple, related tasks, and transferring this knowledge to new, unseen tasks.
We also provide information-theoretic lower bounds on the sample complexity of learning these linear features.
arXiv Detail & Related papers (2020-02-26T18:21:34Z)