Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples
- URL: http://arxiv.org/abs/2409.04447v1
- Date: Fri, 23 Aug 2024 11:33:54 GMT
- Title: Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples
- Authors: Qi Fan, Yutong Li, Yi Xin, Xinyu Cheng, Guanglai Gao, Miao Ma,
- Abstract summary: We present our submission solutions for the Semi-Supervised Learning Sub-Challenge (MER2024-SEMI)
This challenge tackles the issue of limited annotated data in emotion recognition.
Our proposed method is validated to be effective on the MER2024-SEMI Challenge, achieving a weighted average F-score of 88.25% and ranking 6th on the leaderboard.
- Score: 18.29910296652917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Multimodal Emotion Recognition challenge MER2024 focuses on recognizing emotions using audio, language, and visual signals. In this paper, we present our submission solutions for the Semi-Supervised Learning Sub-Challenge (MER2024-SEMI), which tackles the issue of limited annotated data in emotion recognition. Firstly, to address the class imbalance, we adopt an oversampling strategy. Secondly, we propose a modality representation combinatorial contrastive learning (MR-CCL) framework on the trimodal input data to establish robust initial models. Thirdly, we explore a self-training approach to expand the training set. Finally, we enhance prediction robustness through a multi-classifier weighted soft voting strategy. Our proposed method is validated to be effective on the MER2024-SEMI Challenge, achieving a weighted average F-score of 88.25% and ranking 6th on the leaderboard. Our project is available at https://github.com/WooyoohL/MER2024-SEMI.
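The abstract names four components (oversampling, MR-CCL contrastive pre-training, self-training, and weighted soft voting) but the listing contains no code. As a rough, text-level illustration of the last two steps only, the sketch below implements confidence-thresholded pseudo-labeling and multi-classifier weighted soft voting in NumPy; the function names, classifier weights, and confidence threshold are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch (not the authors' code) of two steps described in the abstract:
# (1) weighted soft voting over several classifiers, and
# (2) confidence-thresholded self-training to expand the labeled set.
# All names, weights, and thresholds are illustrative assumptions.
import numpy as np


def weighted_soft_vote(prob_list, weights):
    """Average per-classifier class probabilities with per-classifier weights."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalize the ensemble weights
    stacked = np.stack(prob_list, axis=0)          # (n_classifiers, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)  # (n_samples, n_classes)


def pseudo_label(probs, threshold=0.9):
    """Keep only unlabeled samples whose top class probability clears the threshold."""
    confidence = probs.max(axis=1)
    keep = confidence >= threshold
    return np.where(keep)[0], probs.argmax(axis=1)[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake softmax outputs from three classifiers on 5 unlabeled samples, 6 emotion classes.
    probs = [rng.dirichlet(np.full(6, 0.3), size=5) for _ in range(3)]
    fused = weighted_soft_vote(probs, weights=[0.4, 0.3, 0.3])
    idx, labels = pseudo_label(fused, threshold=0.5)
    print("selected unlabeled samples:", idx, "pseudo-labels:", labels)
```

In a full pipeline the classifier weights would typically be tuned on a validation split, and the selected pseudo-labeled samples merged back into the training set before retraining.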
Related papers
- Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better [9.378013909890374]
We present our solutions for emotion recognition in the sub-challenges of the Multimodal Emotion Recognition Challenge (MER2024).
To mitigate the modal competition issue between audio and text, we adopt an early fusion strategy.
Our model ranks 2nd in both MER2024-SEMI and MER2024-NOISE, validating our method's effectiveness.
arXiv Detail & Related papers (2024-09-12T05:05:34Z)
- Multimodal Emotion Recognition with Vision-language Prompting and Modality Dropout [5.721743498917423]
We introduce EmoVCLIP, a model fine-tuned based on CLIP.
We employ modality dropout for robust information fusion.
Lastly, we utilize a self-training strategy to leverage unlabeled videos.
arXiv Detail & Related papers (2024-09-11T08:06:47Z)
- Audio-Guided Fusion Techniques for Multimodal Emotion Analysis [2.7013910991626213]
We propose a solution for the semi-supervised learning track (MER-SEMI) in MER2024.
We fine-tuned video and text feature extractors, specifically CLIP-vit-large and Baichuan-13B, using labeled data.
We also propose an Audio-Guided Transformer (AGT) fusion mechanism, showing superior effectiveness in fusing both inter-channel and intra-channel information.
arXiv Detail & Related papers (2024-09-08T07:28:27Z)
- SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition [65.19303535139453]
We present our winning approach for the MER-NOISE and MER-OV tracks of the MER2024 Challenge on multimodal emotion recognition.
Our system leverages the advanced emotional understanding capabilities of Emotion-LLaMA to generate high-quality annotations for unlabeled samples.
For the MER-OV track, our utilization of Emotion-LLaMA for open-vocabulary annotation yields an 8.52% improvement in average accuracy and recall compared to GPT-4V.
arXiv Detail & Related papers (2024-08-20T02:46:03Z)
- MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition [102.76954967225231]
We organize the MER series of competitions to promote the development of this field.
Last year, we launched MER2023, focusing on three interesting topics: multi-label learning, noise robustness, and semi-supervised learning.
This year, besides expanding the dataset size, we introduce a new track around open-vocabulary emotion recognition.
arXiv Detail & Related papers (2024-04-26T02:05:20Z)
- Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023 [51.95161901441527]
In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions.
Deep features extracted from foundation models are used as robust acoustic and visual representations of raw video.
Our final system achieves state-of-the-art performance and ranks third on the leaderboard of the MER-MULTI sub-challenge.
arXiv Detail & Related papers (2023-09-11T03:19:10Z)
- MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning [90.17500229142755]
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
This paper introduces the motivation behind this challenge, describes the benchmark dataset, and provides some statistics about participants.
We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
arXiv Detail & Related papers (2023-04-18T13:23:42Z)
- Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss [80.79641247882012]
We focus on unsupervised feature learning for Multimodal Emotion Recognition (MER).
We consider discrete emotions and use text, audio, and vision as modalities.
Our method, based on a contrastive loss between pairwise modalities, is the first such attempt in the MER literature.
arXiv Detail & Related papers (2022-07-23T10:11:24Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction (a text-only sketch of this reformulation follows this entry).
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
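To make the prompt idea concrete, here is a text-only sketch that recasts emotion classification as masked text prediction using the Hugging Face fill-mask pipeline with a public BERT checkpoint. The prompt template and label words are assumptions chosen for illustration; MEmoBERT itself is multimodal and pre-trained on paired audio, visual, and text data, so this is only a sketch of the reformulation, not the model.

```python
# Text-only illustration (not MEmoBERT itself) of recasting emotion classification
# as masked text prediction: wrap the utterance in a prompt containing a mask token
# and compare the masked-LM scores of a few assumed emotion "label words".
# Requires the `transformers` package and the public bert-base-uncased checkpoint.
from transformers import pipeline

LABEL_WORDS = ["happy", "sad", "angry", "neutral"]  # illustrative verbalizer

# Build the fill-mask pipeline once and reuse it.
fill = pipeline("fill-mask", model="bert-base-uncased")


def classify_emotion(utterance: str) -> str:
    """Score each label word in the masked slot and return the best-scoring one."""
    prompt = f"{utterance} The speaker feels {fill.tokenizer.mask_token}."
    scores = {r["token_str"]: r["score"] for r in fill(prompt, targets=LABEL_WORDS)}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(classify_emotion("I finally got the offer I had been waiting for!"))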