MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
- URL: http://arxiv.org/abs/2404.17113v4
- Date: Thu, 18 Jul 2024 11:23:25 GMT
- Title: MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
- Authors: Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao,
- Abstract summary: We organize the MER series of competitions to promote the development of this field.
Last year, we launched MER2023, focusing on three interesting topics: multi-label learning, noise robustness, and semi-supervised learning.
This year, besides expanding the dataset size, we introduce a new track around open-vocabulary emotion recognition.
- Score: 102.76954967225231
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing the dataset size and building more effective algorithms. However, due to problems such as complex environments and inaccurate annotations, current systems are hard to meet the demands of practical applications. Therefore, we organize the MER series of competitions to promote the development of this field. Last year, we launched MER2023, focusing on three interesting topics: multi-label learning, noise robustness, and semi-supervised learning. In this year's MER2024, besides expanding the dataset size, we further introduce a new track around open-vocabulary emotion recognition. The main purpose of this track is that existing datasets usually fix the label space and use majority voting to enhance the annotator consistency. However, this process may lead to inaccurate annotations, such as ignoring non-majority or non-candidate labels. In this track, we encourage participants to generate any number of labels in any category, aiming to describe emotional states as accurately as possible. Our baseline code relies on MERTools and is available at: https://github.com/zeroQiaoba/MERTools/tree/master/MER2024.
Related papers
- MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception [24.67682960590225]
We introduce a comprehensive dataset named MCD, featuring a wide range of sensing modalities, high-accuracy ground truth, and diverse challenging environments.
MCD comprises both CCS (Classical Cylindrical Spinning) and NRE (Non-Repetitive Epicyclic) lidars, high-quality IMUs (Inertial Measurement Units), cameras, and UWB (Ultra-WideBand) sensors.
In a pioneering effort, we introduce semantic annotations of 29 classes over 59k sparse NRE lidar scans across three domains, thus providing a novel challenge to existing semantic segmentation research.
arXiv Detail & Related papers (2024-03-18T06:00:38Z) - MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised
Learning [90.17500229142755]
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
This paper introduces the motivation behind this challenge, describe the benchmark dataset, and provide some statistics about participants.
We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
arXiv Detail & Related papers (2023-04-18T13:23:42Z) - Exploring Structured Semantic Prior for Multi Label Recognition with
Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, ie, CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z) - Multimodal Emotion Recognition with Modality-Pairwise Unsupervised
Contrastive Loss [80.79641247882012]
We focus on unsupervised feature learning for Multimodal Emotion Recognition (MER)
We consider discrete emotions, and as modalities text, audio and vision are used.
Our method, as being based on contrastive loss between pairwise modalities, is the first attempt in MER literature.
arXiv Detail & Related papers (2022-07-23T10:11:24Z) - Life is not black and white -- Combining Semi-Supervised Learning with
fuzzy labels [2.550900579709111]
Semi-Supervised Learning can potentially solve this issue by using additional unlabeled data.
We envision the incorporation of fuzzy labels into Semi-Supervised Learning and give a proof-of-concept of the potential lower costs and higher consistency.
arXiv Detail & Related papers (2021-10-13T09:20:41Z) - Robust Long-Tailed Learning under Label Noise [50.00837134041317]
This work investigates the label noise problem under long-tailed label distribution.
We propose a robust framework,algo, that realizes noise detection for long-tailed learning.
Our framework can naturally leverage semi-supervised learning algorithms to further improve the generalisation.
arXiv Detail & Related papers (2021-08-26T03:45:00Z) - The Emerging Trends of Multi-Label Learning [45.63795570392158]
Exabytes of data are generated daily by humans, leading to the growing need for new efforts in dealing with the grand challenges for multi-label learning brought by big data.
There is a lack of systemic studies that focus explicitly on analyzing the emerging trends and new challenges of multi-label learning in the era of big data.
It is imperative to call for a comprehensive survey to fulfill this mission and delineate future research directions and new applications.
arXiv Detail & Related papers (2020-11-23T03:36:00Z) - TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
Tracking Any Object dataset consists of 2,907 high resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.