Towards Unified Music Emotion Recognition across Dimensional and Categorical Models
- URL: http://arxiv.org/abs/2502.03979v2
- Date: Fri, 11 Apr 2025 12:58:23 GMT
- Title: Towards Unified Music Emotion Recognition across Dimensional and Categorical Models
- Authors: Jaeyong Kang, Dorien Herremans
- Abstract summary: One of the most significant challenges in Music Emotion Recognition (MER) comes from the fact that emotion labels can be heterogeneous across datasets. We present a unified multitask learning framework that combines categorical and dimensional labels. Our work makes a significant contribution to MER by allowing the combination of categorical and dimensional emotion labels in one unified framework.
- Score: 9.62904012066486
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: One of the most significant challenges in Music Emotion Recognition (MER) comes from the fact that emotion labels can be heterogeneous across datasets with regard to the emotion representation, including categorical (e.g., happy, sad) versus dimensional labels (e.g., valence-arousal). In this paper, we present a unified multitask learning framework that combines these two types of labels and is thus able to be trained on multiple datasets. This framework uses an effective input representation that combines musical features (i.e., key and chords) and MERT embeddings. Moreover, knowledge distillation is employed to transfer the knowledge of teacher models trained on individual datasets to a student model, enhancing its ability to generalize across multiple tasks. To validate our proposed framework, we conducted extensive experiments on a variety of datasets, including MTG-Jamendo, DEAM, PMEmo, and EmoMusic. According to our experimental results, the inclusion of musical features, multitask learning, and knowledge distillation significantly enhances performance. In particular, our model outperforms the state-of-the-art models, including the best-performing model from the MediaEval 2021 competition on the MTG-Jamendo dataset. Our work makes a significant contribution to MER by allowing the combination of categorical and dimensional emotion labels in one unified framework, thus enabling training across datasets.
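The abstract describes the framework only at a high level. The sketch below (PyTorch) illustrates one plausible way such a setup could be wired: a shared trunk over MERT embeddings concatenated with musical features (e.g., key/chord encodings), a categorical head for mood tags, a dimensional head for valence-arousal, and an optional soft-target distillation term from a teacher model. All names (MultitaskMER, multitask_kd_loss), layer sizes, and loss weights are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a unified multitask MER head with knowledge distillation.
# Dimensions, tag count, and loss weights are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultitaskMER(nn.Module):
    def __init__(self, mert_dim=768, music_feat_dim=36, hidden=256, n_tags=56):
        super().__init__()
        # Shared trunk over the concatenated input representation
        # (MERT embedding + key/chord features).
        self.trunk = nn.Sequential(
            nn.Linear(mert_dim + music_feat_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
        )
        self.tag_head = nn.Linear(hidden, n_tags)  # categorical (multi-label) logits
        self.va_head = nn.Linear(hidden, 2)        # dimensional: valence, arousal

    def forward(self, mert_emb, music_feats):
        h = self.trunk(torch.cat([mert_emb, music_feats], dim=-1))
        return self.tag_head(h), self.va_head(h)


def multitask_kd_loss(tag_logits, va_pred, tag_targets, va_targets,
                      teacher_tag_logits=None, alpha=0.5, kd_weight=0.5):
    """Hard-label losses for both tasks plus an optional distillation term
    that pulls the student's tag probabilities towards a teacher's."""
    loss = alpha * F.binary_cross_entropy_with_logits(tag_logits, tag_targets)
    loss = loss + (1 - alpha) * F.mse_loss(va_pred, va_targets)
    if teacher_tag_logits is not None:
        loss = loss + kd_weight * F.binary_cross_entropy_with_logits(
            tag_logits, torch.sigmoid(teacher_tag_logits))
    return loss
```

In practice, a batch drawn from a categorical dataset would supply only tag_targets and mask the valence-arousal term, while a dimensional dataset would do the reverse; that per-dataset masking is omitted here for brevity.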
Related papers
- Multi-Label Contrastive Learning : A Comprehensive Study [48.81069245141415]
Multi-label classification has emerged as a key area in both research and industry. Applying contrastive learning to multi-label classification presents unique challenges. We conduct an in-depth study of contrastive learning loss for multi-label classification across diverse settings.
arXiv Detail & Related papers (2024-11-27T20:20:06Z)
- Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models [7.632217365130212]
Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks.
These models can produce hallucinations, particularly in domains with incomplete knowledge.
We introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models.
arXiv Detail & Related papers (2024-08-22T12:04:04Z)
- A Multi-Task, Multi-Modal Approach for Predicting Categorical and Dimensional Emotions [0.0]
We propose a multi-task, multi-modal system that predicts categorical and dimensional emotions.
Results emphasise the importance of cross-regularisation between the two types of emotions.
arXiv Detail & Related papers (2023-12-31T16:48:03Z)
- CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition [18.75994345925282]
Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities.
The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data.
This paper presents ContrAstive feature Reconstruction and AggregaTion (CARAT) for the MMER task.
arXiv Detail & Related papers (2023-12-15T20:58:05Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Knowledge Graph Completion Models are Few-shot Learners: An Empirical Study of Relation Labeling in E-commerce with LLMs [16.700089674927348]
Large Language Models (LLMs) have shown surprising results in numerous natural language processing tasks.
This paper investigates their powerful learning capabilities in natural language and effectiveness in predicting relations between product types with limited labeled data.
Our results show that LLMs significantly outperform existing KG completion models in relation labeling for e-commerce KGs and exhibit performance strong enough to replace human labeling.
arXiv Detail & Related papers (2023-05-17T00:08:36Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M dataset and define two real, practical instance-level retrieval tasks.
We then train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition [76.65908232134203]
Symbolic Music Emotion Recognition (SMER) aims to predict music emotion from symbolic data, such as MIDI and MusicXML.
In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks.
arXiv Detail & Related papers (2022-01-15T07:45:10Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction task.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- Multi-task Learning with Metadata for Music Mood Classification [0.0]
Mood recognition is an important problem in music informatics and has key applications in music discovery and recommendation.
We propose a multi-task learning approach in which a shared model is simultaneously trained for mood and metadata prediction tasks.
Applying our technique to existing state-of-the-art convolutional neural networks for mood classification consistently improves their performance.
arXiv Detail & Related papers (2021-10-10T11:36:34Z)
- Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-efficient Cardiac Segmentation [65.81546955181781]
We propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher.
The student model learns from both unlabeled target data and labeled source data through two teacher models.
We demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance.
arXiv Detail & Related papers (2020-07-13T10:00:44Z)
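Both the Dual-Teacher entry above and the teacher-student distillation in the main paper rest on combining soft targets from more than one teacher. The snippet below is a generic, hedged illustration of that idea: two teachers' softened predictions are averaged into a single distillation target for the student. The function name, equal teacher weights, and temperature are hypothetical and not taken from either paper.

```python
# Generic two-teacher soft-target distillation (illustrative only).
import torch
import torch.nn.functional as F


def dual_teacher_kd_loss(student_logits, teacher_a_logits, teacher_b_logits,
                         hard_targets=None, temperature=2.0, kd_weight=0.7):
    t = temperature
    # Average the two teachers' softened distributions (equal weights assumed).
    soft_target = 0.5 * (F.softmax(teacher_a_logits / t, dim=-1)
                         + F.softmax(teacher_b_logits / t, dim=-1))
    # Standard temperature-scaled KL distillation term.
    kd = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                  soft_target, reduction="batchmean") * (t * t)
    if hard_targets is None:
        return kd_weight * kd
    # Mix in a supervised term when hard class labels are available.
    ce = F.cross_entropy(student_logits, hard_targets)
    return kd_weight * kd + (1 - kd_weight) * ce
```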
This list is automatically generated from the titles and abstracts of the papers on this site.