A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition
- URL: http://arxiv.org/abs/2201.05782v1
- Date: Sat, 15 Jan 2022 07:45:10 GMT
- Title: A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition
- Authors: Jibao Qiu and C. L. Philip Chen and Tong Zhang
- Abstract summary: Symbolic Music Emotion Recognition (SMER) predicts music emotion from symbolic data, such as MIDI and MusicXML.
In this paper, we present a simple multi-task framework for SMER, which combines the emotion recognition task with other emotion-related auxiliary tasks.
- Score: 76.65908232134203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic Music Emotion Recognition (SMER) aims to predict music emotion
from symbolic data, such as MIDI and MusicXML. Previous work mainly focused on
learning better representations via (masked) language model pre-training but
ignored the intrinsic structure of the music, which is extremely important to
the emotional expression of music. In this paper, we present a simple
multi-task framework for SMER, which combines the emotion recognition task
with other emotion-related auxiliary tasks derived from the intrinsic
structure of the music. The results show that our multi-task framework can be
adapted to different models. Moreover, the labels of the auxiliary tasks are
easy to obtain, which means our multi-task methods do not require manually
annotated labels other than emotion. In experiments on two publicly available
datasets (EMOPIA and VGMIDI), our methods perform better on the SMER task.
Specifically, accuracy increases by 4.17 absolute points to 67.58 on the
EMOPIA dataset and by 1.97 absolute points to 55.85 on the VGMIDI dataset.
Ablation studies also show the effectiveness of the multi-task methods
designed in this paper.
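The abstract does not spell out the auxiliary tasks, but the general recipe it describes is a shared encoder with one head per task and a weighted sum of losses. Below is a minimal PyTorch sketch of such a setup; the Transformer encoder and the auxiliary targets (note density and major/minor mode, both computable directly from a MIDI file) are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiTaskSMER(nn.Module):
    """Shared encoder with an emotion head plus auxiliary heads.

    The encoder and the auxiliary targets (note density, major/minor mode)
    are placeholders for illustration, not the paper's published design.
    """

    def __init__(self, vocab_size=512, d_model=256, n_emotions=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.emotion_head = nn.Linear(d_model, n_emotions)  # main SMER task
        self.density_head = nn.Linear(d_model, 1)           # auxiliary: note density (regression)
        self.mode_head = nn.Linear(d_model, 2)              # auxiliary: major vs. minor

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens)).mean(dim=1)    # pooled sequence representation
        return self.emotion_head(h), self.density_head(h), self.mode_head(h)

def multitask_loss(model, tokens, y_emotion, y_density, y_mode, aux_weight=0.5):
    emo, dens, mode = model(tokens)
    loss = nn.functional.cross_entropy(emo, y_emotion)              # main emotion loss
    loss += aux_weight * nn.functional.mse_loss(dens.squeeze(-1), y_density)
    loss += aux_weight * nn.functional.cross_entropy(mode, y_mode)  # auxiliary losses
    return loss

model = MultiTaskSMER()
tokens = torch.randint(0, 512, (8, 128))           # batch of 8 token sequences
loss = multitask_loss(model, tokens,
                      torch.randint(0, 4, (8,)),   # emotion labels (e.g., EMOPIA's 4 quadrants)
                      torch.rand(8),               # note-density targets
                      torch.randint(0, 2, (8,)))   # major/minor targets
```

Because targets like note density and mode can be extracted from the MIDI file itself, the auxiliary heads add supervision without any manual annotation beyond the emotion labels, which is the property the abstract emphasizes.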
Related papers
- Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings [10.302353984541497]
This research develops a model capable of generating music that resonates with the emotions depicted in visual arts.
Addressing the scarcity of aligned art and music data, we curated the Emotion Painting Music dataset.
Our dual-stage framework converts images to text descriptions of emotional content and then transforms these descriptions into music, facilitating efficient learning with minimal data.
arXiv Detail & Related papers (2024-09-12T08:19:25Z)
- LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task [3.489826905722736]
SemEval 2024 introduces the task of Multimodal Emotion Cause Analysis in Conversations.
This paper proposes models that tackle this task as an utterance labeling and a sequence labeling problem.
On the official leaderboard for the task, our architecture ranked 8th with an F1-score of 0.1759.
arXiv Detail & Related papers (2024-04-02T16:32:49Z)
- Impact of time and note duration tokenizations on deep learning symbolic music modeling [0.0]
We analyze the common tokenization methods and experiment with time and note duration representations.
We demonstrate that explicit time and duration information leads to better results, depending on the task (a toy sketch contrasting the two representation families follows this entry).
arXiv Detail & Related papers (2023-10-12T16:56:37Z)
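As a toy illustration of the two families compared in the paper above, the sketch below encodes the same notes once with implicit durations (MIDI-like NoteOn/NoteOff/TimeShift events) and once with explicit duration tokens (REMI-like Position/Pitch/Duration triples). The token names and note format are assumptions for illustration, not the paper's actual tokenizers.

```python
def to_timeshift(notes):
    """MIDI-like tokens: duration is implicit between NoteOn and NoteOff."""
    events = []
    for pitch, start, dur in notes:
        events += [(start, f"NoteOn_{pitch}"), (start + dur, f"NoteOff_{pitch}")]
    events.sort()
    tokens, now = [], 0.0
    for t, name in events:
        if t > now:                                # emit the gap since the last event
            tokens.append(f"TimeShift_{t - now:g}")
            now = t
        tokens.append(name)
    return tokens

def to_noteduration(notes):
    """REMI-like tokens: duration is an explicit token attached to each note."""
    tokens = []
    for pitch, start, dur in sorted(notes, key=lambda n: n[1]):
        tokens += [f"Position_{start:g}", f"Pitch_{pitch}", f"Duration_{dur:g}"]
    return tokens

notes = [(60, 0.0, 1.0), (64, 1.0, 0.5)]           # (pitch, start beat, duration in beats)
print(to_timeshift(notes))                         # NoteOn/NoteOff with TimeShift gaps
print(to_noteduration(notes))                      # Position/Pitch/Duration triples
```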
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective that accommodates both self-augmented positives and negatives sampled from the same music (one possible form of such an objective is sketched after this entry).
arXiv Detail & Related papers (2022-03-17T07:11:42Z)
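Reading the abstract above, one plausible form of such an objective treats a lightly masked view of a clip as the positive and a heavily masked view of the same clip as an extra negative alongside the rest of the batch. The sketch below is an interpretation under that assumption, not PEMR's published loss.

```python
import torch
import torch.nn.functional as F

def pemr_style_loss(anchor, positive, negative, temperature=0.1):
    """Contrastive loss where positives *and* negatives come from the same
    clip via frame masking (all inputs are embeddings of shape [B, D])."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    n = F.normalize(negative, dim=1)
    pos_sim = (a * p).sum(dim=1, keepdim=True) / temperature   # [B, 1] same-clip positive
    neg_self = (a * n).sum(dim=1, keepdim=True) / temperature  # [B, 1] same-clip negative
    neg_batch = (a @ p.t()) / temperature                      # [B, B] other clips in batch
    mask = torch.eye(a.size(0), dtype=torch.bool, device=a.device)
    neg_batch = neg_batch.masked_fill(mask, float("-inf"))     # drop the positive from negatives
    logits = torch.cat([pos_sim, neg_self, neg_batch], dim=1)
    targets = torch.zeros(a.size(0), dtype=torch.long, device=a.device)  # positive at index 0
    return F.cross_entropy(logits, targets)
```

Under this reading, the same-clip negative penalizes representations that lean on the masked-out salient frames, which is one way to "accommodate both self-augmented positives/negatives sampled from the same music."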
- Multi-task Learning with Metadata for Music Mood Classification [0.0]
Mood recognition is an important problem in music informatics and has key applications in music discovery and recommendation.
We propose a multi-task learning approach in which a shared model is simultaneously trained for mood and metadata prediction tasks.
Applying our technique to existing state-of-the-art convolutional neural networks for mood classification consistently improves their performance.
arXiv Detail & Related papers (2021-10-10T11:36:34Z)
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to the understanding of music from symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)
- Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition [1.6143012623830792]
We apply state-of-the-art pre-trained deep audio embedding methods to the Music Emotion Recognition (MER) task.
Deep audio embeddings represent musical emotion semantics for the MER task without expert human engineering.
arXiv Detail & Related papers (2021-04-13T21:09:54Z)
- Emotion-Based End-to-End Matching Between Image and Music in Valence-Arousal Space [80.49156615923106]
Matching images and music with similar emotions might help to make emotion perceptions more vivid and stronger.
Existing emotion-based image and music matching methods either employ limited categorical emotion states or train the matching model using an impractical multi-stage pipeline.
In this paper, we study end-to-end matching between image and music based on emotions in the continuous valence-arousal (VA) space.
arXiv Detail & Related papers (2020-08-22T20:12:23Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.