Multi-Modality in Music: Predicting Emotion in Music from High-Level
Audio Features and Lyrics
- URL: http://arxiv.org/abs/2302.13321v1
- Date: Sun, 26 Feb 2023 13:38:42 GMT
- Title: Multi-Modality in Music: Predicting Emotion in Music from High-Level
Audio Features and Lyrics
- Authors: Tibor Krols, Yana Nikolova, Ninell Oldenburg
- Abstract summary: This paper aims to test whether a multi-modal approach for music emotion recognition (MER) performs better than a uni-modal one on high-level song features and lyrics.
We use 11 song features retrieved from the Spotify API, combined with lyrics features including sentiment, TF-IDF, and ANEW, to predict valence and arousal.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper aims to test whether a multi-modal approach for music emotion
recognition (MER) performs better than a uni-modal one on high-level song
features and lyrics. We use 11 song features retrieved from the Spotify API,
combined with lyrics features including sentiment, TF-IDF, and ANEW, to
predict valence and arousal (Russell, 1980) scores on the Deezer Mood
Detection Dataset (DMDD) (Delbouys et al., 2018) with 4 different regression
models. We find that, out of the 11 high-level song features, mainly 5
contribute to the performance, and that multi-modal features do better than
audio alone when predicting valence. We made our code publicly available.
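A minimal sketch of the pipeline the abstract describes: Spotify audio features concatenated with TF-IDF lyrics features and regressed onto valence and arousal. The CSV path, the audio feature columns, and the choice of Ridge regression are illustrative assumptions, not the paper's exact 11 features, lyrics features, or 4 regression models.
```python
# Illustrative sketch only: file name, feature columns, and Ridge regression
# are assumptions standing in for the paper's exact setup.
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical DMDD-style table: Spotify audio features, lyrics text, and
# valence/arousal targets per song.
df = pd.read_csv("dmdd_with_spotify_features.csv")
audio_cols = ["danceability", "energy", "loudness", "speechiness", "acousticness"]

tfidf = TfidfVectorizer(max_features=5000, stop_words="english")
X_lyrics = tfidf.fit_transform(df["lyrics"])          # lyrics modality
X_audio = csr_matrix(df[audio_cols].values)           # audio modality

# Multi-modal features: concatenate the two modalities column-wise.
X = hstack([X_audio, X_lyrics]).tocsr()

for target in ["valence", "arousal"]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, df[target], test_size=0.2, random_state=0)
    model = Ridge().fit(X_tr, y_tr)
    print(target, "R2:", round(r2_score(y_te, model.predict(X_te)), 3))
```
Dropping the lyrics block from the concatenation gives the corresponding uni-modal (audio-only) baseline for comparison.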
Related papers
- Learning Musical Representations for Music Performance Question Answering [10.912207282129753]
Existing multimodal learning methods are incapable of dealing with fundamental problems within music performances.
Our primary backbone is designed to incorporate multimodal interactions within the context of music data.
Our experiments show state-of-the-art results on the Music AVQA datasets.
arXiv Detail & Related papers (2025-02-10T17:41:57Z)
- MusicFlow: Cascaded Flow Matching for Text Guided Music Generation [53.63948108922333]
MusicFlow is a cascaded text-to-music generation model based on flow matching.
We leverage masked prediction as the training objective, enabling the model to generalize to other tasks such as music infilling and continuation.
arXiv Detail & Related papers (2024-10-27T15:35:41Z)
- SONICS: Synthetic Or Not -- Identifying Counterfeit Songs [0.16777183511743465]
We introduce SONICS, a novel dataset for end-to-end Synthetic Song Detection (SSD).
We highlight the importance of modeling long-range temporal dependencies in songs for effective authenticity detection.
In particular, for long audio samples, our top-performing variant outperforms ViT by 8% F1 score while being 38% faster and using 26% less memory.
arXiv Detail & Related papers (2024-08-26T08:02:57Z)
- ChatMusician: Understanding and Generating Music Intrinsically with LLM [81.48629006702409]
ChatMusician is an open-source Large Language Model (LLM) that integrates intrinsic musical abilities.
It can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers.
Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc.
arXiv Detail & Related papers (2024-02-25T17:19:41Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of the representations of all open-source pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- Tollywood Emotions: Annotation of Valence-Arousal in Telugu Song Lyrics [0.0]
We present a new manually annotated dataset of Telugu songs' lyrics collected from Spotify.
We create two music emotion recognition models by using two classification techniques.
We make the dataset publicly available with lyrics, annotations and Spotify IDs.
arXiv Detail & Related papers (2023-03-16T14:47:52Z)
- An Analysis of Classification Approaches for Hit Song Prediction using Engineered Metadata Features with Lyrics and Audio Features [5.871032585001082]
This study aims to improve prediction of the top 10 hits among Billboard Hot 100 songs by using additional, alternative metadata.
Five machine learning approaches are applied: k-nearest neighbours, Naive Bayes, Random Forest, Logistic Regression, and Multilayer Perceptron.
Our results show that Random Forest (RF) and Logistic Regression (LR) with all features outperform the other models, achieving 89.1% and 87.2% accuracy, and 0.91 and 0.93 AUC, respectively.
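A rough sketch of the classification setup summarized above: Random Forest and Logistic Regression evaluated with accuracy and ROC AUC. The random feature matrix is a placeholder standing in for the engineered metadata, lyrics, and audio features.
```python
# Illustrative sketch; placeholder data stands in for the real feature set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))        # placeholder song features
y = rng.integers(0, 2, size=1000)      # placeholder top-10 hit labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
for name, clf in [("RF", RandomForestClassifier(n_estimators=200, random_state=0)),
                  ("LR", LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    print(name,
          "accuracy:", round(accuracy_score(y_te, clf.predict(X_te)), 3),
          "AUC:", round(roc_auc_score(y_te, proba), 3))
```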
arXiv Detail & Related papers (2023-01-31T09:48:53Z)
- A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition [76.65908232134203]
Symbolic Music Emotion Recognition (SMER) is the task of predicting music emotion from symbolic data, such as MIDI and MusicXML.
In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks.
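A minimal sketch, under assumed architecture choices, of such a multi-task setup: a shared encoder over symbolic-music tokens feeds an emotion head and an auxiliary head, trained with a weighted sum of losses. The GRU encoder, head sizes, and loss weight are hypothetical, not the paper's exact model.
```python
# Hypothetical multi-task framework; architecture and weights are assumptions.
import torch
import torch.nn as nn

class MultiTaskSMER(nn.Module):
    def __init__(self, vocab_size=512, d_model=128, n_emotions=4, n_aux=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.emotion_head = nn.Linear(d_model, n_emotions)  # main task
        self.aux_head = nn.Linear(d_model, n_aux)           # auxiliary task

    def forward(self, tokens):
        _, h = self.encoder(self.embed(tokens))
        h = h.squeeze(0)                                     # (batch, d_model)
        return self.emotion_head(h), self.aux_head(h)

model = MultiTaskSMER()
tokens = torch.randint(0, 512, (8, 64))    # batch of symbolic-music token sequences
emo_y = torch.randint(0, 4, (8,))
aux_y = torch.randint(0, 2, (8,))
emo_logits, aux_logits = model(tokens)
loss = nn.functional.cross_entropy(emo_logits, emo_y) \
       + 0.3 * nn.functional.cross_entropy(aux_logits, aux_y)
loss.backward()
```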
arXiv Detail & Related papers (2022-01-15T07:45:10Z)
- Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition [1.6143012623830792]
We use state-of-the-art pre-trained deep audio embedding methods for the Music Emotion Recognition task.
Deep audio embeddings represent musical emotion semantics for the MER task without expert human engineering.
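An illustrative sketch of this transfer-learning recipe: clip-level embeddings from a pre-trained deep audio model are used as fixed inputs to a conventional emotion classifier. The random matrix below stands in for real embeddings (e.g. from a VGGish- or L3-style network), which are assumed to be precomputed.
```python
# Placeholder embeddings stand in for outputs of a pre-trained audio model.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 512))       # placeholder clip-level audio embeddings
y = rng.integers(0, 4, size=500)      # placeholder emotion class labels

clf = SVC(kernel="rbf")
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```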
arXiv Detail & Related papers (2021-04-13T21:09:54Z)
- PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.