Related papers: Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges

Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges

URL: http://arxiv.org/abs/2406.08809v2
Date: Tue, 22 Oct 2024 12:18:27 GMT
Title: Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges
Authors: Jaeyong Kang, Dorien Herremans,
Abstract summary: We provide a comprehensive overview of the available music-emotion datasets and discuss evaluation standards as well as competitions in the field. We highlight the challenges that persist in accurately capturing emotion in music, including issues related to dataset quality, annotation consistency, and model generalization. We have complemented our findings with an accompanying GitHub repository.
Score: 9.62904012066486
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Deep learning models for music have advanced drastically in recent years, but how good are machine learning models at capturing emotion, and what challenges are researchers facing? In this paper, we provide a comprehensive overview of the available music-emotion datasets and discuss evaluation standards as well as competitions in the field. We also offer a brief overview of various types of music emotion prediction models that have been built over the years, providing insights into the diverse approaches within the field. Through this examination, we highlight the challenges that persist in accurately capturing emotion in music, including issues related to dataset quality, annotation consistency, and model generalization. Additionally, we explore the impact of different modalities, such as audio, MIDI, and physiological signals, on the effectiveness of emotion prediction models. Recognizing the dynamic nature of this field, we have complemented our findings with an accompanying GitHub repository. This repository contains a comprehensive list of music emotion datasets and recent predictive models.

Related papers

Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries [1.1743167854433303]
EMSYNC is a video-based symbolic music generation model that aligns music with a video's emotional content and temporal boundaries. We introduce boundary offsets, a novel temporal conditioning mechanism that enables the model to anticipate and align musical chords with scene cuts. In subjective listening tests, EMSYNC outperforms state-of-the-art models across all subjective metrics, for music theory-aware participants as well as the general listeners.
arXiv Detail & Related papers (2025-02-14T13:32:59Z)
A Survey of Foundation Models for Music Understanding [60.83532699497597]
This work is one of the early reviews of the intersection of AI techniques and music understanding. We investigated, analyzed, and tested recent large-scale music foundation models in respect of their music comprehension abilities.
arXiv Detail & Related papers (2024-09-15T03:34:14Z)
Foundation Models for Music: A Survey [77.77088584651268]
Foundations models (FMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
Emotion Manipulation Through Music -- A Deep Learning Interactive Visual Approach [0.0]
We introduce a novel way to manipulate the emotional content of a song using AI tools. Our goal is to achieve the desired emotion while leaving the original melody as intact as possible. This research may contribute to on-demand custom music generation, the automated remixing of existing work, and music playlists tuned for emotional progression.
arXiv Detail & Related papers (2024-06-12T20:12:29Z)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations. We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music. To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation) Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
Exploring the Emotional Landscape of Music: An Analysis of Valence Trends and Genre Variations in Spotify Music Data [0.0]
This paper conducts an intricate analysis of musical emotions and trends using Spotify music data. Employing regression modeling, temporal analysis, mood transitions, and genre investigation, the study uncovers patterns within music-emotion relationships.
arXiv Detail & Related papers (2023-10-29T15:57:31Z)
Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset [1.3607388598209322]
We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs. We first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline. Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions.
arXiv Detail & Related papers (2023-07-27T11:24:47Z)
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training. Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
Song Emotion Recognition: a Performance Comparison Between Audio Features and Artificial Neural Networks [0.0]
We study the most common features and models used to tackle this problem, revealing which ones are best suited for recognizing emotion in a cappella songs. In this paper, we studied the most common features and models used in recent publications to tackle this problem, revealing which ones are best suited for recognizing emotion in a cappella songs.
arXiv Detail & Related papers (2022-09-24T16:13:25Z)
Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System [8.900866276512364]
Current approaches overlook the video's emotional characteristics in the music generation step. We propose a novel hybrid deep neural network that uses an Adaptive Neuro-Fuzzy Inference System to predict a video's emotion. Our model can effectively generate audio that matches the scene eliciting a similar emotion from the viewer in both datasets.
arXiv Detail & Related papers (2020-04-05T07:18:28Z)
Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective. The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone. The experiments are conducted for three MIR tasks Artist Identification, Music Genre Classification and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.