From Joy to Fear: A Benchmark of Emotion Estimation in Pop Song Lyrics
- URL: http://arxiv.org/abs/2509.05617v1
- Date: Sat, 06 Sep 2025 06:28:28 GMT
- Title: From Joy to Fear: A Benchmark of Emotion Estimation in Pop Song Lyrics
- Authors: Shay Dahary, Avi Edana, Alexander Apartsin, Yehudit Aperstein
- Abstract summary: The emotional content of song lyrics plays a pivotal role in shaping listener experiences and influencing musical preferences. This paper investigates the task of multi-label emotional attribution of song lyrics by predicting six emotional intensity scores corresponding to six fundamental emotions.
- Score: 40.12543056558646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emotional content of song lyrics plays a pivotal role in shaping listener experiences and influencing musical preferences. This paper investigates the task of multi-label emotional attribution of song lyrics by predicting six emotional intensity scores corresponding to six fundamental emotions. A manually labeled dataset is constructed using a mean opinion score (MOS) approach, which aggregates annotations from multiple human raters to ensure reliable ground-truth labels. Leveraging this dataset, we conduct a comprehensive evaluation of several publicly available large language models (LLMs) under zero-shot scenarios. Additionally, we fine-tune a BERT-based model specifically for predicting multi-label emotion scores. Experimental results reveal the relative strengths and limitations of zero-shot and fine-tuned models in capturing the nuanced emotional content of lyrics. Our findings highlight the potential of LLMs for emotion recognition in creative texts, providing insights into model selection strategies for emotion-based music information retrieval applications. The labeled dataset is available at https://github.com/LLM-HITCS25S/LyricsEmotionAttribution.
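The abstract describes the pipeline only at a high level, so below is a minimal sketch of the two ingredients it names: mean-opinion-score (MOS) aggregation of rater annotations, and a BERT-based model fine-tuned to regress six emotion intensity scores. The emotion inventory (only "joy" and "fear" are confirmed by the title), the function and class names, and the pooling and loss choices are illustrative assumptions, not the authors' released code; PyTorch and Hugging Face transformers are assumed.

```python
# Illustrative sketch only; names and architecture details are assumptions.
import numpy as np
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

EMOTIONS = ["joy", "fear", "anger", "sadness", "surprise", "disgust"]  # assumed set

def aggregate_mos(ratings: np.ndarray) -> np.ndarray:
    """Mean opinion score: average per-emotion intensities over raters.

    ratings: (n_raters, 6) array of intensity annotations for one lyric.
    Returns the (6,) MOS vector used as the ground-truth label.
    """
    return ratings.mean(axis=0)

class LyricsEmotionRegressor(nn.Module):
    """BERT encoder with a linear head emitting six intensity scores."""

    def __init__(self, backbone: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        self.head = nn.Linear(self.encoder.config.hidden_size, len(EMOTIONS))

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.head(hidden[:, 0])  # regress from the [CLS] token

# Toy usage: three raters score one lyric, then one training step.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = LyricsEmotionRegressor()
labels = torch.tensor(aggregate_mos(np.random.rand(3, 6)), dtype=torch.float32)
batch = tokenizer(["Hello darkness, my old friend"], return_tensors="pt",
                  truncation=True, padding=True)
pred = model(batch["input_ids"], batch["attention_mask"])
loss = nn.MSELoss()(pred.squeeze(0), labels)  # MOS regression objective
loss.backward()
```

MSE against the MOS vector is one natural objective for intensity regression; the abstract does not specify the actual loss or pooling used, so treat those choices as placeholders.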
Related papers
- SongSage: A Large Musical Language Model with Lyric Generative Pre-training [69.52790104805794]
SongSage is a large musical language model equipped with diverse lyric-centric intelligence through lyric generative pretraining. SongSage exhibits a strong understanding of lyric-centric knowledge, excels at rewriting user queries for zero-shot playlist recommendations, generates and continues lyrics effectively, and performs proficiently across seven additional capabilities.
arXiv Detail & Related papers (2026-01-03T10:54:37Z)
- Story2MIDI: Emotionally Aligned Music Generation from Text [38.36870481571071]
We introduce Story2MIDI, a sequence-to-sequence Transformer-based model for generating emotion-aligned music from a given piece of text. Our results indicate that our model effectively learns emotion-relevant features in music and incorporates them into its generation process.
arXiv Detail & Related papers (2025-12-01T20:35:18Z)
- Switchboard-Affect: Emotion Perception Labels from Conversational Speech [7.576840738395629]
We identify the Switchboard corpus as a promising source of naturalistic conversational speech. We task crowd workers with labeling the dataset for categorical emotions and dimensional attributes. We evaluate state-of-the-art SER models and find variable performance across the emotion categories, with especially poor generalization.
arXiv Detail & Related papers (2025-10-14T21:23:04Z)
- Emo Pillars: Knowledge Distillation to Support Fine-Grained Context-Aware and Context-Less Emotion Classification [56.974545305472304]
Most datasets for sentiment analysis lack the context in which an opinion was expressed, which is often crucial for emotion understanding, and are limited to a few emotion categories. We design an LLM-based data synthesis pipeline and leverage a large model, Mistral-7b, to generate training examples for more accessible, lightweight BERT-type encoder models. We show that Emo Pillars models are highly adaptive to new domains when tuned for specific tasks such as GoEmotions, ISEAR, IEMOCAP, and EmoContext, reaching SOTA performance on the first three.
arXiv Detail & Related papers (2025-04-23T16:23:17Z)
- Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content [56.62027582702816]
Multimodal Sentiment Analysis seeks to unravel human emotions by amalgamating text, audio, and visual data. Yet, discerning subtle emotional nuances within audio and video expressions poses a formidable challenge. We introduce DEVA, a progressive fusion framework founded on textual sentiment descriptions.
arXiv Detail & Related papers (2024-12-12T11:30:41Z)
- Semi-Supervised Self-Learning Enhanced Music Emotion Recognition [6.315220462630698]
Music emotion recognition (MER) aims to identify the emotions conveyed in a given musical piece. Currently, the available public datasets have limited sample sizes. We propose a semi-supervised self-learning (SSSL) method, which can differentiate between samples with correct and incorrect labels in a self-learning manner.
arXiv Detail & Related papers (2024-10-29T09:42:07Z)
- Exploring and Applying Audio-Based Sentiment Analysis in Music [0.0]
The ability of a computational model to interpret musical emotions is largely unexplored.
This study seeks to (1) predict the emotion of a musical clip over time and (2) determine the next emotion value after the music in a time series to ensure seamless transitions.
arXiv Detail & Related papers (2024-02-22T22:34:06Z)
- Modelling Emotion Dynamics in Song Lyrics with State Space Models [4.18804572788063]
We propose a method to predict emotion dynamics in song lyrics without song-level supervision.
Our experiments show that applying our method consistently improves the performance of sentence-level baselines without requiring any annotated songs.
arXiv Detail & Related papers (2022-10-17T21:07:23Z)
- Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning [70.30713251031052]
We propose a data-driven deep learning model, i.e. StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech.
Experiments show that the predicted emotion strength of the proposed StrengthNet is highly correlated with ground truth scores for both seen and unseen speech.
arXiv Detail & Related papers (2022-06-15T01:25:32Z)
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset comprising 9,724 samples with audio files and human-labeled emotion annotations. Unlike models that require additional reference audio as input, our model predicts emotion labels from the input text alone and generates more expressive speech conditioned on the emotion embedding. In the experiments, we first validate the effectiveness of our dataset on an emotion classification task, then train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z)
- Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality [84.69595956853908]
We present Affect2MM, a learning method for time-series emotion prediction for multimedia content.
Our goal is to automatically capture the varying emotions depicted by characters in real-life human-centric situations and behaviors.
arXiv Detail & Related papers (2021-03-11T09:07:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.