SongSage: A Large Musical Language Model with Lyric Generative Pre-training
- URL: http://arxiv.org/abs/2601.01153v1
- Date: Sat, 03 Jan 2026 10:54:37 GMT
- Title: SongSage: A Large Musical Language Model with Lyric Generative Pre-training
- Authors: Jiani Guo, Jiajia Li, Jie Wu, Zuchao Li, Yujiu Yang, Ping Wang
- Abstract summary: SongSage is a large musical language model equipped with diverse lyric-centric intelligence through lyric generative pretraining. SongSage exhibits a strong understanding of lyric-centric knowledge, excels in rewriting user queries for zero-shot playlist recommendations, generates and continues lyrics effectively, and performs proficiently across seven additional capabilities.
- Score: 69.52790104805794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models have achieved significant success in various domains, yet their understanding of lyric-centric knowledge has not been fully explored. In this work, we first introduce PlaylistSense, a dataset to evaluate the playlist understanding capability of language models. PlaylistSense encompasses ten types of user queries derived from common real-world perspectives, challenging LLMs to accurately grasp playlist features and address diverse user intents. Comprehensive evaluations indicate that current general-purpose LLMs still have room for improvement in playlist understanding. Inspired by this, we introduce SongSage, a large musical language model equipped with diverse lyric-centric intelligence through lyric generative pretraining. SongSage undergoes continual pretraining on LyricBank, a carefully curated corpus of 5.48 billion tokens focused on lyrical content, followed by fine-tuning with LyricBank-SFT, a meticulously crafted instruction set comprising 775k samples across nine core lyric-centric tasks. Experimental results demonstrate that SongSage exhibits a strong understanding of lyric-centric knowledge, excels in rewriting user queries for zero-shot playlist recommendations, generates and continues lyrics effectively, and performs proficiently across seven additional capabilities. Beyond its lyric-centric expertise, SongSage also retains general knowledge comprehension and achieves a competitive MMLU score. Due to copyright restrictions, the datasets will remain inaccessible; we will release SongSage and the training scripts to ensure reproducibility and support music AI research and applications. Details of the dataset release plan are provided in the appendix.
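The abstract describes a two-stage recipe: continual pretraining on LyricBank, then instruction tuning on LyricBank-SFT across nine lyric-centric tasks. A minimal sketch of how such instruction samples might be serialized into training strings follows; the field names (`task`, `instruction`, `input`, `output`) and the prompt template are illustrative assumptions, not SongSage's actual format.

```python
# Hypothetical serializer for lyric-centric SFT samples.
# The schema and "### ..." template below are assumptions for
# illustration; the paper does not specify SongSage's prompt format.

def serialize_sft_sample(sample: dict) -> str:
    """Render one instruction-tuning sample as a single training string."""
    prompt = f"### Task: {sample['task']}\n"
    prompt += f"### Instruction: {sample['instruction']}\n"
    if sample.get("input"):  # optional context, e.g. a lyric fragment
        prompt += f"### Input: {sample['input']}\n"
    prompt += f"### Response: {sample['output']}"
    return prompt

example = {
    "task": "lyric_continuation",
    "instruction": "Continue the following lyric in the same style.",
    "input": "City lights are fading slow,",
    "output": "but the melody still knows the way back home.",
}
print(serialize_sft_sample(example))
```

In a real pipeline, strings like these would be tokenized and fed to the causal language model during the SFT stage, typically with the loss masked so that only the response tokens are trained on.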
Related papers
- Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction [47.3124073459729]
This work addresses the under-explored role of lyrics in predicting popularity. We present an automated pipeline that uses LLMs to extract high-dimensional lyric embeddings. These features are integrated into HitMusicLyricNet, a multimodal architecture that combines audio, lyrics, and social metadata for popularity score prediction.
arXiv Detail & Related papers (2025-12-05T08:09:26Z) - From Joy to Fear: A Benchmark of Emotion Estimation in Pop Song Lyrics [40.12543056558646]
The emotional content of song lyrics plays a pivotal role in shaping listener experiences and influencing musical preferences. This paper investigates the task of multi-label emotional attribution of song lyrics by predicting six emotional intensity scores corresponding to six fundamental emotions.
arXiv Detail & Related papers (2025-09-06T06:28:28Z) - Towards Estimating Personal Values in Song Lyrics [5.170818712089796]
Most music widely consumed in Western countries contains song lyrics, with U.S. samples reporting that almost all of their song libraries contain lyrics.
In this project, we take a perspectivist approach, guided by social science theory, to gathering annotations, estimating their quality, and aggregating them.
We then compare aggregated ratings to estimates based on pre-trained sentence/word embedding models by employing a validated value dictionary.
arXiv Detail & Related papers (2024-08-22T19:22:55Z) - Synthetic Lyrics Detection Across Languages and Genres [4.987546582439803]
The use of large language models (LLMs) to generate music content, particularly lyrics, has gained popularity. Previous research has explored content detection in various domains, but no work has focused on the text modality of music, lyrics. We curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists. We performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type. Following both musical and industrial constraints, we examined how well these approaches generalize across languages, scale with data availability, handle multilingual content, and perform on novel genres in few-shot settings.
arXiv Detail & Related papers (2024-06-21T15:19:21Z) - SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition [82.38021790213752]
SongComposer is a music-specialized large language model (LLM). It integrates the capability of simultaneously composing melodies into LLMs by leveraging three key innovations. It outperforms advanced LLMs in tasks such as lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation. We will release SongCompose, a large-scale dataset for training, containing paired lyrics and melodies in Chinese and English.
arXiv Detail & Related papers (2024-02-27T16:15:28Z) - LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT [48.28624219567131]
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method.
We use Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model.
Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English.
arXiv Detail & Related papers (2023-06-29T17:01:51Z) - Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z) - Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model [12.19432397758502]
BART-fusion is a novel model for generating lyric interpretations from lyrics and music audio.
We employ a cross-modal attention module to incorporate the audio representation into the lyrics representation.
We show that the additional audio information helps our model to understand words and music better.
arXiv Detail & Related papers (2022-08-24T17:07:37Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)