Related papers: WeMusic-Agent: Efficient Conversational Music Recommendation via Knowledge Internalization and Agentic Boundary Learning

URL: http://arxiv.org/abs/2512.16108v1
Date: Thu, 18 Dec 2025 02:59:19 GMT
Title: WeMusic-Agent: Efficient Conversational Music Recommendation via Knowledge Internalization and Agentic Boundary Learning
Authors: Wendong Bi, Yirong Mao, Xianglong Liu, Kai Tian, Jian Zhang, Hanjie Wang, Wenhui Que,
Abstract summary: This paper proposes WeMusic-Agent, a training framework for efficient conversational music recommendation. We present WeMusic-Agent-M1, an agentic model that internalizes extensive musical knowledge via continued pretraining on a 50B music-related corpus. We also construct a benchmark for personalized music recommendations derived from real-world data in WeChat Listen.
Score: 12.737364415781805
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Personalized music recommendation in conversational scenarios usually requires a deep understanding of user preferences and nuanced musical context, yet existing methods often struggle to balance specialized domain knowledge with flexible tool integration. This paper proposes WeMusic-Agent, a training framework for efficient LLM-based conversational music recommendation. By integrating knowledge internalization and agentic boundary learning, the framework aims to teach the model to decide when to rely on internalized knowledge and when to call specialized tools (e.g., music retrieval APIs, music recommendation systems). Under this framework, we present WeMusic-Agent-M1, an agentic model that internalizes extensive musical knowledge via continued pretraining on a 50B music-related corpus while acquiring the ability to invoke external tools when necessary. Additionally, given the lack of open-source benchmarks for conversational music recommendation, we construct a benchmark for personalized music recommendation derived from real-world data in WeChat Listen. This benchmark enables comprehensive evaluation across multiple dimensions, including relevance, personalization, and diversity of the recommendations. Experiments on real-world data demonstrate that WeMusic-Agent achieves significant improvements over existing models.

Related papers

BASS: Benchmarking Audio LMs for Musical Structure and Semantic Reasoning [74.84822135705025]
We introduce BASS, a benchmark designed to evaluate music understanding and reasoning in audio language models. BASS comprises 2658 questions spanning 12 tasks and 1993 unique songs, covering over 138 hours of music. We evaluate 14 open-source and frontier multimodal LMs, finding that even state-of-the-art models struggle on higher-level reasoning tasks.
arXiv Detail & Related papers (2026-02-03T23:40:31Z)
MuCPT: Music-related Natural Language Model Continued Pretraining [2.2288022262475873]
We build a large, music-related natural language corpus (40B tokens) that combines open source and in-house data. We also introduce reference-model (RM)-based token-level soft scoring for quality control. Overall, this work advances both the right corpus and the right objective, offering a scalable data-training framework.
arXiv Detail & Related papers (2025-11-18T08:33:34Z)
TalkPlay-Tools: Conversational Music Recommendation with LLM Tool Calling [20.889365999166813]
We propose a music recommendation system with tool calling to serve as a unified retrieval-reranking pipeline. Our system positions an LLM as an end-to-end recommendation system that interprets user intent. We demonstrate that this unified tool-calling framework achieves competitive performance across diverse recommendation scenarios.
arXiv Detail & Related papers (2025-10-02T06:08:54Z)
Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music [50.87225308217594]
This paper presents an unsupervised machine learning algorithm that identifies recurring patterns, referred to as "music-words", from symbolic music data. We formulate the task of music-word discovery as a statistical optimization problem and propose a two-stage Expectation-Maximization (EM)-based learning framework.
arXiv Detail & Related papers (2025-09-29T11:10:57Z)
TALKPLAY: Multimodal Music Recommendation with Large Language Models [6.830154140450626]
We present TALKPLAY, a novel multimodal music recommendation system that reformulates recommendation as a token generation problem using large language models (LLMs). Our system effectively recommends music from diverse user queries while generating contextually relevant responses. Our qualitative and quantitative evaluation demonstrates that TALKPLAY significantly outperforms unimodal approaches based solely on text or listening history in both recommendation performance and conversational naturalness.
arXiv Detail & Related papers (2025-02-19T13:28:20Z)
SoundSignature: What Type of Music Do You Like? [0.0]
SoundSignature is a music application that integrates a custom OpenAI Assistant to analyze users' favorite songs. The system incorporates state-of-the-art Music Information Retrieval (MIR) Python packages to combine extracted acoustic/musical features with the assistant's extensive knowledge of the artists and bands.
arXiv Detail & Related papers (2024-10-04T12:40:45Z)
Towards Leveraging Contrastively Pretrained Neural Audio Embeddings for Recommender Tasks [18.95453617434051]
Music recommender systems frequently utilize network-based models to capture relationships between music pieces, artists, and users. New music pieces or artists often face the cold-start problem due to insufficient initial information. To address this, one can extract content-based information directly from the music to enhance collaborative-filtering-based methods.
arXiv Detail & Related papers (2024-09-13T17:53:06Z)
Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music [21.380568107727207]
We present MuChin, the first open-source music description benchmark in Chinese colloquial language. MuChin is designed to evaluate the performance of multimodal Large Language Models in understanding and describing music. All data related to the benchmark, along with the scoring code and detailed appendices, have been open-sourced.
arXiv Detail & Related papers (2024-02-15T10:55:01Z)
MusicRL: Aligning Music Generation to Human Preferences [62.44903326718772]
MusicRL is the first music generation system finetuned from human feedback. We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences. We train MusicRL-U, the first text-to-music model that incorporates human feedback at scale.
arXiv Detail & Related papers (2024-02-06T18:36:52Z)
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models [54.55063772090821]
MusicAgent integrates numerous music-related tools and an autonomous workflow to address user requirements. The primary goal of this system is to free users from the intricacies of AI-music tools, enabling them to concentrate on the creative aspect.
arXiv Detail & Related papers (2023-10-18T13:31:10Z)
MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels: acoustic, performance, score, and high-level description. We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of the representations of all open-source pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.