Harnessing High-Level Song Descriptors towards Natural Language-Based Music Recommendation
- URL: http://arxiv.org/abs/2411.05649v2
- Date: Mon, 18 Nov 2024 08:04:14 GMT
- Title: Harnessing High-Level Song Descriptors towards Natural Language-Based Music Recommendation
- Authors: Elena V. Epure, Gabriel Meseguer-Brocal, Darius Afchar, Romain Hennequin
- Abstract summary: Language Models (LMs) have gained popularity in assisting users in navigating large catalogs.
We assess LMs' effectiveness in recommending songs based on user natural language descriptions and items with descriptors like genres, moods, and listening contexts.
Our findings reveal improved performance as LMs are fine-tuned for general language similarity, information retrieval, and mapping longer descriptions to shorter, high-level descriptors in music.
- Score: 10.740852246735004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recommender systems relying on Language Models (LMs) have gained popularity in assisting users in navigating large catalogs. LMs often exploit item high-level descriptors, i.e. categories or consumption contexts, from training data or user preferences. This has been proven effective in domains like movies or products. However, in the music domain, understanding how effectively LMs utilize song descriptors for natural language-based music recommendation is relatively limited. In this paper, we assess LMs' effectiveness in recommending songs based on user natural language descriptions and items with descriptors like genres, moods, and listening contexts. We formulate the recommendation task as a dense retrieval problem and assess LMs as they become increasingly familiar with data pertinent to the task and domain. Our findings reveal improved performance as LMs are fine-tuned for general language similarity, information retrieval, and mapping longer descriptions to shorter, high-level descriptors in music.
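The dense-retrieval formulation described in the abstract can be illustrated with a minimal sketch: both the user's free-form description and each song's high-level descriptors are embedded into a shared vector space, and songs are ranked by similarity to the query. The toy bag-of-words embedding, the example catalog, and the `recommend` helper below are illustrative stand-ins; the paper uses fine-tuned LM sentence encoders, not this scheme.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real dense retriever would use
    # a fine-tuned LM sentence encoder producing dense vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical songs indexed by high-level descriptors
# (genres, moods, listening contexts).
catalog = {
    "song_a": "upbeat pop dance workout",
    "song_b": "calm ambient piano study relaxing",
    "song_c": "aggressive metal guitar gym",
}

def recommend(query: str, k: int = 2) -> list:
    # Rank the catalog by similarity to the user's description.
    q = embed(query)
    ranked = sorted(catalog, key=lambda s: cosine(q, embed(catalog[s])),
                    reverse=True)
    return ranked[:k]

print(recommend("relaxing piano music for study"))
```

The design choice mirrors the paper's setup: recommendation quality then hinges entirely on how well the encoder maps long user descriptions near the much shorter descriptor strings, which is why fine-tuning for that mapping helps.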
Related papers
- Text2Tracks: Prompt-based Music Recommendation via Generative Retrieval [8.439626984193591]
We propose to address the task of prompt-based music recommendation as a generative retrieval task.
We introduce Text2Tracks, a generative retrieval model that learns a mapping from a user's music recommendation prompt to the relevant track IDs directly.
arXiv Detail & Related papers (2025-03-31T15:09:19Z)
- TALKPLAY: Multimodal Music Recommendation with Large Language Models [6.830154140450626]
TalkPlay represents music through an expanded token vocabulary that encodes multiple modalities.
The model learns to generate recommendations through next-token prediction on music recommendation conversations.
Our approach eliminates traditional recommendation-dialogue pipeline complexity, enabling end-to-end learning of query-aware music recommendations.
arXiv Detail & Related papers (2025-02-19T13:28:20Z)
- Evaluation of pretrained language models on music understanding [0.0]
We demonstrate that Large Language Models (LLMs) suffer from 1) prompt sensitivity, 2) inability to model negation, and 3) sensitivity towards the presence of specific words.
We quantified these properties as a triplet-based accuracy, evaluating the ability to model the relative similarity of labels in a hierarchical ontology.
Despite the relatively high accuracy reported, inconsistencies are evident in all six models, suggesting that off-the-shelf LLMs need adaptation to music before use.
arXiv Detail & Related papers (2024-09-17T14:44:49Z)
- Language Representations Can be What Recommenders Need: Findings and Potentials [57.90679739598295]
We show that item representations, when linearly mapped from advanced LM representations, yield superior recommendation performance.
This outcome suggests the possible homomorphism between the advanced language representation space and an effective item representation space for recommendation.
Our findings highlight the connection between language modeling and behavior modeling, which can inspire both natural language processing and recommender system communities.
arXiv Detail & Related papers (2024-07-07T17:05:24Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Bayesian Preference Elicitation with Language Models [82.58230273253939]
We introduce OPEN, a framework that uses BOED to guide the choice of informative questions and an LM to extract features.
In user studies, we find that OPEN outperforms existing LM- and BOED-based methods for preference elicitation.
arXiv Detail & Related papers (2024-03-08T18:57:52Z)
- MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music [21.380568107727207]
We present MuChin, the first open-source music description benchmark in Chinese colloquial language.
MuChin is designed to evaluate the performance of multimodal Large Language Models in understanding and describing music.
All data related to the benchmark, along with the scoring code and detailed appendices, have been open-sourced.
arXiv Detail & Related papers (2024-02-15T10:55:01Z)
- Eliciting Human Preferences with Language Models [56.68637202313052]
Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts.
We propose to use *LMs themselves* to guide the task specification process, a framework we call GATE.
We study GATE in three domains: email validation, content recommendation, and moral reasoning.
arXiv Detail & Related papers (2023-10-17T21:11:21Z)
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models [62.481065357472964]
Recent advances in large language models (LLMs) have showcased their remarkable ability to harness commonsense knowledge and reasoning.
This study introduces a novel approach, coined LLM-Rec, which incorporates four distinct prompting strategies of text enrichment for improving personalized text-based recommendations.
arXiv Detail & Related papers (2023-07-24T18:47:38Z)
- Language-Guided Music Recommendation for Video via Prompt Analogies [35.48998901411509]
We propose a method to recommend music for an input video while allowing a user to guide music selection with free-form natural language.
Existing music video datasets provide the needed (video, music) training pairs, but lack text descriptions of the music.
arXiv Detail & Related papers (2023-06-15T17:58:01Z)
- Explainability in Music Recommender Systems [69.0506502017444]
We discuss how explainability can be addressed in the context of Music Recommender Systems (MRSs).
MRSs are often quite complex and optimized for recommendation accuracy.
We show how explainability components can be integrated within an MRS and in what form explanations can be provided.
arXiv Detail & Related papers (2022-01-25T18:32:11Z)
- Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits [14.316482550910587]
Contextual Bandits find important use cases in various real-life scenarios such as online advertising, recommendation systems, healthcare, etc.
Most of the algorithms use flat feature vectors to represent context, whereas real-world contexts contain a varying number of objects and relations among them that must be modeled.
Adding richer relational context representations also introduces a much larger context space making exploration-exploitation harder.
arXiv Detail & Related papers (2021-06-25T21:54:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.