MuseChat: A Conversational Music Recommendation System for Videos
- URL: http://arxiv.org/abs/2310.06282v4
- Date: Sat, 9 Mar 2024 18:28:34 GMT
- Title: MuseChat: A Conversational Music Recommendation System for Videos
- Authors: Zhikang Dong, Bin Chen, Xiulong Liu, Pawel Polak, Peng Zhang
- Abstract summary: MuseChat is a first-of-its-kind dialogue-based recommendation system that personalizes music suggestions for videos.
Our system consists of two key functionalities with associated modules: recommendation and reasoning.
Experiment results show that MuseChat achieves significant improvements over existing video-based music retrieval methods.
- Score: 12.47508840909336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Music recommendation for videos attracts growing interest in
multi-modal research. However, existing systems focus primarily on content
compatibility and often ignore users' preferences. Their inability to
interact with users for further refinement or to provide explanations leads
to a less satisfying experience. We address these issues with MuseChat, a
first-of-its-kind dialogue-based recommendation system that personalizes
music suggestions for videos. Our system consists of two key functionalities
with associated modules: recommendation and reasoning. The recommendation
module takes a video, along with optional information including previously
suggested music and the user's preference, as input and retrieves a music
track that matches the context. The reasoning module, built on a Large
Language Model (Vicuna-7B) extended to multi-modal inputs, provides a
reasonable explanation for the recommended music. To evaluate the
effectiveness of MuseChat, we build a large-scale dataset, conversational
music recommendation for videos, that simulates a two-turn interaction
between a user and a recommender based on accurate music track information.
Experimental results show that MuseChat achieves significant improvements
over existing video-based music retrieval methods while offering strong
interpretability and interactivity.
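The abstract outlines the two-module design but includes no code. Below is a minimal sketch of how such a retrieval-plus-reasoning pipeline could be wired together, assuming precomputed embeddings. All function and parameter names are hypothetical; the real system trains dedicated video/music encoders and uses Vicuna-7B rather than a generic `llm` callable.

```python
# Minimal sketch of the two-module design described in the abstract.
# All names are hypothetical, not the authors' released code.
import numpy as np

def recommend(video_emb: np.ndarray,
              music_embs: np.ndarray,
              pref_emb: np.ndarray | None = None,
              prev_music_emb: np.ndarray | None = None) -> int:
    """Retrieve the catalog track whose embedding best matches the query.

    The query fuses the video embedding with optional signals: the user's
    stated preference and the previously suggested track.
    """
    query = video_emb.copy()
    if pref_emb is not None:
        query += pref_emb            # steer retrieval toward the user's request
    if prev_music_emb is not None:
        query += prev_music_emb      # refine relative to the last suggestion
    query /= np.linalg.norm(query)
    music = music_embs / np.linalg.norm(music_embs, axis=1, keepdims=True)
    scores = music @ query           # cosine similarity against the catalog
    return int(np.argmax(scores))

def explain(llm, video_caption: str, track_meta: str, user_pref: str) -> str:
    """Reasoning module: ask an LLM to justify the recommendation."""
    prompt = (f"Video: {video_caption}\nRecommended track: {track_meta}\n"
              f"User preference: {user_pref}\n"
              "Explain briefly why this track fits the video.")
    return llm(prompt)
```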
Related papers
- TALKPLAY: Multimodal Music Recommendation with Large Language Models [6.830154140450626]
TalkPlay represents music through an expanded token vocabulary that encodes multiple modalities.
The model learns to generate recommendations through next-token prediction on music recommendation conversations.
Our approach eliminates traditional recommendation-dialogue pipeline complexity, enabling end-to-end learning of query-aware music recommendations.
arXiv Detail & Related papers (2025-02-19T13:28:20Z)
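TalkPlay frames recommendation as next-token prediction over an expanded token vocabulary. A toy sketch of that formulation using a generic Hugging Face causal LM follows; the paper's actual model and training data are not shown here, and the track tokens and catalog are illustrative.

```python
# Hypothetical sketch of recommendation-as-next-token-prediction:
# each catalog track gets its own token appended to the LM vocabulary,
# and the model is trained on conversations that end in a track token.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Expand the vocabulary with one special token per track (toy catalog).
track_tokens = [f"<track_{i}>" for i in range(3)]
tokenizer.add_special_tokens({"additional_special_tokens": track_tokens})
model.resize_token_embeddings(len(tokenizer))

# At inference time, the recommendation is simply the next predicted
# token after the conversation history.
history = "User: something upbeat for a cycling video\nSystem:"
inputs = tokenizer(history, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=1)
print(tokenizer.decode(out[0, -1]))  # a <track_i> token, after training
```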
- SoundSignature: What Type of Music Do You Like? [0.0]
SoundSignature is a music application that integrates a custom OpenAI Assistant to analyze users' favorite songs.
The system incorporates state-of-the-art Music Information Retrieval (MIR) Python packages to combine extracted acoustic/musical features with the assistant's extensive knowledge of the artists and bands.
arXiv Detail & Related papers (2024-10-04T12:40:45Z)
- VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling [71.01050359126141]
We propose VidMuse, a framework for generating music aligned with video inputs.
VidMuse produces high-fidelity music that is both acoustically and semantically aligned with the video.
arXiv Detail & Related papers (2024-06-06T17:58:11Z)
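VidMuse's long-short-term modeling suggests conditioning each music segment on both a local window of frames and a global summary of the whole video. The module below is a schematic of that idea under assumed feature shapes, not the authors' implementation.

```python
# Schematic of long-short-term conditioning (illustrative, not VidMuse's
# code): music tokens for a segment attend to local frames (short-term)
# plus a pooled summary of the whole video (long-term).
import torch
import torch.nn as nn

class LongShortTermConditioner(nn.Module):
    def __init__(self, dim: int = 512, window: int = 16):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, 8, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, frame_feats: torch.Tensor, t: int) -> torch.Tensor:
        """frame_feats: (B, T, D) video features; t: current segment index."""
        lo = max(0, t - self.window)
        local = frame_feats[:, lo:t + 1]              # short-term window
        q = local[:, -1:]                             # query with latest frame
        short, _ = self.local_attn(q, local, local)   # (B, 1, D)
        long = frame_feats.mean(dim=1, keepdim=True)  # long-term global pool
        return self.fuse(torch.cat([short, long], dim=-1))  # (B, 1, D)
```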
- Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT [47.40350722537004]
We propose the Dynamic Visual Composition (DVC) task to automatically integrate various media elements based on user requirements and create storytelling videos.
We propose an Intelligent Director framework that uses LENS to generate descriptions for images and video frames and combines them with ChatGPT to generate coherent captions.
We construct the UCF101-DVC and Personal Album datasets and verify the effectiveness of our framework.
arXiv Detail & Related papers (2024-02-24T06:58:15Z)
- MusicRL: Aligning Music Generation to Human Preferences [62.44903326718772]
MusicRL is the first music generation system finetuned from human feedback.
We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences.
We train MusicRL-U, the first text-to-music model that incorporates human feedback at scale.
arXiv Detail & Related papers (2024-02-06T18:36:52Z)
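MusicRL is trained from pairwise preferences. A standard recipe for such data, which the paper's exact objective may or may not follow, is to fit a reward model with a Bradley-Terry loss and then fine-tune the generator against it with RL. The loss looks like this (`reward_model` is an assumed scalar-output network):

```python
# Bradley-Terry reward-model loss over pairwise preferences
# (a common RLHF recipe; illustrative, not MusicRL's released code).
import torch
import torch.nn.functional as F

def preference_loss(reward_model, preferred: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """The preferred clip should receive the higher scalar reward."""
    r_pos = reward_model(preferred)   # (B,) scalar reward per clip
    r_neg = reward_model(rejected)
    return -F.logsigmoid(r_pos - r_neg).mean()
```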
- Fairness Through Domain Awareness: Mitigating Popularity Bias For Music Discovery [56.77435520571752]
We explore the intrinsic relationship between music discovery and popularity bias.
We propose a domain-aware, individual fairness-based approach that addresses popularity bias in graph neural network (GNN) based recommender systems.
Our approach uses individual fairness to reflect a ground truth listening experience, i.e., if two songs sound similar, this similarity should be reflected in their representations.
arXiv Detail & Related papers (2023-08-28T14:12:25Z)
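The individual-fairness idea above ("if two songs sound similar, this similarity should be reflected in their representations") can be read as a Lipschitz-style penalty comparing audio-space and embedding-space distances. This is an illustrative reading, not the paper's exact regularizer.

```python
# Sketch of an individual-fairness penalty in the spirit described above:
# songs close in audio space should stay close in the learned GNN
# embedding space. Names and weighting are illustrative.
import torch

def fairness_penalty(audio_feats: torch.Tensor,
                     gnn_embs: torch.Tensor) -> torch.Tensor:
    """Penalize pairs whose embeddings drift further apart than their
    audio distance warrants (Lipschitz constant of 1 assumed)."""
    d_audio = torch.cdist(audio_feats, audio_feats)  # (N, N) audio distances
    d_emb = torch.cdist(gnn_embs, gnn_embs)          # (N, N) learned distances
    return torch.relu(d_emb - d_audio).mean()
```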
- Language-Guided Music Recommendation for Video via Prompt Analogies [35.48998901411509]
We propose a method to recommend music for an input video while allowing a user to guide music selection with free-form natural language.
Existing music video datasets provide the needed (video, music) training pairs, but lack text descriptions of the music.
arXiv Detail & Related papers (2023-06-15T17:58:01Z)
- VideoChat: Chat-Centric Video Understanding [80.63932941216129]
We develop an end-to-end chat-centric video understanding system, named VideoChat.
It integrates video foundation models and large language models via a learnable neural interface.
Preliminary qualitative experiments demonstrate the potential of our system across a broad spectrum of video applications.
arXiv Detail & Related papers (2023-05-10T17:59:04Z)
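VideoChat's "learnable neural interface" between a video foundation model and an LLM can be pictured, in its simplest form, as a small set of learned queries that cross-attend to frozen video features and project them into the LLM's embedding space. The sketch below assumes arbitrary dimensions and is not the released architecture.

```python
# Minimal sketch of a learnable neural interface: a trainable module that
# maps frozen video-encoder features into the LLM's token-embedding space,
# so video can be consumed as soft prompt tokens. Dimensions are assumed.
import torch
import torch.nn as nn

class VideoLLMInterface(nn.Module):
    def __init__(self, video_dim: int = 768, llm_dim: int = 4096,
                 n_query: int = 32):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_query, video_dim))
        self.attn = nn.MultiheadAttention(video_dim, 8, batch_first=True)
        self.proj = nn.Linear(video_dim, llm_dim)

    def forward(self, video_feats: torch.Tensor) -> torch.Tensor:
        """video_feats: (B, T, video_dim) -> (B, n_query, llm_dim)."""
        q = self.queries.expand(video_feats.size(0), -1, -1)
        pooled, _ = self.attn(q, video_feats, video_feats)
        return self.proj(pooled)  # prepend to the LLM's input embeddings
```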
- Explainability in Music Recommender Systems [69.0506502017444]
We discuss how explainability can be addressed in the context of Music Recommender Systems (MRSs).
MRSs are often quite complex and optimized for recommendation accuracy.
We show how explainability components can be integrated within a MRS and in what form explanations can be provided.
arXiv Detail & Related papers (2022-01-25T18:32:11Z)
- Self-Supervised Bot Play for Conversational Recommendation with Justifications [3.015622397986615]
We develop a new two-part framework for training conversational recommender systems.
First, we train a recommender system to jointly suggest items and justify its reasoning with subjective aspects.
We then fine-tune this model to incorporate iterative user feedback via self-supervised bot-play.
arXiv Detail & Related papers (2021-12-09T20:07:41Z)
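The bot-play stage above can be imagined as a simulated dialogue in which a user bot that knows the target item critiques the recommender's justification aspects each turn. The loop below is purely illustrative; the paper's simulator and training signal are richer than this.

```python
# High-level sketch of self-supervised bot play (details are illustrative):
# a user bot critiques a justification aspect, and the recommender refines
# its suggestion over a few simulated turns.
def bot_play_episode(recommender, user_bot, target_item, max_turns=3):
    """Simulate a dialogue; return (item, aspects) pairs for fine-tuning."""
    history, trace = [], []
    for _ in range(max_turns):
        item, aspects = recommender.suggest(history)  # item + justification
        trace.append((item, aspects))
        if item == target_item:
            break
        # The user bot, which knows the target, critiques one aspect.
        history.append(user_bot.critique(target_item, aspects))
    return trace
```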
- Time-Aware Music Recommender Systems: Modeling the Evolution of Implicit User Preferences and User Listening Habits in A Collaborative Filtering Approach [4.576379639081977]
This paper studies temporal information about when songs are played.
The purpose is to model both the evolution of user preferences, expressed as evolving implicit ratings, and user listening behavior.
In the collaborative filtering method proposed in this work, daily listening habits are captured in order to characterize users and provide them with more reliable recommendations.
arXiv Detail & Related papers (2020-08-26T08:00:11Z)
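One common way to realize the evolving implicit ratings described above, from which the paper's specific model may differ, is to weight each timestamped play with an exponential time decay so that recent listening habits dominate.

```python
# Illustrative: derive an evolving implicit rating from timestamped plays
# with exponential time decay; the half-life value is an assumption.
import time

def implicit_rating(play_timestamps: list[float],
                    now: float | None = None,
                    half_life_days: float = 30.0) -> float:
    """Each play contributes weight 0.5 ** (age_in_days / half_life)."""
    now = time.time() if now is None else now
    day = 86400.0
    return sum(0.5 ** ((now - t) / day / half_life_days)
               for t in play_timestamps)

# Example: three plays, 1, 10, and 100 days ago.
now = time.time()
plays = [now - d * 86400 for d in (1, 10, 100)]
print(round(implicit_rating(plays, now), 3))  # ~0.977 + 0.794 + 0.099
```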