Automatic Music Playlist Generation via Simulation-based Reinforcement
Learning
- URL: http://arxiv.org/abs/2310.09123v1
- Date: Fri, 13 Oct 2023 14:13:02 GMT
- Title: Automatic Music Playlist Generation via Simulation-based Reinforcement
Learning
- Authors: Federico Tomasi, Joseph Cauteruccio, Surya Kanoria, Kamil Ciosek,
Matteo Rinaldi, Zhenwen Dai
- Abstract summary: Personalization of playlists is a common feature in music streaming services.
We present a reinforcement learning framework that solves for user satisfaction metrics via the use of a simulated playlist-generation environment.
- Score: 17.628525710776877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalization of playlists is a common feature in music streaming services,
but conventional techniques, such as collaborative filtering, rely on explicit
assumptions regarding content quality to learn how to make recommendations.
Such assumptions often result in misalignment between offline model objectives
and online user satisfaction metrics. In this paper, we present a reinforcement
learning framework that solves for such limitations by directly optimizing for
user satisfaction metrics via the use of a simulated playlist-generation
environment. Using this simulator we develop and train a modified Deep
Q-Network, the action head DQN (AH-DQN), in a manner that addresses the
challenges imposed by the large state and action space of our RL formulation.
The resulting policy is capable of making recommendations from large and
dynamic sets of candidate items with the expectation of maximizing consumption
metrics. We analyze and evaluate agents offline via simulations that use
environment models trained on both public and proprietary streaming datasets.
We show how these agents lead to better user-satisfaction metrics compared to
baseline methods during online A/B tests. Finally, we demonstrate that
performance assessments produced from our simulator are strongly correlated
with observed online metric results.
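The paper does not publish the AH-DQN architecture in this abstract, but the core idea it names (a Q-network that scores each candidate item rather than using a fixed output head, so the action set can be large and dynamic) can be sketched as follows. All names, dimensions, and weights below are hypothetical illustrations, not the authors' actual model or training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ITEM_DIM, HIDDEN = 8, 4, 16

# Toy weights for a two-layer network that scores ONE (state, item) pair.
# Instead of a fixed-size output layer over all actions, item features are
# part of the input, so the candidate set can be arbitrarily large and change
# between requests -- the "action head" idea, heavily simplified.
W1 = rng.normal(size=(STATE_DIM + ITEM_DIM, HIDDEN)) * 0.1
W2 = rng.normal(size=HIDDEN) * 0.1

def q_value(state, item):
    """Scalar Q(s, a) for a single candidate item."""
    x = np.concatenate([state, item])
    h = np.maximum(x @ W1, 0.0)  # ReLU hidden layer
    return float(h @ W2)

def select_item(state, candidates):
    """Greedy policy: argmax of Q over an arbitrary candidate set."""
    scores = [q_value(state, c) for c in candidates]
    return int(np.argmax(scores))

state = rng.normal(size=STATE_DIM)          # user/session state (illustrative)
candidates = rng.normal(size=(50, ITEM_DIM))  # dynamic candidate pool
best = select_item(state, candidates)
print(best)
```

In the paper's setting, such a scorer would be trained against the simulated playlist-generation environment rather than with random weights; the sketch only shows why per-item scoring sidesteps the fixed-action-space limitation of a standard DQN output layer.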
Related papers
- CreAgent: Towards Long-Term Evaluation of Recommender System under Platform-Creator Information Asymmetry [55.64992650205645]
We propose CreAgent, a large language model-empowered creator simulation agent.
By incorporating game theory's belief mechanism and the fast-and-slow thinking framework, CreAgent effectively simulates creator behavior.
Our credibility validation experiments show that CreAgent aligns well with the behavior of real-world platforms and creators.
arXiv Detail & Related papers (2025-02-11T07:09:49Z)
- Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
- Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems [0.0]
We introduce Lusifer, a novel environment leveraging Large Language Models (LLMs) to generate simulated user feedback.
Lusifer synthesizes user profiles and interaction histories to simulate responses and behaviors toward recommended items.
Lusifer accurately emulates user behavior and preferences even with reduced training data, achieving an RMSE of 1.3.
arXiv Detail & Related papers (2024-05-22T05:43:15Z)
- On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a user simulator in recommendation based on Large Language Models.
Each agent interacts with personalized recommender models in a page-by-page manner.
arXiv Detail & Related papers (2023-10-16T06:41:16Z)
- Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction [52.16923290335873]
We propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage.
We modify the structure of the classical matrix factorization model and update the training procedure to sequential learning.
A further ingredient of the proposed approach is a new privacy mechanism that protects the data sent from users to the remote server.
arXiv Detail & Related papers (2023-02-05T10:29:57Z)
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground-truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
- Towards Data-Driven Offline Simulations for Online Reinforcement Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
arXiv Detail & Related papers (2022-11-14T18:36:13Z)
- Imitate TheWorld: A Search Engine Simulation Platform [13.011052642314421]
We build a simulated search engine AESim that can properly give feedback by a well-trained discriminator for generated pages.
Unlike previous simulation platforms, which lose connection with the real world, ours depends on real data from Search.
Our experiments also show AESim can better reflect the online performance of ranking models than classic ranking metrics.
arXiv Detail & Related papers (2021-07-16T03:55:33Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)