Automatic Music Playlist Generation via Simulation-based Reinforcement
Learning
- URL: http://arxiv.org/abs/2310.09123v1
- Date: Fri, 13 Oct 2023 14:13:02 GMT
- Title: Automatic Music Playlist Generation via Simulation-based Reinforcement
Learning
- Authors: Federico Tomasi, Joseph Cauteruccio, Surya Kanoria, Kamil Ciosek,
Matteo Rinaldi, Zhenwen Dai
- Abstract summary: Personalization of playlists is a common feature in music streaming services.
We present a reinforcement learning framework that solves for user satisfaction metrics via the use of a simulated playlist-generation environment.
- Score: 17.628525710776877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalization of playlists is a common feature in music streaming services,
but conventional techniques, such as collaborative filtering, rely on explicit
assumptions regarding content quality to learn how to make recommendations.
Such assumptions often result in misalignment between offline model objectives
and online user satisfaction metrics. In this paper, we present a reinforcement
learning framework that solves for such limitations by directly optimizing for
user satisfaction metrics via the use of a simulated playlist-generation
environment. Using this simulator we develop and train a modified Deep
Q-Network, the action head DQN (AH-DQN), in a manner that addresses the
challenges imposed by the large state and action space of our RL formulation.
The resulting policy is capable of making recommendations from large and
dynamic sets of candidate items with the expectation of maximizing consumption
metrics. We analyze and evaluate agents offline via simulations that use
environment models trained on both public and proprietary streaming datasets.
We show how these agents lead to better user-satisfaction metrics compared to
baseline methods during online A/B tests. Finally, we demonstrate that
performance assessments produced from our simulator are strongly correlated
with observed online metric results.
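The abstract names the action head DQN (AH-DQN) but gives no architectural details. As a rough illustration only, the sketch below shows one plausible way such an agent could score a large, dynamic pool of candidate tracks: a single Q-network takes the session state together with each candidate's embedding and produces one Q-value per candidate. All class and function names, dimensions, and the epsilon-greedy selection here are illustrative assumptions, not the paper's actual architecture or hyperparameters.

import torch
import torch.nn as nn


class ActionHeadQNetwork(nn.Module):
    """Sketch: scores each candidate item against the current session state."""

    def __init__(self, state_dim: int, item_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Encode the session/user state (e.g., listening-history features).
        self.state_encoder = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        # The "action head" scores one (state, candidate) pair; applying it to
        # every candidate lets the action space be large and change per request.
        self.action_head = nn.Sequential(
            nn.Linear(hidden_dim + item_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
        # state: (state_dim,); candidates: (num_candidates, item_dim)
        s = self.state_encoder(state).expand(candidates.shape[0], -1)
        q = self.action_head(torch.cat([s, candidates], dim=-1))
        return q.squeeze(-1)  # one Q-value per candidate track


def recommend(q_net: ActionHeadQNetwork,
              state: torch.Tensor,
              candidates: torch.Tensor,
              epsilon: float = 0.0) -> int:
    # Epsilon-greedy selection over whatever candidate pool is available now.
    if torch.rand(()).item() < epsilon:
        return int(torch.randint(candidates.shape[0], (1,)).item())
    with torch.no_grad():
        return int(q_net(state, candidates).argmax().item())

At training time a network like this would sit inside a standard DQN loop (replay buffer, target network), with rewards supplied by the simulated playlist-generation environment described above rather than by live users.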
Related papers
- Scalable Offline Reinforcement Learning for Mean Field Games [6.8267158622784745]
Off-MMD is a novel mean-field RL algorithm that approximates equilibrium policies in mean-field games using purely offline data.
Our algorithm scales to complex environments and demonstrates strong performance on benchmark tasks like crowd exploration or navigation.
arXiv Detail & Related papers (2024-10-23T14:16:34Z)
- Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
- On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a user simulator for recommendation based on Large Language Models.
Each agent interacts with personalized recommender models in a page-by-page manner.
arXiv Detail & Related papers (2023-10-16T06:41:16Z)
- Re-Evaluating LiDAR Scene Flow for Autonomous Driving [80.37947791534985]
Popular benchmarks for self-supervised LiDAR scene flow have unrealistic rates of dynamic motion, unrealistic correspondences, and unrealistic sampling patterns.
We evaluate a suite of top methods on a suite of real-world datasets.
We show that despite the emphasis placed on learning, most performance gains are caused by pre- and post-processing steps.
arXiv Detail & Related papers (2023-04-04T22:45:50Z)
- Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction [52.16923290335873]
We propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage.
We modify the structure of the classical matrix factorization model and update the training procedure to sequential learning.
The proposed approach also includes a new privacy mechanism that guarantees the protection of the data sent from users to the remote server.
arXiv Detail & Related papers (2023-02-05T10:29:57Z)
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
- Towards Data-Driven Offline Simulations for Online Reinforcement Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
arXiv Detail & Related papers (2022-11-14T18:36:13Z)
- Imitate TheWorld: A Search Engine Simulation Platform [13.011052642314421]
We build a simulated search engine AESim that can properly give feedback by a well-trained discriminator for generated pages.
Unlike previous simulation platforms, which lose connection with the real world, ours relies on real data from Search.
Our experiments also show AESim can better reflect the online performance of ranking models than classic ranking metrics.
arXiv Detail & Related papers (2021-07-16T03:55:33Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy or quality of the information it contains and is not responsible for any consequences of its use.