Automatic Music Playlist Generation via Simulation-based Reinforcement
Learning
- URL: http://arxiv.org/abs/2310.09123v1
- Date: Fri, 13 Oct 2023 14:13:02 GMT
- Title: Automatic Music Playlist Generation via Simulation-based Reinforcement
Learning
- Authors: Federico Tomasi, Joseph Cauteruccio, Surya Kanoria, Kamil Ciosek,
Matteo Rinaldi, Zhenwen Dai
- Abstract summary: Personalization of playlists is a common feature in music streaming services.
We present a reinforcement learning framework that solves for user satisfaction metrics via the use of a simulated playlist-generation environment.
- Score: 17.628525710776877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalization of playlists is a common feature in music streaming services,
but conventional techniques, such as collaborative filtering, rely on explicit
assumptions regarding content quality to learn how to make recommendations.
Such assumptions often result in misalignment between offline model objectives
and online user satisfaction metrics. In this paper, we present a reinforcement
learning framework that solves for such limitations by directly optimizing for
user satisfaction metrics via the use of a simulated playlist-generation
environment. Using this simulator we develop and train a modified Deep
Q-Network, the action head DQN (AH-DQN), in a manner that addresses the
challenges imposed by the large state and action space of our RL formulation.
The resulting policy is capable of making recommendations from large and
dynamic sets of candidate items with the expectation of maximizing consumption
metrics. We analyze and evaluate agents offline via simulations that use
environment models trained on both public and proprietary streaming datasets.
We show how these agents lead to better user-satisfaction metrics compared to
baseline methods during online A/B tests. Finally, we demonstrate that
performance assessments produced from our simulator are strongly correlated
with observed online metric results.
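The paper does not publish the AH-DQN architecture in this abstract, but the core idea it names (a Q-network that scores each candidate item rather than using a fixed output head, so the action set can be large and dynamic) can be sketched as follows. All names, dimensions, and weights below are hypothetical illustrations, not the authors' actual model or training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ITEM_DIM, HIDDEN = 8, 4, 16

# Toy weights for a two-layer network that scores ONE (state, item) pair.
# Instead of a fixed-size output layer over all actions, item features are
# part of the input, so the candidate set can be arbitrarily large and change
# between requests -- the "action head" idea, heavily simplified.
W1 = rng.normal(size=(STATE_DIM + ITEM_DIM, HIDDEN)) * 0.1
W2 = rng.normal(size=HIDDEN) * 0.1

def q_value(state, item):
    """Scalar Q(s, a) for a single candidate item."""
    x = np.concatenate([state, item])
    h = np.maximum(x @ W1, 0.0)  # ReLU hidden layer
    return float(h @ W2)

def select_item(state, candidates):
    """Greedy policy: argmax of Q over an arbitrary candidate set."""
    scores = [q_value(state, c) for c in candidates]
    return int(np.argmax(scores))

state = rng.normal(size=STATE_DIM)          # user/session state (illustrative)
candidates = rng.normal(size=(50, ITEM_DIM))  # dynamic candidate pool
best = select_item(state, candidates)
print(best)
```

In the paper's setting, such a scorer would be trained against the simulated playlist-generation environment rather than with random weights; the sketch only shows why per-item scoring sidesteps the fixed-action-space limitation of a standard DQN output layer.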
Related papers
- CreAgent: Towards Long-Term Evaluation of Recommender System under Platform-Creator Information Asymmetry [55.64992650205645]
We propose CreAgent, a large language model-empowered creator simulation agent.
By incorporating game theory's belief mechanism and the fast-and-slow thinking framework, CreAgent effectively simulates creator behavior.
Our credibility validation experiments show that CreAgent aligns well with the behavior of real-world platforms and creators.
arXiv Detail & Related papers (2025-02-11T07:09:49Z)
- Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
- Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems [0.0]
We introduce Lusifer, a novel environment leveraging Large Language Models (LLMs) to generate simulated user feedback.
Lusifer synthesizes user profiles and interaction histories to simulate responses and behaviors toward recommended items.
Lusifer accurately emulates user behavior and preferences even with reduced training data, achieving an RMSE of 1.3.
arXiv Detail & Related papers (2024-05-22T05:43:15Z)
- On Generative Agents in Recommendation [58.42840923200071]
Agent4Rec is a user simulator in recommendation based on Large Language Models.
Each agent interacts with personalized recommender models in a page-by-page manner.
arXiv Detail & Related papers (2023-10-16T06:41:16Z)
- Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction [52.16923290335873]
We propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage.
We modify the structure of the classical matrix factorization model and update the training procedure to sequential learning.
A further ingredient of the proposed approach is a new privacy mechanism that protects the data sent from users to the remote server.
arXiv Detail & Related papers (2023-02-05T10:29:57Z)
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground-truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
- Towards Data-Driven Offline Simulations for Online Reinforcement Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
arXiv Detail & Related papers (2022-11-14T18:36:13Z)
- Imitate TheWorld: A Search Engine Simulation Platform [13.011052642314421]
We build a simulated search engine AESim that can properly give feedback by a well-trained discriminator for generated pages.
Unlike previous simulation platforms, which lose connection with the real world, ours depends on real data from Search.
Our experiments also show AESim can better reflect the online performance of ranking models than classic ranking metrics.
arXiv Detail & Related papers (2021-07-16T03:55:33Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)