Related papers: Learning User Preferences in Non-Stationary Environments

Learning User Preferences in Non-Stationary Environments

URL: http://arxiv.org/abs/2101.12506v1
Date: Fri, 29 Jan 2021 10:26:16 GMT
Title: Learning User Preferences in Non-Stationary Environments
Authors: Wasim Huleihel and Soumyabrata Pal and Ofer Shayevitz
Abstract summary: We introduce a novel model for online non-stationary recommendation systems. We show that our algorithm outperforms other static algorithms even when preferences do not change over time.
Score: 42.785926822853746
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recommendation systems often use online collaborative filtering (CF) algorithms to identify items a given user likes over time, based on ratings that this user and a large number of other users have provided in the past. This problem has been studied extensively when users' preferences do not change over time (static case); an assumption that is often violated in practical settings. In this paper, we introduce a novel model for online non-stationary recommendation systems which allows for temporal uncertainties in the users' preferences. For this model, we propose a user-based CF algorithm, and provide a theoretical analysis of its achievable reward. Compared to related non-stationary multi-armed bandit literature, the main fundamental difficulty in our model lies in the fact that variations in the preferences of a certain user may affect the recommendations for other users severely. We also test our algorithm over real-world datasets, showing its effectiveness in real-world applications. One of the main surprising observations in our experiments is the fact our algorithm outperforms other static algorithms even when preferences do not change over time. This hints toward the general conclusion that in practice, dynamic algorithms, such as the one we propose, might be beneficial even in stationary environments.

Related papers

Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback. We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z)
When Online Algorithms Influence the Environment: A Dynamical Systems Analysis of the Unintended Consequences [5.4209739979186295]
We analyze the effect that online algorithms have on the environment that they are learning. We show that when the recommendation algorithm is able to learn the population preferences in the presence of this mismatch, the algorithm induces similarity in the preferences of the user population.
arXiv Detail & Related papers (2024-11-21T06:47:53Z)
Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems. We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z)
Algorithmic Drift: A Simulation Framework to Study the Effects of Recommender Systems on User Preferences [7.552217586057245]
We propose a simulation framework that mimics user-recommender system interactions in a long-term scenario. We introduce two novel metrics for quantifying the algorithm's impact on user preferences, specifically in terms of drift over time.
arXiv Detail & Related papers (2024-09-24T21:54:22Z)
Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction [52.16923290335873]
We propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage. We modify the structure of the classical matrix factorization model and update the training procedure to sequential learning. One more ingredient of the proposed approach is a new privacy mechanism that guarantees the protection of the sent data from the users to the remote server.
arXiv Detail & Related papers (2023-02-05T10:29:57Z)
Modeling Dynamic User Preference via Dictionary Learning for Sequential Recommendation [133.8758914874593]
Capturing the dynamics in user preference is crucial to better predict user future behaviors because user preferences often drift over time. Many existing recommendation algorithms -- including both shallow and deep ones -- often model such dynamics independently. This paper considers the problem of embedding a user's sequential behavior into the latent space of user preferences.
arXiv Detail & Related papers (2022-04-02T03:23:46Z)
Top-N Recommendation with Counterfactual User Preference Simulation [26.597102553608348]
Top-N recommendation, which aims to learn user ranking-based preference, has long been a fundamental problem in a wide range of applications. In this paper, we propose to reformulate the recommendation task within the causal inference framework to handle the data scarce problem.
arXiv Detail & Related papers (2021-09-02T14:28:46Z)
Control Variates for Slate Off-Policy Evaluation [112.35528337130118]
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions. We obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators.
arXiv Detail & Related papers (2021-06-15T06:59:53Z)
Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users. The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online. We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
arXiv Detail & Related papers (2020-12-01T10:31:57Z)
Optimizing Offer Sets in Sub-Linear Time [5.027714423258537]
We propose an algorithm for personalized offer set optimization that runs in time sub-linear in the number of items. Our algorithm can be entirely data-driven, relying on samples of the user, where a sample' refers to the user interaction data typically collected by firms.
arXiv Detail & Related papers (2020-11-17T13:02:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.