Modeling Attrition in Recommender Systems with Departing Bandits
- URL: http://arxiv.org/abs/2203.13423v2
- Date: Thu, 15 Feb 2024 22:01:00 GMT
- Title: Modeling Attrition in Recommender Systems with Departing Bandits
- Authors: Omer Ben-Porat, Lee Cohen, Liu Leqi, Zachary C. Lipton, Yishay Mansour
- Abstract summary: We propose a novel multi-armed bandit setup that captures policy-dependent horizons.
We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal.
We then move to the more challenging case, where users are divided between two types.
- Score: 84.85560764274399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditionally, when recommender systems are formalized as multi-armed
bandits, the policy of the recommender system influences the rewards accrued,
but not the length of interaction. However, in real-world systems, dissatisfied
users may depart (and never come back). In this work, we propose a novel
multi-armed bandit setup that captures such policy-dependent horizons. Our
setup consists of a finite set of user types, and multiple arms with Bernoulli
payoffs. Each (user type, arm) tuple corresponds to an (unknown) reward
probability. Each user's type is initially unknown and can only be inferred
through their response to recommendations. Moreover, if a user is dissatisfied
with their recommendation, they might depart the system. We first address the
case where all users share the same type, demonstrating that a recent UCB-based
algorithm is optimal. We then move to the more challenging case, where users
are divided between two types. While naive approaches cannot handle this
setting, we provide an efficient learning algorithm that achieves
$\tilde{O}(\sqrt{T})$ regret, where $T$ is the number of users.
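To make the setup concrete, the following is a minimal simulation sketch in Python. It assumes a single user type (the first case above), a UCB1-style learner standing in for the paper's UCB-based algorithm, and a simple departure rule in which a dissatisfied user leaves with a fixed probability; all names and parameter values are illustrative assumptions, not the paper's exact specification.

```python
import math
import random

# Minimal sketch of the departing-bandits setup from the abstract, with a
# single user type and a UCB1-style learner. The departure rule (leave with
# probability DEPART_PROB after a zero reward) and all parameter values are
# illustrative assumptions, not the paper's exact model or algorithm.

K = 3                    # number of arms
P = [0.9, 0.5, 0.7]      # unknown Bernoulli reward probability per arm
DEPART_PROB = 0.5        # chance a dissatisfied user leaves (assumed)
N_USERS = 10_000         # T in the abstract: number of arriving users

counts = [0] * K         # pulls per arm, shared across users
rewards = [0] * K        # cumulative reward per arm
pulls = 0                # total pulls, used in the exploration bonus
total_reward = 0

def ucb_index(a: int) -> float:
    """UCB1 index: empirical mean plus an exploration bonus."""
    if counts[a] == 0:
        return math.inf
    return rewards[a] / counts[a] + math.sqrt(2 * math.log(pulls) / counts[a])

for _ in range(N_USERS):
    while True:
        # The horizon is policy-dependent: the user stays only while satisfied.
        arm = max(range(K), key=ucb_index)
        r = 1 if random.random() < P[arm] else 0
        pulls += 1
        counts[arm] += 1
        rewards[arm] += r
        total_reward += r
        if r == 0 and random.random() < DEPART_PROB:
            break        # a dissatisfied user departs and never returns

print(f"average reward per user: {total_reward / N_USERS:.3f}")
```

The key structural point is visible in the inner loop: the interaction length per user is not fixed in advance but depends on the policy's recommendations, which is what distinguishes this setup from a standard multi-armed bandit.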
Related papers
- The Nah Bandit: Modeling User Non-compliance in Recommendation Systems [2.421459418045937]
Expert with Clustering (EWC) is a hierarchical approach that incorporates feedback from both recommended and non-recommended options to accelerate user preference learning.
EWC outperforms both supervised learning and traditional contextual bandit approaches.
This work lays the foundation for future research in Nah Bandit, providing a robust framework for more effective recommendation systems.
arXiv Detail & Related papers (2024-08-15T03:01:02Z)
- Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z)
- Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z)
- Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms [75.17357040707347]
Motivated by online recommendation systems, we propose the problem of finding the optimal policy in contextual bandits.
The goal is to robustly learn the policy that maximizes rewards for good users with as few user interactions as possible.
We show that we can achieve an $\tilde{O}(\min(S,A) \cdot \alpha/\epsilon^2)$ upper bound by employing efficient robust mean estimators.
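As a gloss on "robust mean estimators": median-of-means is one standard such estimator, sketched below. The paper's exact estimator may differ, and all data and values here are illustrative.

```python
import random
import statistics

def median_of_means(samples: list[float], n_blocks: int = 5) -> float:
    """Split samples into blocks, average each block, return the median.

    The median of block means is far less sensitive to a small fraction of
    corrupted (e.g., adversarial) samples than the plain average.
    """
    random.shuffle(samples)
    size = max(1, len(samples) // n_blocks)
    blocks = [samples[i:i + size] for i in range(0, len(samples), size)]
    return statistics.median(statistics.fmean(b) for b in blocks)

# Honest rewards around 0.7, plus a few coordinated outliers.
data = [random.gauss(0.7, 0.1) for _ in range(95)] + [100.0] * 5
print("plain mean:      ", statistics.fmean(data))
print("median of means: ", median_of_means(data))
```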
arXiv Detail & Related papers (2022-01-30T01:45:13Z)
- Learning the Optimal Recommendation from Explorative Users [38.332330484187395]
We study the sequential interactions between a recommender system and a user.
We show that efficient system learning is still possible but is more difficult.
arXiv Detail & Related papers (2021-10-06T21:01:18Z)
- BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System [0.0]
Multi-armed bandits (MAB) provide a principled online learning approach to attain the balance between exploration and exploitation.
Collaborative filtering (CF) is arguably the earliest and most influential method in recommender systems.
BanditMF is designed to address two challenges in multi-armed bandit algorithms and collaborative filtering.
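A hypothetical sketch of the general idea behind such a hybrid: score items with factorized user/item vectors, then add exploration on top. The factors and the epsilon-greedy rule below are assumptions for illustration, not the paper's algorithm.

```python
import random

N_ITEMS = 5
EPSILON = 0.1

# Pretend these latent factors came from matrix factorization on past ratings.
user_factors = [0.8, -0.2]
item_factors = [[0.9, 0.1], [0.2, -0.5], [0.4, 0.4], [-0.3, 0.7], [0.6, -0.1]]

def predicted_score(item: int) -> float:
    """CF point estimate: inner product of user and item factors."""
    return sum(u * v for u, v in zip(user_factors, item_factors[item]))

def recommend() -> int:
    """Epsilon-greedy: usually exploit the CF scores, sometimes explore."""
    if random.random() < EPSILON:
        return random.randrange(N_ITEMS)                 # explore
    return max(range(N_ITEMS), key=predicted_score)      # exploit

print("recommended item:", recommend())
```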
arXiv Detail & Related papers (2021-06-21T07:35:39Z)
- Measuring Recommender System Effects with Simulated Users [19.09065424910035]
Popularity bias and filter bubbles are two of the most well-studied recommender system biases.
We offer a simulation framework for measuring the impact of a recommender system under different types of user behavior.
arXiv Detail & Related papers (2021-01-12T14:51:11Z)
- Regret in Online Recommendation Systems [73.58127515175127]
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time.
In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items.
The performance of the recommendation algorithm is captured through its regret, with an Oracle algorithm that knows the users' (unknown) item-preference probabilities serving as the reference.
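The blurb leaves the regret definition implicit; one plausible formalization, under assumed notation where $\rho_{u,a}$ is the probability that user $u$ likes item $a$, $u_t$ is the user arriving at round $t$, and $a_t$ is the recommended item:

```latex
% Assumed notation; the paper's precise definition may differ.
R(T) = \sum_{t=1}^{T} \Big( \max_{a \in [n]} \rho_{u_t, a} - \rho_{u_t, a_t} \Big)
```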
arXiv Detail & Related papers (2020-10-23T12:48:35Z)
- Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z)