Learning the Optimal Recommendation from Explorative Users
- URL: http://arxiv.org/abs/2110.03068v1
- Date: Wed, 6 Oct 2021 21:01:18 GMT
- Title: Learning the Optimal Recommendation from Explorative Users
- Authors: Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu
- Abstract summary: We study the sequential interactions between a recommender system and a user.
We show that efficient system learning is still possible but is more difficult.
- Score: 38.332330484187395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new problem setting to study the sequential interactions between
a recommender system and a user. Instead of assuming the user is omniscient,
static, and explicit, as classical approaches do, we sketch a more
realistic user behavior model, under which the user: 1) rejects recommendations
if they are clearly worse than others; 2) updates her utility estimation based
on rewards from her accepted recommendations; 3) withholds realized rewards
from the system. We formulate the interactions between the system and such an
explorative user in a $K$-armed bandit framework and study the problem of
learning the optimal recommendation on the system side. We show that efficient
system learning is still possible but is more difficult. In particular, the
system can identify the best arm with probability at least $1-\delta$ within
$O(1/\delta)$ interactions, and we prove this is tight. Our finding contrasts
with the result for the problem of best arm identification with fixed confidence, in
which the best arm can be identified with probability $1-\delta$ within
$O(\log(1/\delta))$ interactions. This gap illustrates the inevitable cost the
system has to pay when it learns from an explorative user's revealed
preferences on its recommendations rather than from the realized rewards.
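To make the interaction protocol concrete, here is a minimal simulation sketch of the setting described in the abstract. All dynamics below are illustrative assumptions (a confidence-bound acceptance rule for the user, uniform offers by the system, Gaussian reward noise), not the authors' algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5                                   # number of arms
true_means = rng.uniform(0, 1, K)       # unknown expected rewards

# User-side state: she estimates each arm's utility only from rewards of
# recommendations she accepted, and never reveals rewards to the system.
user_counts = np.ones(K)                # assume one pilot observation per arm
user_means = true_means + rng.normal(0, 0.1, K)

def user_accepts(arm, t):
    """Explorative user: reject a recommended arm only when it is clearly
    worse than an alternative, i.e., its upper confidence bound falls below
    another arm's lower confidence bound (an assumed acceptance rule)."""
    radius = np.sqrt(2 * np.log(t + 1) / user_counts)
    ucb = user_means + radius
    lcb = user_means - radius
    return ucb[arm] >= np.max(np.delete(lcb, arm))

# System-side state: it observes only accept/reject decisions, never rewards.
offers = np.zeros(K)
accepts = np.zeros(K)

for t in range(1, 5001):
    arm = int(rng.integers(K))          # sketch: offer arms uniformly
    offers[arm] += 1
    if user_accepts(arm, t):
        accepts[arm] += 1
        # The user privately updates her estimate with the realized reward.
        reward = true_means[arm] + rng.normal(0, 0.1)
        user_counts[arm] += 1
        user_means[arm] += (reward - user_means[arm]) / user_counts[arm]

# Revealed preferences: as the user's estimates sharpen, only near-optimal
# arms keep being accepted, so acceptance rates point to the best arm.
accept_rate = accepts / np.maximum(offers, 1)
print("arm the system infers as best:", int(np.argmax(accept_rate)))
print("true best arm:                ", int(np.argmax(true_means)))
```

In the paper the system chooses its offers strategically rather than uniformly; the uniform offering above only keeps the sketch short, and illustrates why learning from accept/reject signals alone is slower than learning from realized rewards.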
Related papers
- The Nah Bandit: Modeling User Non-compliance in Recommendation Systems [2.421459418045937]
Expert with Clustering (EWC) is a hierarchical approach that incorporates feedback from both recommended and non-recommended options to accelerate user preference learning.
EWC outperforms both supervised learning and traditional contextual bandit approaches.
This work lays the foundation for future research on the Nah Bandit problem, providing a robust framework for more effective recommendation systems.
arXiv Detail & Related papers (2024-08-15T03:01:02Z)
- Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both of the proposed algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z)
- Modeling Attrition in Recommender Systems with Departing Bandits [84.85560764274399]
We propose a novel multi-armed bandit setup that captures policy-dependent horizons.
We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal.
We then turn to the more challenging case, where users are divided between two types.
arXiv Detail & Related papers (2022-03-25T02:30:54Z)
- Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms [75.17357040707347]
Motivated by online recommendation systems, we propose the problem of finding the optimal policy in contextual bandits.
The goal is to robustly learn the policy that maximizes rewards for good users with as few user interactions as possible.
We show we can achieve an $\tilde{O}(\min(S,A)\cdot\alpha/\epsilon^2)$ upper bound by employing efficient robust mean estimators (a generic sketch of such an estimator appears after this list).
arXiv Detail & Related papers (2022-01-30T01:45:13Z)
- Pure Exploration with Structured Preference Feedback [25.894827160719526]
We consider the problem of pure exploration with subset-wise preference feedback, which contains $N$ arms with features.
We present two algorithms that guarantee the detection of the best arm in $\tilde{O}\left(\frac{d^2}{K\Delta^2}\right)$ samples with probability at least $1-\delta$.
arXiv Detail & Related papers (2021-04-12T08:57:29Z)
- Measuring Recommender System Effects with Simulated Users [19.09065424910035]
Popularity bias and filter bubbles are two of the most well-studied recommender system biases.
We offer a simulation framework for measuring the impact of a recommender system under different types of user behavior.
arXiv Detail & Related papers (2021-01-12T14:51:11Z)
- Regret in Online Recommendation Systems [73.58127515175127]
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time.
In each round, a user, randomly picked from a population of $m$ users, requests a recommendation; the decision-maker observes the user and selects an item from a catalogue of $n$ items, where each user likes each item with some unknown probability.
The performance of the recommendation algorithm is captured through its regret, taking as a reference an Oracle algorithm aware of these probabilities.
arXiv Detail & Related papers (2020-10-23T12:48:35Z)
- Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z)
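As referenced in the Coordinated Attacks entry above, robust mean estimators are a standard tool for surviving a coordinated fraction of corrupted feedback. Below is a generic median-of-means sketch; it is a textbook construction on assumed Gaussian data with a few planted outliers, not the exact estimator from that paper.

```python
import numpy as np

def median_of_means(x, n_blocks=20):
    """Generic median-of-means estimator: split the data into blocks,
    average each block, and return the median of the block means. A few
    extreme values can corrupt only a few blocks, which the median ignores."""
    rng = np.random.default_rng(0)
    x = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(x, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

rng = np.random.default_rng(1)
honest = rng.normal(0.5, 1.0, 995)       # honest samples, mean 0.5
corrupted = np.full(5, 1000.0)           # a handful of coordinated outliers
data = np.concatenate([honest, corrupted])

print("plain mean:     ", round(float(data.mean()), 3))     # pulled far off
print("median of means:", round(median_of_means(data), 3))  # stays near 0.5
```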