Preferences Evolve And So Should Your Bandits: Bandits with Evolving
States for Online Platforms
- URL: http://arxiv.org/abs/2307.11655v3
- Date: Mon, 19 Feb 2024 14:55:52 GMT
- Title: Preferences Evolve And So Should Your Bandits: Bandits with Evolving
States for Online Platforms
- Authors: Khashayar Khosravi, Renato Paes Leme, Chara Podimata, and Apostolis
Tsorvantzis
- Abstract summary: We propose a model for learning with bandit feedback while accounting for deterministically evolving and unobservable states.
The workhorse applications of our model are learning for recommendation systems and learning for online ads.
- Score: 12.368291979686122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a model for learning with bandit feedback while accounting for
deterministically evolving and unobservable states that we call Bandits with
Deterministically Evolving States ($B$-$DES$). The workhorse applications of
our model are learning for recommendation systems and learning for online ads.
In both cases, the reward that the algorithm obtains at each round is a
function of the short-term reward of the action chosen and how "healthy" the
system is (i.e., as measured by its state). For example, in recommendation
systems, the reward that the platform obtains from a user's engagement with a
particular type of content depends not only on the inherent features of the
specific content, but also on how the user's preferences have evolved as a
result of interacting with other types of content on the platform. Our general
model accounts for the different rate $\lambda \in [0,1]$ at which the state
evolves (e.g., how fast a user's preferences shift as a result of previous
content consumption) and encompasses standard multi-armed bandits as a special
case. The goal of the algorithm is to minimize a notion of regret against the
best-fixed sequence of arms pulled, which is significantly harder to attain
compared to the standard benchmark of the best-fixed action in hindsight. We
present online learning algorithms for any possible value of the evolution rate
$\lambda$ and we show the robustness of our results to various model
misspecifications.
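To make the model concrete, below is a minimal Python sketch of a bandit environment with a deterministically evolving, unobservable state. The multiplicative reward coupling, the linear state-update rule at rate $\lambda$, and the epsilon-greedy learner are illustrative assumptions standing in for the paper's exact dynamics and algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms = 3
base_reward = np.array([0.3, 0.5, 0.7])   # short-term mean reward of each arm
state_effect = np.array([0.9, 1.0, 1.1])  # hypothetical per-arm pull on the hidden state
lam = 0.8                                  # evolution rate lambda in [0, 1] (assumed update rule below)

state = 1.0            # hidden "health" of the system; never observed by the learner
T = 1000
counts = np.zeros(n_arms)
reward_sums = np.zeros(n_arms)

for t in range(T):
    # Simple epsilon-greedy learner standing in for the paper's algorithms;
    # it sees only the realized reward, never the state.
    if t < n_arms:
        arm = t                                     # pull each arm once to initialize
    elif rng.random() < 0.1:
        arm = int(rng.integers(n_arms))             # explore
    else:
        arm = int(np.argmax(reward_sums / counts))  # exploit empirical means

    # Realized reward couples the arm's short-term reward with the current state.
    reward = state * (base_reward[arm] + rng.normal(scale=0.05))

    # Assumed deterministic, action-dependent state evolution at rate lambda.
    # With lam = 1 the state never moves and the problem reduces to a standard
    # bandit in this sketch; the paper's own parameterization may differ.
    state = lam * state + (1 - lam) * state_effect[arm]

    counts[arm] += 1
    reward_sums[arm] += reward
```

In this sketch a frozen state collapses the problem to a standard stochastic multi-armed bandit, mirroring how the general model contains multi-armed bandits as a special case.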
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework that generalizes rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z) - Adaptively Learning to Select-Rank in Online Platforms [34.258659206323664]
This research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users.
We develop a user response model that considers diverse user preferences and the varying effects of item positions.
Experiments conducted on both simulated and real-world datasets demonstrate that our algorithm outperforms the baseline.
arXiv Detail & Related papers (2024-06-07T15:33:48Z) - Sparsity-Agnostic Linear Bandits with Adaptive Adversaries [19.84322270472381]
We study linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors) from which it chooses an element and obtains a reward.
The expected reward is a fixed but unknown linear function of the chosen action.
We study sparse regret bounds that depend on the number $S$ of non-zero coefficients in the linear reward function (a minimal sketch of this linear-bandit protocol appears after this list).
arXiv Detail & Related papers (2024-06-03T10:54:58Z) - Bandits Meet Mechanism Design to Combat Clickbait in Online
Recommendation [50.469872635246176]
We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.
This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards.
arXiv Detail & Related papers (2023-11-27T09:19:01Z) - Anytime Model Selection in Linear Bandits [61.97047189786905]
We develop ALEXP, whose regret has an exponentially improved dependence on the number of candidate models $M$.
Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.
arXiv Detail & Related papers (2023-07-24T15:44:30Z) - Improving Recommendation System Serendipity Through Lexicase Selection [53.57498970940369]
We propose a new serendipity metric to measure the presence of echo chambers and homophily in recommendation systems.
We then attempt to improve the diversity-preservation qualities of well known recommendation techniques by adopting a parent selection algorithm known as lexicase selection.
Our results show that lexicase selection, or a mixture of lexicase selection and ranking, outperforms its purely ranked counterparts in terms of personalization, coverage and our specifically designed serendipity benchmark.
arXiv Detail & Related papers (2023-05-18T15:37:38Z) - Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both of the proposed algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z) - Modeling Attrition in Recommender Systems with Departing Bandits [84.85560764274399]
We propose a novel multi-armed bandit setup that captures policy-dependent horizons.
We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal.
We then turn to the more challenging case, where users are divided between two types.
arXiv Detail & Related papers (2022-03-25T02:30:54Z) - Incentivising Exploration and Recommendations for Contextual Bandits
with Payments [2.5966580648312223]
We show how the platform can learn the inherent attributes of items and achieve a sublinear regret while maximizing cumulative social welfare.
Our approach can improve various engagement metrics of users on e-commerce stores, recommendation engines and matching platforms.
arXiv Detail & Related papers (2020-01-22T02:26:22Z)
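Complementing the sparsity-agnostic linear bandits entry above (see the pointer there), the sketch below illustrates the generic linear-bandit protocol it describes: each round the learner receives a set of feature vectors, selects one, and observes a noisy reward whose mean is a fixed but unknown linear function of the selection. The ridge-regression/UCB learner is a standard illustration, not the paper's sparsity-agnostic algorithm, and the sparse `theta_star` with $S = 2$ non-zero coefficients is a hypothetical example.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, T = 5, 10, 500

theta_star = np.zeros(d)
theta_star[:2] = [1.0, -0.5]   # hypothetical sparse parameter: S = 2 non-zero coefficients

A = np.eye(d)                  # regularized design matrix for ridge regression
b = np.zeros(d)                # accumulated reward-weighted features

for t in range(T):
    actions = rng.normal(size=(K, d))   # action set (feature vectors) for this round
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b               # current regularized least-squares estimate

    # Optimistic (UCB-style) scores: estimated reward plus an exploration bonus.
    bonus = np.sqrt(np.einsum("ki,ij,kj->k", actions, A_inv, actions))
    x = actions[int(np.argmax(actions @ theta_hat + 0.5 * bonus))]

    # Expected reward is a fixed but unknown linear function of the chosen action.
    reward = float(x @ theta_star) + rng.normal(scale=0.1)

    # Update the statistics with the observed (action, reward) pair.
    A += np.outer(x, x)
    b += reward * x
```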