Preferences Evolve And So Should Your Bandits: Bandits with Evolving
States for Online Platforms
- URL: http://arxiv.org/abs/2307.11655v3
- Date: Mon, 19 Feb 2024 14:55:52 GMT
- Title: Preferences Evolve And So Should Your Bandits: Bandits with Evolving
States for Online Platforms
- Authors: Khashayar Khosravi, Renato Paes Leme, Chara Podimata, and Apostolis
Tsorvantzis
- Abstract summary: We propose a model for learning with bandit feedback while accounting for deterministically evolving and unobservable states.
The workhorse applications of our model are learning for recommendation systems and learning for online ads.
- Score: 12.368291979686122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a model for learning with bandit feedback while accounting for
deterministically evolving and unobservable states that we call Bandits with
Deterministically Evolving States ($B$-$DES$). The workhorse applications of
our model are learning for recommendation systems and learning for online ads.
In both cases, the reward that the algorithm obtains at each round is a
function of the short-term reward of the action chosen and how "healthy" the
system is (i.e., as measured by its state). For example, in recommendation
systems, the reward that the platform obtains from a user's engagement with a
particular type of content depends not only on the inherent features of the
specific content, but also on how the user's preferences have evolved as a
result of interacting with other types of content on the platform. Our general
model accounts for the different rate $\lambda \in [0,1]$ at which the state
evolves (e.g., how fast a user's preferences shift as a result of previous
content consumption) and encompasses standard multi-armed bandits as a special
case. The goal of the algorithm is to minimize a notion of regret against the
best-fixed sequence of arms pulled, which is significantly harder to attain
compared to the standard benchmark of the best-fixed action in hindsight. We
present online learning algorithms for any possible value of the evolution rate
$\lambda$ and we show the robustness of our results to various model
misspecifications.
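To make the model concrete, below is a minimal Python sketch of a bandit environment with a deterministically evolving, unobservable state. The multiplicative reward coupling, the linear state-update rule at rate $\lambda$, and the epsilon-greedy learner are illustrative assumptions standing in for the paper's exact dynamics and algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms = 3
base_reward = np.array([0.3, 0.5, 0.7])   # short-term mean reward of each arm
state_effect = np.array([0.9, 1.0, 1.1])  # hypothetical per-arm pull on the hidden state
lam = 0.8                                  # evolution rate lambda in [0, 1] (assumed update rule below)

state = 1.0            # hidden "health" of the system; never observed by the learner
T = 1000
counts = np.zeros(n_arms)
reward_sums = np.zeros(n_arms)

for t in range(T):
    # Simple epsilon-greedy learner standing in for the paper's algorithms;
    # it sees only the realized reward, never the state.
    if t < n_arms:
        arm = t                                     # pull each arm once to initialize
    elif rng.random() < 0.1:
        arm = int(rng.integers(n_arms))             # explore
    else:
        arm = int(np.argmax(reward_sums / counts))  # exploit empirical means

    # Realized reward couples the arm's short-term reward with the current state.
    reward = state * (base_reward[arm] + rng.normal(scale=0.05))

    # Assumed deterministic, action-dependent state evolution at rate lambda.
    # With lam = 1 the state never moves and the problem reduces to a standard
    # bandit in this sketch; the paper's own parameterization may differ.
    state = lam * state + (1 - lam) * state_effect[arm]

    counts[arm] += 1
    reward_sums[arm] += reward
```

In this sketch a frozen state collapses the problem to a standard stochastic multi-armed bandit, mirroring how the general model contains multi-armed bandits as a special case.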
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework that generalizes rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z) - Adaptively Learning to Select-Rank in Online Platforms [34.258659206323664]
This research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users.
We develop a user response model that considers diverse user preferences and the varying effects of item positions.
Experiments conducted on both simulated and real-world datasets demonstrate that our algorithm outperforms the baseline.
arXiv Detail & Related papers (2024-06-07T15:33:48Z) - Sparsity-Agnostic Linear Bandits with Adaptive Adversaries [19.84322270472381]
We study linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors) from which it chooses an element and obtains a reward.
The expected reward is a fixed but unknown linear function of the chosen action.
We study sparse regret bounds that depend on the number $S$ of non-zero coefficients in the linear reward function (a minimal sketch of this linear-bandit protocol appears after this list).
arXiv Detail & Related papers (2024-06-03T10:54:58Z) - Bandits Meet Mechanism Design to Combat Clickbait in Online
Recommendation [50.469872635246176]
We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.
This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards.
arXiv Detail & Related papers (2023-11-27T09:19:01Z) - Anytime Model Selection in Linear Bandits [61.97047189786905]
We develop ALEXP, whose regret has an exponentially improved dependence on the number of candidate models $M$.
Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.
arXiv Detail & Related papers (2023-07-24T15:44:30Z) - Improving Recommendation System Serendipity Through Lexicase Selection [53.57498970940369]
We propose a new serendipity metric to measure the presence of echo chambers and homophily in recommendation systems.
We then attempt to improve the diversity-preservation qualities of well known recommendation techniques by adopting a parent selection algorithm known as lexicase selection.
Our results show that lexicase selection, or a mixture of lexicase selection and ranking, outperforms its purely ranked counterparts in terms of personalization, coverage and our specifically designed serendipity benchmark.
arXiv Detail & Related papers (2023-05-18T15:37:38Z) - Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both of the proposed algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z) - Modeling Attrition in Recommender Systems with Departing Bandits [84.85560764274399]
We propose a novel multi-armed bandit setup that captures policy-dependent horizons.
We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal.
We then turn to the more challenging case, where users are divided between two types.
arXiv Detail & Related papers (2022-03-25T02:30:54Z) - Incentivising Exploration and Recommendations for Contextual Bandits
with Payments [2.5966580648312223]
We show how the platform can learn the inherent attributes of items and achieve a sublinear regret while maximizing cumulative social welfare.
Our approach can improve various engagement metrics of users on e-commerce stores, recommendation engines and matching platforms.
arXiv Detail & Related papers (2020-01-22T02:26:22Z)
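Complementing the sparsity-agnostic linear bandits entry above (see the pointer there), the sketch below illustrates the generic linear-bandit protocol it describes: each round the learner receives a set of feature vectors, selects one, and observes a noisy reward whose mean is a fixed but unknown linear function of the selection. The ridge-regression/UCB learner is a standard illustration, not the paper's sparsity-agnostic algorithm, and the sparse `theta_star` with $S = 2$ non-zero coefficients is a hypothetical example.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, T = 5, 10, 500

theta_star = np.zeros(d)
theta_star[:2] = [1.0, -0.5]   # hypothetical sparse parameter: S = 2 non-zero coefficients

A = np.eye(d)                  # regularized design matrix for ridge regression
b = np.zeros(d)                # accumulated reward-weighted features

for t in range(T):
    actions = rng.normal(size=(K, d))   # action set (feature vectors) for this round
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b               # current regularized least-squares estimate

    # Optimistic (UCB-style) scores: estimated reward plus an exploration bonus.
    bonus = np.sqrt(np.einsum("ki,ij,kj->k", actions, A_inv, actions))
    x = actions[int(np.argmax(actions @ theta_hat + 0.5 * bonus))]

    # Expected reward is a fixed but unknown linear function of the chosen action.
    reward = float(x @ theta_star) + rng.normal(scale=0.1)

    # Update the statistics with the observed (action, reward) pair.
    A += np.outer(x, x)
    b += reward * x
```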