Misalignment, Learning, and Ranking: Harnessing Users Limited Attention
- URL: http://arxiv.org/abs/2402.14013v1
- Date: Wed, 21 Feb 2024 18:52:20 GMT
- Title: Misalignment, Learning, and Ranking: Harnessing Users Limited Attention
- Authors: Arpit Agarwal, Rad Niazadeh, Prathamesh Patil
- Abstract summary: We study the design of online algorithms that obtain vanishing regret against optimal benchmarks.
We show how standard algorithms for adversarial online linear optimization can be used to obtain a polynomial-time algorithm with $O(\sqrt{T})$ regret.
- Score: 16.74322664734553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In digital health and EdTech, recommendation systems face a significant
challenge: users often choose impulsively, in ways that conflict with the
platform's long-term payoffs. This misalignment makes it difficult to
effectively learn to rank items, as it may hinder exploration of items with
greater long-term payoffs. Our paper tackles this issue by utilizing users'
limited attention spans. We propose a model in which a platform presents a
ranked list of items, whose payoffs are unknown to it, to $T$ users over time.
Each user selects an item by first considering a prefix window of these ranked
items and then picking their most preferred item in that window (and the
platform observes its payoff for this item). We study the design of online
bandit algorithms that obtain vanishing regret against hindsight-optimal
benchmarks.
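The prefix-window choice model described above can be sketched in a few lines. This is a hypothetical illustration with made-up item names, preference scores, and function names, not code from the paper:

```python
def user_choice(ranking, preferences, window_size):
    """User examines only the first `window_size` items of the ranked
    list and picks the one they prefer most within that prefix window."""
    window = ranking[:window_size]
    return max(window, key=lambda item: preferences[item])

# Illustrative example: 4 items, user attention limited to the top 2 slots.
ranking = ["a", "b", "c", "d"]                           # platform's chosen order
preferences = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7}   # user's (hidden) tastes
chosen = user_choice(ranking, preferences, window_size=2)
print(chosen)  # "b": the most preferred item within the 2-item prefix
```

Note that with `window_size=1` the user would pick "a" regardless of preference, which is exactly why the ordering decision interacts with exploration.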
We first consider adversarial window sizes and stochastic iid payoffs. We
design an active-elimination-based algorithm that achieves an optimal
instance-dependent regret bound of $O(\log(T))$, by showing matching regret
upper and lower bounds. The key idea is using the combinatorial structure of
the problem to either obtain a large payoff from each item or to explore by
getting a sample from that item. This method systematically narrows down the
item choices to enhance learning efficiency and payoff.
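As an illustration of the elimination idea, here is a generic successive-elimination sketch for stochastic iid payoffs. It is not the paper's exact algorithm, which additionally exploits the combinatorial ranking structure; it only shows how confidence bounds systematically narrow down the candidate set:

```python
import math

def successive_elimination(arms, pull, horizon):
    """Generic successive elimination: pull surviving arms round-robin
    and drop any arm whose upper confidence bound falls below the best
    lower confidence bound among the survivors."""
    total = {a: 0.0 for a in arms}   # cumulative observed payoff per arm
    pulls = {a: 0 for a in arms}     # number of samples per arm

    def mean(a):
        return total[a] / pulls[a] if pulls[a] else 0.0

    def radius(a):
        # Confidence radius ~ sqrt(log(horizon) / pulls); infinite if unsampled.
        return math.sqrt(2 * math.log(horizon) / pulls[a]) if pulls[a] else float("inf")

    active = list(arms)
    for t in range(horizon):
        arm = active[t % len(active)]
        total[arm] += pull(arm)
        pulls[arm] += 1
        best_lcb = max(mean(a) - radius(a) for a in active)
        active = [a for a in active if mean(a) + radius(a) >= best_lcb]
    return max(active, key=mean)
```

With well-separated means, each suboptimal arm is eliminated after roughly $\log(T)/\Delta^2$ pulls, which is the source of instance-dependent $O(\log T)$-type regret bounds in this family of algorithms.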
Second, we consider adversarial payoffs and stochastic iid window sizes. We
start from the full-information problem of finding the permutation that
maximizes the expected payoff. By a novel combinatorial argument, we
characterize the polytope of admissible item selection probabilities by a
permutation and show it has a polynomial-size representation. Using this
representation, we show how standard algorithms for adversarial online linear
optimization in the space of admissible probabilities can be used to obtain a
polynomial-time algorithm with $O(\sqrt{T})$ regret.
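The reduction to adversarial online linear optimization can be illustrated with projected online gradient ascent. Here the probability simplex stands in for the paper's polytope of admissible selection probabilities, which has a more involved (but still polynomial-size) representation:

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex,
    via the standard sorting-based method."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

def online_gradient_ascent(payoff_vectors, eta=0.1):
    """Play a point in the feasible set each round; after observing the
    adversarial linear payoff vector, take a gradient step and project
    back.  With eta ~ 1/sqrt(T), this guarantees O(sqrt(T)) regret
    against the best fixed point in the set."""
    d = len(payoff_vectors[0])
    x = np.ones(d) / d           # start at the uniform distribution
    total = 0.0
    for g in payoff_vectors:
        total += float(x @ g)    # payoff collected this round
        x = project_to_simplex(x + eta * np.asarray(g))
    return total, x
```

The fixed `eta=0.1` is for demonstration only; the $O(\sqrt{T})$ guarantee uses a horizon-tuned step size, and the paper's algorithm projects onto its characterized polytope rather than the simplex.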
Related papers
- Adaptively Learning to Select-Rank in Online Platforms [34.258659206323664]
This research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users.
We develop a user response model that considers diverse user preferences and the varying effects of item positions.
Experiments conducted on both simulated and real-world datasets demonstrate that our algorithm outperforms the baseline.
arXiv Detail & Related papers (2024-06-07T15:33:48Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm, robust contextual dueling bandit (algo), based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Efficient and Accurate Top-$K$ Recovery from Choice Data [1.14219428942199]
In some applications such as recommendation systems, the statistician is primarily interested in recovering the set of the top ranked items from a large pool of items.
We propose the choice-based Borda count algorithm as a fast and accurate ranking algorithm for top-$K$ recovery.
We show that the choice-based Borda count algorithm has optimal sample complexity for top-$K$ recovery under a broad class of random utility models.
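A minimal sketch of a choice-based Borda-style score, as an illustrative simplification (the scoring here is the empirical win rate of each item over the assortments it appeared in, not necessarily the paper's exact estimator):

```python
from collections import defaultdict

def top_k_choice_borda(choice_data, k):
    """choice_data: list of (offered_set, chosen_item) pairs.
    Score each item by its empirical win rate over the sets it
    appeared in, then return the k highest-scoring items."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for offered, chosen in choice_data:
        for item in offered:
            appearances[item] += 1
        wins[chosen] += 1
    win_rate = lambda item: wins[item] / appearances[item]
    return sorted(appearances, key=win_rate, reverse=True)[:k]

# Illustrative data: "x" is always chosen when offered.
data = [({"x", "y"}, "x"), ({"x", "z"}, "x"), ({"y", "z"}, "y")]
print(top_k_choice_borda(data, 2))  # ['x', 'y']
```

Such counting-based scores need only one pass over the data, which is why Borda-style methods are attractive when only the top-$K$ set, not the full ranking, is required.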
arXiv Detail & Related papers (2022-06-23T22:05:08Z) - Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability [59.81339109121384]
We study the $K$ contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
arXiv Detail & Related papers (2021-11-24T07:14:57Z) - Multinomial Logit Contextual Bandits: Provable Optimality and Practicality [15.533842336139063]
We consider a sequential assortment selection problem where the user choice is given by a multinomial logit (MNL) choice model whose parameters are unknown.
We propose upper confidence bound based algorithms for this MNL contextual bandit.
We show that a simple variant of the algorithm achieves the optimal regret for a broad class of important applications.
arXiv Detail & Related papers (2021-03-25T15:42:25Z) - A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit [2.9998316151418107]
We consider a dynamic set optimization problem, where a decision-maker offers a subset of products to a consumer.
We model consumer choice behavior using the widely used Multinomial Logit (MNL) model.
We show that the regret is bounded by $O(\sqrt{dT} + \kappa)$, significantly improving the performance over existing methods.
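For reference, the MNL choice probabilities underlying both of the assortment papers above take the standard form $e^{v_i}/(1+\sum_{j \in S} e^{v_j})$, where the 1 accounts for the outside (no-purchase) option; a minimal sketch:

```python
import math

def mnl_choice_probs(utilities):
    """Multinomial logit choice probabilities for an offered assortment.
    `utilities` maps item -> mean utility; the outside (no-purchase)
    option has utility 0, contributing the 1 in the denominator."""
    denom = 1.0 + sum(math.exp(u) for u in utilities.values())
    probs = {item: math.exp(u) / denom for item, u in utilities.items()}
    probs["no_purchase"] = 1.0 / denom
    return probs

# A single item with utility 0 is chosen half the time.
print(mnl_choice_probs({"a": 0.0}))  # {'a': 0.5, 'no_purchase': 0.5}
```

In the contextual bandit versions, the utilities are linear in unknown parameters, and the learner maintains confidence sets over those parameters rather than over the probabilities directly.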
arXiv Detail & Related papers (2020-11-28T00:20:36Z) - Regret in Online Recommendation Systems [73.58127515175127]
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time.
In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items.
The performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities.
arXiv Detail & Related papers (2020-10-23T12:48:35Z) - Learning to Rank under Multinomial Logit Choice [6.929312022493406]
Learning the optimal ordering of content is an important challenge in website design.
We present theoretical analysis leading to an $\Omega(\sqrt{JT})$ lower bound for the problem, and an $\tilde{O}(\sqrt{JT})$ upper bound on the regret of the UCB algorithm.
arXiv Detail & Related papers (2020-09-07T16:15:12Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equally reliable workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
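The least-squares estimation step can be sketched as a linear regression on pairwise quality differences; this is an illustrative simplification of the paper's graph-based criterion:

```python
import numpy as np

def least_squares_qualities(n, comparisons):
    """Estimate item qualities from noisy pairwise measurements.
    `comparisons` is a list of (i, j, y) triples, y being a noisy
    observation of quality[i] - quality[j].  Solves the rank-deficient
    least-squares problem and pins the mean quality to zero."""
    A = np.zeros((len(comparisons), n))
    b = np.zeros(len(comparisons))
    for row, (i, j, y) in enumerate(comparisons):
        A[row, i], A[row, j], b[row] = 1.0, -1.0, y
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return q - q.mean()   # qualities are identified only up to a shift

# Noiseless example with true centered qualities [-1, 0, 1].
comparisons = [(1, 0, 1.0), (2, 1, 1.0), (2, 0, 2.0)]
print(least_squares_qualities(3, comparisons))  # [-1.  0.  1.]
```

The comparison graph must be connected for all differences to be identifiable, which is why graph structure plays a central role in the analysis of such estimators.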
arXiv Detail & Related papers (2020-02-26T16:19:09Z) - SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender system.
Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons.
We also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $\sqrt{M/N}$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.