Learning to Rank under Multinomial Logit Choice
- URL: http://arxiv.org/abs/2009.03207v2
- Date: Thu, 11 May 2023 10:39:42 GMT
- Title: Learning to Rank under Multinomial Logit Choice
- Authors: James A. Grant, David S. Leslie
- Abstract summary: Learning the optimal ordering of content is an important challenge in website design.
We present theoretical analysis leading to an $\Omega(\sqrt{JT})$ lower bound for the problem, and an $\tilde{O}(\sqrt{JT})$ upper bound on regret of the UCB algorithm.
- Score: 6.929312022493406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning the optimal ordering of content is an important challenge in website
design. The learning to rank (LTR) framework models this problem as a
sequential problem of selecting lists of content and observing where users
decide to click. Most previous work on LTR assumes that the user considers each
item in the list in isolation, and makes binary choices to click or not on
each. We introduce a multinomial logit (MNL) choice model to the LTR framework,
which captures the behaviour of users who consider the ordered list of items as
a whole and make a single choice among all the items and a no-click option.
Under the MNL model, the user favours items which are either inherently more
attractive, or placed in a preferable position within the list. We propose
upper confidence bound (UCB) algorithms to minimise regret in two settings -
where the position dependent parameters are known, and unknown. We present
theoretical analysis leading to an $\Omega(\sqrt{JT})$ lower bound for the
problem, an $\tilde{O}(\sqrt{JT})$ upper bound on regret of the UCB algorithm
in the known-parameter setting, and an $\tilde{O}(K^2\sqrt{JT})$ upper bound on
regret, the first such bound, in the more challenging unknown-position-parameter setting.
Our analyses are based on tight new concentration results for Geometric random
variables, and novel functional inequalities for maximum likelihood estimators
computed on discrete data.
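For intuition, below is a minimal sketch of the kind of choice model the abstract describes: each displayed item's click weight is its attractiveness scaled by a position bias, a no-click outside option competes with all items, and a generic UCB index is formed from click and display counts. The function names, the confidence radius, and the specific parameterisation are illustrative assumptions, not the paper's exact algorithm.

```python
import math

def mnl_click_probabilities(attractiveness, position_bias):
    # Weight of the item in slot k is position_bias[k] * attractiveness[k];
    # the outside (no-click) option has weight 1, as in a standard MNL model.
    weights = [b * a for a, b in zip(attractiveness, position_bias)]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights], 1.0 / denom

def ucb_attractiveness(click_counts, display_counts, t):
    # Generic optimism-in-the-face-of-uncertainty index on the empirical
    # click rate; the paper's exact confidence radius will differ.
    return [
        c / max(n, 1) + math.sqrt(2.0 * math.log(max(t, 2)) / max(n, 1))
        for c, n in zip(click_counts, display_counts)
    ]

if __name__ == "__main__":
    # Three items shown in slots with decreasing position bias.
    probs, p_no_click = mnl_click_probabilities(
        attractiveness=[0.9, 0.5, 0.2],
        position_bias=[1.0, 0.7, 0.4],
    )
    print("click probabilities:", probs, "no-click:", p_no_click)
    print("UCB indices:", ucb_attractiveness([12, 4, 1], [40, 40, 40], t=120))
```

In such a scheme the learner would display the items with the largest indices each round, assigning higher-index items to higher-bias slots, and update the counts from the single observed click (or no-click).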
Related papers
- Adaptively Learning to Select-Rank in Online Platforms [34.258659206323664]
This research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users.
We develop a user response model that considers diverse user preferences and the varying effects of item positions.
Experiments conducted on both simulated and real-world datasets demonstrate that our algorithm outperforms the baseline.
arXiv Detail & Related papers (2024-06-07T15:33:48Z)
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model [9.945948163150874]
This work studies the online learning to rank problem in both stochastic and adversarial environments under the position-based model.
We prove the proposed algorithm simultaneously achieves $O(\log T)$ regret in the stochastic environment and $O(m\sqrt{nT})$ regret in the adversarial environment.
Experiments show that our algorithm can simultaneously learn in both stochastic and adversarial environments and is competitive compared to existing methods.
arXiv Detail & Related papers (2022-07-12T10:00:14Z)
- Online Learning of Optimally Diverse Rankings [63.62764375279861]
We propose an algorithm that efficiently learns the optimal list based on users' feedback only.
We show that after $T$ queries, the regret of LDR scales as $O((N-L)\log(T))$ where $N$ is the number of all items.
arXiv Detail & Related papers (2021-09-13T12:13:20Z)
- Multinomial Logit Contextual Bandits: Provable Optimality and Practicality [15.533842336139063]
We consider a sequential assortment selection problem where the user choice is given by a multinomial logit (MNL) choice model whose parameters are unknown.
We propose upper confidence bound based algorithms for this MNL contextual bandit.
We show that a simple variant of the algorithm achieves the optimal regret for a broad class of important applications.
arXiv Detail & Related papers (2021-03-25T15:42:25Z)
- Learning over no-Preferred and Preferred Sequence of items for Robust Recommendation [66.8722561224499]
We propose a theoretically founded sequential strategy for training large-scale Recommender Systems (RS) over implicit feedback.
We present two variants of this strategy where model parameters are updated using either the momentum method or a gradient-based approach.
arXiv Detail & Related papers (2020-12-12T22:10:15Z)
- Fully Gap-Dependent Bounds for Multinomial Logit Bandit [5.132017939561661]
We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items.
We present (i) an algorithm that identifies the optimal assortment $S^*$ within $\widetilde{O}(\sum_{i=1}^{N} \Delta_i^{-2})$ time steps with high probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^*} K\Delta_i^{-1} \log T)$ regret in $T$ time steps.
arXiv Detail & Related papers (2020-11-19T17:52:12Z)
- Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent [47.66497212729108]
The Generalized Min-Sum-Set-Cover (GMSSC) problem serves as a formal model for the setting above.
We show how to achieve low regret for GMSSC in polynomial time.
arXiv Detail & Related papers (2020-11-05T13:52:34Z)
- SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender systems.
Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons.
We also present a theoretical analysis of SetRank showing that the bound on the excess risk can be proportional to $\sqrt{M/N}$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)