Learning to Rank under Multinomial Logit Choice
- URL: http://arxiv.org/abs/2009.03207v2
- Date: Thu, 11 May 2023 10:39:42 GMT
- Title: Learning to Rank under Multinomial Logit Choice
- Authors: James A. Grant, David S. Leslie
- Abstract summary: Learning the optimal ordering of content is an important challenge in website design.
We present theoretical analysis leading to an $\Omega(\sqrt{JT})$ lower bound for the problem, and an $\tilde{O}(\sqrt{JT})$ upper bound on regret of the UCB algorithm.
- Score: 6.929312022493406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning the optimal ordering of content is an important challenge in website
design. The learning to rank (LTR) framework models this problem as a
sequential problem of selecting lists of content and observing where users
decide to click. Most previous work on LTR assumes that the user considers each
item in the list in isolation, and makes binary choices to click or not on
each. We introduce a multinomial logit (MNL) choice model to the LTR framework,
which captures the behaviour of users who consider the ordered list of items as
a whole and make a single choice among all the items and a no-click option.
Under the MNL model, the user favours items which are either inherently more
attractive, or placed in a preferable position within the list. We propose
upper confidence bound (UCB) algorithms to minimise regret in two settings -
where the position dependent parameters are known, and unknown. We present
theoretical analysis leading to an $\Omega(\sqrt{JT})$ lower bound for the
problem, an $\tilde{O}(\sqrt{JT})$ upper bound on regret of the UCB algorithm
in the known-parameter setting, and an $\tilde{O}(K^2\sqrt{JT})$ upper bound on
regret, the first such bound, in the more challenging unknown-position-parameter setting.
Our analyses are based on tight new concentration results for Geometric random
variables, and novel functional inequalities for maximum likelihood estimators
computed on discrete data.
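For intuition, below is a minimal sketch of the kind of choice model the abstract describes: each displayed item's click weight is its attractiveness scaled by a position bias, a no-click outside option competes with all items, and a generic UCB index is formed from click and display counts. The function names, the confidence radius, and the specific parameterisation are illustrative assumptions, not the paper's exact algorithm.

```python
import math

def mnl_click_probabilities(attractiveness, position_bias):
    # Weight of the item in slot k is position_bias[k] * attractiveness[k];
    # the outside (no-click) option has weight 1, as in a standard MNL model.
    weights = [b * a for a, b in zip(attractiveness, position_bias)]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights], 1.0 / denom

def ucb_attractiveness(click_counts, display_counts, t):
    # Generic optimism-in-the-face-of-uncertainty index on the empirical
    # click rate; the paper's exact confidence radius will differ.
    return [
        c / max(n, 1) + math.sqrt(2.0 * math.log(max(t, 2)) / max(n, 1))
        for c, n in zip(click_counts, display_counts)
    ]

if __name__ == "__main__":
    # Three items shown in slots with decreasing position bias.
    probs, p_no_click = mnl_click_probabilities(
        attractiveness=[0.9, 0.5, 0.2],
        position_bias=[1.0, 0.7, 0.4],
    )
    print("click probabilities:", probs, "no-click:", p_no_click)
    print("UCB indices:", ucb_attractiveness([12, 4, 1], [40, 40, 40], t=120))
```

In such a scheme the learner would display the items with the largest indices each round, assigning higher-index items to higher-bias slots, and update the counts from the single observed click (or no-click).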
Related papers
- Adaptively Learning to Select-Rank in Online Platforms [34.258659206323664]
This research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users.
We develop a user response model that considers diverse user preferences and the varying effects of item positions.
Experiments conducted on both simulated and real-world datasets demonstrate that our algorithm outperforms the baseline.
arXiv Detail & Related papers (2024-06-07T15:33:48Z)
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model [9.945948163150874]
This work studies the online learning to rank problem in both stochastic and adversarial environments under the position-based model.
We prove the proposed algorithm simultaneously achieves $O(\log T)$ regret in the stochastic environment and $O(m\sqrt{nT})$ regret in the adversarial environment.
Experiments show that our algorithm can simultaneously learn in both stochastic and adversarial environments and is competitive compared to existing methods.
arXiv Detail & Related papers (2022-07-12T10:00:14Z)
- Online Learning of Optimally Diverse Rankings [63.62764375279861]
We propose an algorithm that efficiently learns the optimal list based on users' feedback only.
We show that after $T$ queries, the regret of LDR scales as $O((N-L)\log(T))$ where $N$ is the number of all items.
arXiv Detail & Related papers (2021-09-13T12:13:20Z)
- Multinomial Logit Contextual Bandits: Provable Optimality and Practicality [15.533842336139063]
We consider a sequential assortment selection problem where the user choice is given by a multinomial logit (MNL) choice model whose parameters are unknown.
We propose upper confidence bound based algorithms for this MNL contextual bandit.
We show that a simple variant of the algorithm achieves the optimal regret for a broad class of important applications.
arXiv Detail & Related papers (2021-03-25T15:42:25Z)
- Learning over no-Preferred and Preferred Sequence of items for Robust Recommendation [66.8722561224499]
We propose a theoretically founded sequential strategy for training large-scale Recommender Systems (RS) over implicit feedback.
We present two variants of this strategy where model parameters are updated using either the momentum method or a gradient-based approach.
arXiv Detail & Related papers (2020-12-12T22:10:15Z)
- Fully Gap-Dependent Bounds for Multinomial Logit Bandit [5.132017939561661]
We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items.
We present (i) an algorithm that identifies the optimal assortment $S^*$ within $\widetilde{O}(\sum_{i=1}^{N} \Delta_i^{-2})$ time steps with high probability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^*} K\Delta_i^{-1} \log T)$ regret in $T$ time steps.
arXiv Detail & Related papers (2020-11-19T17:52:12Z)
- Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent [47.66497212729108]
The Generalized Min-Sum-Set-Cover (GMSSC) problem serves as a formal model for the setting above.
We show how to achieve low regret for GMSSC in polynomial time.
arXiv Detail & Related papers (2020-11-05T13:52:34Z)
- SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender systems.
Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons.
We also present a theoretical analysis of SetRank showing that the bound on the excess risk can be proportional to $\sqrt{M/N}$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)