Related papers: Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation

Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation

URL: http://arxiv.org/abs/2008.09368v1
Date: Fri, 21 Aug 2020 08:22:30 GMT
Title: Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation
Authors: Xu He, Bo An, Yanghua Li, Haikai Chen, Qingyu Guo, Xin Li, and Zhirong Wang
Abstract summary: Higher positions lead to more clicks for one commodity. Only a few recommended items are shown at first glance and users need to slide the screen to browse other items. Some recommended items ranked behind are not viewed by users and it is not proper to treat this kind of items as negative samples.
Score: 24.810164687987243
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Online recommendation services recommend multiple commodities to users. Nowadays, a considerable proportion of users visit e-commerce platforms by mobile devices. Due to the limited screen size of mobile devices, positions of items have a significant influence on clicks: 1) Higher positions lead to more clicks for one commodity. 2) The 'pseudo-exposure' issue: Only a few recommended items are shown at first glance and users need to slide the screen to browse other items. Therefore, some recommended items ranked behind are not viewed by users and it is not proper to treat this kind of items as negative samples. While many works model the online recommendation as contextual bandit problems, they rarely take the influence of positions into consideration and thus the estimation of the reward function may be biased. In this paper, we aim at addressing these two issues to improve the performance of online mobile recommendation. Our contributions are four-fold. First, since we concern the reward of a set of recommended items, we model the online recommendation as a contextual combinatorial bandit problem and define the reward of a recommended set. Second, we propose a novel contextual combinatorial bandit method called UBM-LinUCB to address two issues related to positions by adopting the User Browsing Model (UBM), a click model for web search. Third, we provide a formal regret analysis and prove that our algorithm achieves sublinear regret independent of the number of items. Finally, we evaluate our algorithm on two real-world datasets by a novel unbiased estimator. An online experiment is also implemented in Taobao, one of the most popular e-commerce platforms in the world. Results on two CTR metrics show that our algorithm outperforms the other contextual bandit algorithms.

Related papers

Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback. We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z)
Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits [23.15042648884445]
We study exposure bias in a class of well-known contextual bandit algorithms known as Linear Cascading Bandits. We propose an Exposure-Aware reward model that updates the model parameters based on two factors: 1) implicit user feedback and 2) the position of the item in the recommendation list.
arXiv Detail & Related papers (2024-08-08T09:35:01Z)
A Scalable Recommendation Engine for New Users and Items [0.0]
Collaborative Filtering (CF) Multi-armed Bandit (B) with Attributes (A) recommendation system (CFB-A) This paper introduces a Collaborative Filtering (CF) Multi-armed Bandit (B) with Attributes (A) recommendation system (CFB-A) to jointly accommodate all of these considerations. Empirical applications including an offline test on MovieLens data, synthetic data simulations, and an online grocery experiment indicate the CFB-A leads to substantial improvement on cumulative average rewards.
arXiv Detail & Related papers (2022-09-06T14:59:00Z)
Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items. Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate. We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
arXiv Detail & Related papers (2022-07-04T17:49:25Z)
On component interactions in two-stage recommender systems [82.38014314502861]
Two-stage recommenders are used by many online platforms, including YouTube, LinkedIn, and Pinterest. We show that interactions between the ranker and the nominators substantially affect the overall performance. In particular, using a Mixture-of-Experts approach, we train the nominators to specialize on different subsets of the item pool.
arXiv Detail & Related papers (2021-06-28T20:53:23Z)
Set2setRank: Collaborative Set to Set Ranking for Implicit Feedback based Recommendation [59.183016033308014]
In this paper, we explore the unique characteristics of the implicit feedback and propose Set2setRank framework for recommendation. Our proposed framework is model-agnostic and can be easily applied to most recommendation prediction approaches.
arXiv Detail & Related papers (2021-05-16T08:06:22Z)
Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling [6.312395952874578]
We consider the problem of recommending relevant content to users of an internet platform in the form of lists of items, called slates. We introduce a variational Bayesian Recurrent Neural Net recommender system that acts on time series of interactions between the internet platform and the user. We show experimentally that explorative recommender strategies perform on par or above their greedy counterparts.
arXiv Detail & Related papers (2021-04-30T15:16:35Z)
Measuring Recommender System Effects with Simulated Users [19.09065424910035]
Popularity bias and filter bubbles are two of the most well-studied recommender system biases. We offer a simulation framework for measuring the impact of a recommender system under different types of user behavior.
arXiv Detail & Related papers (2021-01-12T14:51:11Z)
Regret in Online Recommendation Systems [73.58127515175127]
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items. The performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities.
arXiv Detail & Related papers (2020-10-23T12:48:35Z)
Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies. Despite a limited number of feedbacks returned by users (as low as 20% of the total), our approach obtains similar results to those of state of the art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z)
Controllable Multi-Interest Framework for Recommendation [64.30030600415654]
We formalize the recommender system as a sequential recommendation problem. We propose a novel controllable multi-interest framework for the sequential recommendation, called ComiRec. Our framework has been successfully deployed on the offline Alibaba distributed cloud platform.
arXiv Detail & Related papers (2020-05-19T10:18:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.