Contextual User Browsing Bandits for Large-Scale Online Mobile
Recommendation
- URL: http://arxiv.org/abs/2008.09368v1
- Date: Fri, 21 Aug 2020 08:22:30 GMT
- Title: Contextual User Browsing Bandits for Large-Scale Online Mobile
Recommendation
- Authors: Xu He, Bo An, Yanghua Li, Haikai Chen, Qingyu Guo, Xin Li, and Zhirong
Wang
- Abstract summary: Higher positions lead to more clicks for one commodity.
Only a few recommended items are shown at first glance and users need to slide the screen to browse other items.
Some lower-ranked recommended items are never viewed by users, and it is not appropriate to treat such items as negative samples.
- Score: 24.810164687987243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online recommendation services recommend multiple commodities to users.
Nowadays, a considerable proportion of users visit e-commerce platforms via
mobile devices. Due to the limited screen size of mobile devices, positions of
items have a significant influence on clicks: 1) Higher positions lead to more
clicks for one commodity. 2) The 'pseudo-exposure' issue: Only a few
recommended items are shown at first glance and users need to slide the screen
to browse other items. Therefore, some lower-ranked recommended items are never
viewed by users, and it is not appropriate to treat such items as negative
samples. While many works model online recommendation as contextual bandit
problems, they rarely take the influence of positions into consideration and
thus the estimation of the reward function may be biased. In this paper, we aim
at addressing these two issues to improve the performance of online mobile
recommendation. Our contributions are four-fold. First, since we are concerned with the
reward of a set of recommended items, we model online recommendation as a
contextual combinatorial bandit problem and define the reward of a recommended
set. Second, we propose a novel contextual combinatorial bandit method called
UBM-LinUCB to address the two position-related issues by adopting the User
Browsing Model (UBM), a click model for web search. Third, we provide a formal
regret analysis and prove that our algorithm achieves sublinear regret
independent of the number of items. Finally, we evaluate our algorithm on two
real-world datasets using a novel unbiased estimator. An online experiment is also
conducted on Taobao, one of the most popular e-commerce platforms in the
world. Results on two CTR metrics show that our algorithm outperforms the other
contextual bandit algorithms.
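The position effect described in the abstract can be illustrated with a minimal sketch. This is not the paper's UBM-LinUCB implementation: it assumes a simplified model in which the expected click on an item equals a known position weight times a linear attractiveness score, and it fits that model by ridge regression as in standard LinUCB. The class name PositionWeightedLinUCB, the weights, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


class PositionWeightedLinUCB:
    """Hypothetical sketch, not the paper's code: LinUCB with position weights."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha                # exploration strength
        self.A = ridge * np.eye(dim)      # regularised Gram matrix
        self.b = np.zeros(dim)            # weighted feature-reward sum

    def theta(self):
        # Ridge-regression estimate of the attractiveness parameters.
        return np.linalg.solve(self.A, self.b)

    def ucb(self, x):
        # Optimistic attractiveness score for an item with features x.
        bonus = np.sqrt(x @ np.linalg.solve(self.A, x))
        return float(self.theta() @ x + self.alpha * bonus)

    def update(self, x, reward, weight):
        # Assumed model: E[reward] = weight * theta^T x, so regress on weight * x.
        z = weight * x
        self.A += np.outer(z, z)
        self.b += z * reward


# Toy data (all numbers invented): two one-hot items shown at three positions.
true_theta = np.array([0.4, 0.2])
position_weights = [1.0, 0.5, 0.1]
items = np.eye(2)

aware = PositionWeightedLinUCB(dim=2, ridge=1e-6)
naive = PositionWeightedLinUCB(dim=2, ridge=1e-6)
for w in position_weights:
    for x in items:
        expected_click = w * (true_theta @ x)        # noise-free expected reward
        aware.update(x, expected_click, weight=w)    # accounts for position
        naive.update(x, expected_click, weight=1.0)  # ignores position

print("position-aware estimate:", aware.theta())
print("position-blind estimate:", naive.theta())
```

In this toy setting the position-blind learner treats every low-position impression as a fully examined one, so its attractiveness estimates are pulled below the true values, while the position-aware learner recovers them. Real click data is noisy and the position weights would themselves have to be estimated, which this sketch does not attempt.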
Related papers
- Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits [23.15042648884445]
We study exposure bias in a class of well-known contextual bandit algorithms known as Linear Cascading Bandits.
We propose an Exposure-Aware reward model that updates the model parameters based on two factors: 1) implicit user feedback and 2) the position of the item in the recommendation list.
arXiv Detail & Related papers (2024-08-08T09:35:01Z)
- A Scalable Recommendation Engine for New Users and Items [0.0]
This paper introduces a Collaborative Filtering (CF) Multi-armed Bandit (B) with Attributes (A) recommendation system (CFB-A) to jointly accommodate all of these considerations.
Empirical applications including an offline test on MovieLens data, synthetic data simulations, and an online grocery experiment indicate the CFB-A leads to substantial improvement on cumulative average rewards.
arXiv Detail & Related papers (2022-09-06T14:59:00Z)
- Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items.
Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate.
We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
arXiv Detail & Related papers (2022-07-04T17:49:25Z)
- On component interactions in two-stage recommender systems [82.38014314502861]
Two-stage recommenders are used by many online platforms, including YouTube, LinkedIn, and Pinterest.
We show that interactions between the ranker and the nominators substantially affect the overall performance.
In particular, using a Mixture-of-Experts approach, we train the nominators to specialize on different subsets of the item pool.
arXiv Detail & Related papers (2021-06-28T20:53:23Z)
- Set2setRank: Collaborative Set to Set Ranking for Implicit Feedback based Recommendation [59.183016033308014]
In this paper, we explore the unique characteristics of implicit feedback and propose the Set2setRank framework for recommendation.
Our proposed framework is model-agnostic and can be easily applied to most recommendation prediction approaches.
arXiv Detail & Related papers (2021-05-16T08:06:22Z)
- Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling [6.312395952874578]
We consider the problem of recommending relevant content to users of an internet platform in the form of lists of items, called slates.
We introduce a variational Bayesian Recurrent Neural Net recommender system that acts on time series of interactions between the internet platform and the user.
We show experimentally that explorative recommender strategies perform on par with or better than their greedy counterparts.
arXiv Detail & Related papers (2021-04-30T15:16:35Z)
- Measuring Recommender System Effects with Simulated Users [19.09065424910035]
Popularity bias and filter bubbles are two of the most well-studied recommender system biases.
We offer a simulation framework for measuring the impact of a recommender system under different types of user behavior.
arXiv Detail & Related papers (2021-01-12T14:51:11Z)
- Regret in Online Recommendation Systems [73.58127515175127]
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time.
In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items.
The performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities.
arXiv Detail & Related papers (2020-10-23T12:48:35Z)
- Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback [62.997667081978825]
We present a novel approach for considering user feedback and evaluate it using three distinct strategies.
Despite the limited amount of feedback returned by users (as low as 20% of the total), our approach obtains results similar to those of state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-16T07:32:51Z)
- Controllable Multi-Interest Framework for Recommendation [64.30030600415654]
We formalize the recommender system as a sequential recommendation problem.
We propose a novel controllable multi-interest framework for sequential recommendation, called ComiRec.
Our framework has been successfully deployed on the offline Alibaba distributed cloud platform.
arXiv Detail & Related papers (2020-05-19T10:18:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.