Related papers: Learning to Rank in the Position Based Model with Bandit Feedback

URL: http://arxiv.org/abs/2004.13106v1
Date: Mon, 27 Apr 2020 19:12:20 GMT
Title: Learning to Rank in the Position Based Model with Bandit Feedback
Authors: Beyza Ermis, Patrick Ernst, Yannik Stein, Giovanni Zappella
Abstract summary: We propose novel extensions of two well-known algorithms viz. LinUCB and Linear Thompson Sampling to the ranking use-case. To account for the biases in a production environment, we employ the position-based click model.
Score: 3.9121134770873742
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Personalization is a crucial aspect of many online experiences. In particular, content ranking is often a key component in delivering sophisticated personalization results. Commonly, supervised learning-to-rank methods are applied, which suffer from bias introduced during data collection by production systems in charge of producing the ranking. To compensate for this problem, we leverage contextual multi-armed bandits. We propose novel extensions of two well-known algorithms, viz. LinUCB and Linear Thompson Sampling, to the ranking use-case. To account for the biases in a production environment, we employ the position-based click model. Finally, we show the validity of the proposed algorithms by conducting extensive offline experiments on synthetic datasets as well as customer-facing online A/B experiments.
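As a rough illustration of the position-based click model mentioned in the abstract, the sketch below uses the standard formulation in which the probability of a click at a slot is the product of the probability that the slot is examined and the relevance of the item shown there. This is not the authors' implementation: the function names, the examination probabilities, and the relevance values are all hypothetical and chosen only for illustration.

```python
import random


def pbm_click_probs(relevance, examination):
    """Per-slot click probabilities under a position-based model.

    P(click at slot k) = examination[k] * relevance[k], where relevance[k]
    is the relevance of the item placed at slot k.
    """
    if len(relevance) > len(examination):
        raise ValueError("need an examination probability for every slot")
    return [examination[k] * r for k, r in enumerate(relevance)]


def simulate_clicks(relevance, examination, rng):
    """Draw one 0/1 click outcome per slot from the model above."""
    probs = pbm_click_probs(relevance, examination)
    return [1 if rng.random() < p else 0 for p in probs]


# Hypothetical values: examination decays with position, while the most
# relevant item happens to be ranked last.
examination = [1.0, 0.5, 0.25]
ranking_relevance = [0.2, 0.4, 0.8]

probs = pbm_click_probs(ranking_relevance, examination)
clicks = simulate_clicks(ranking_relevance, examination, random.Random(0))
```

With these made-up numbers every slot ends up with the same click probability, even though the item in the last slot is four times as relevant as the item in the first. Raw click counts would therefore understate the relevance of low-ranked items, which is the kind of bias the abstract refers to.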

Related papers

Simulating Biases for Interpretable Fairness in Offline and Online Classifiers [0.35998666903987897]
Mitigation methods are critical to ensure that model outcomes are adjusted to be fair. We develop a framework for synthetic dataset generation with controllable bias injection. In experiments, both offline and online learning approaches are employed.
arXiv Detail & Related papers (2025-07-14T11:04:24Z)
Epistemic Uncertainty-aware Recommendation Systems via Bayesian Deep Ensemble Learning [2.3310092106321365]
We propose an ensemble-based supermodel to generate more robust and reliable predictions. We also introduce a new interpretable non-linear matching approach for the user and item embeddings.
arXiv Detail & Related papers (2025-04-14T23:04:35Z)
Variational Bayesian Personalized Ranking [39.24591060825056]
Variational BPR is a novel and easily implementable learning objective that integrates likelihood optimization, noise reduction, and popularity debiasing. We introduce an attention-based latent interest prototype contrastive mechanism, replacing instance-level contrastive learning, to effectively reduce noise from problematic samples. Empirically, we demonstrate the effectiveness of Variational BPR on popular backbone recommendation models.
arXiv Detail & Related papers (2025-03-14T04:22:01Z)
Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank [26.69630281310365]
Unbiased Learning to Rank (ULTR) aims to leverage biased implicit user feedback (e.g., click) to optimize an unbiased ranking model. We propose a Contextual Dual Learning Algorithm with Listwise Distillation (CDLA-LD) to address both position bias and contextual bias.
arXiv Detail & Related papers (2024-08-19T09:13:52Z)
Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process. Vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner. We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets. We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks [58.469818546042696]
We study the sample efficiency of OPE with human preference and establish a statistical guarantee for it. By appropriately selecting the size of a ReLU network, we show that one can leverage any low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2023-10-16T16:27:06Z)
Unbiased Learning to Rank with Biased Continuous Feedback [5.561943356123711]
Unbiased learning-to-rank (LTR) algorithms are verified to model the relative relevance accurately based on noisy feedback. To provide personalized high-quality recommendation results, recommender systems need to model both categorical and continuous biased feedback. We introduce the pairwise trust bias to separate the position bias, trust bias, and user relevance explicitly. Experiment results on public benchmark datasets and internal live traffic of a large-scale recommender system at Tencent News show superior results for continuous labels.
arXiv Detail & Related papers (2023-03-08T02:14:08Z)
Boosting the Learning for Ranking Patterns [6.142272540492935]
This paper formulates the problem of learning pattern ranking functions as a multi-criteria decision making problem. Our approach aggregates different interestingness measures into a single weighted linear ranking function, using an interactive learning procedure. Experiments conducted on well-known datasets show that our approach significantly reduces the running time and returns precise pattern ranking.
arXiv Detail & Related papers (2022-03-05T10:22:44Z)
Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings. We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data. We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline [94.0601799665342]
Aligning sentences in a reference summary with their counterparts in source documents was shown as a useful auxiliary summarization task. We propose establishing summary-source alignment as an explicit task, while introducing two major novelties. We create a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data. We present a supervised proposition alignment baseline model, showing improved alignment-quality over the unsupervised approach.
arXiv Detail & Related papers (2020-09-01T17:27:12Z)
Deep Bayesian Bandits: Exploring in Online Personalized Recommendations [4.845576821204241]
We formulate a display advertising recommender as a contextual bandit. We implement exploration techniques that require sampling from the posterior distribution of click-through-rates. We test our proposed deep Bayesian bandits algorithm in the offline simulation and online AB setting.
arXiv Detail & Related papers (2020-08-03T08:58:18Z)
Fairness-Aware Online Personalization [16.320648868892526]
We present a study of fairness in online personalization settings involving the ranking of individuals. We first demonstrate that online personalization can cause the model to learn to act in an unfair manner if the user is biased in his/her responses. We then formulate the problem of learning personalized models under fairness constraints and present a regularization based approach for mitigating biases in machine learning.
arXiv Detail & Related papers (2020-07-30T07:16:17Z)
Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data. There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups. We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.