Unbiased Learning to Rank: Online or Offline?
- URL: http://arxiv.org/abs/2004.13574v3
- Date: Wed, 2 Dec 2020 16:55:23 GMT
- Title: Unbiased Learning to Rank: Online or Offline?
- Authors: Qingyao Ai, Tao Yang, Huazheng Wang, Jiaxin Mao
- Abstract summary: How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR.
Existing work on unbiased learning to rank can be broadly categorized into two groups -- the studies on unbiased learning algorithms with logged data, and the studies on unbiased parameter estimation with real-time user interactions.
This paper formalizes the task of unbiased learning to rank and shows that existing algorithms for offline unbiased learning and online learning to rank are two sides of the same coin.
- Score: 28.431648823968278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR. Existing work on unbiased learning to rank (ULTR) can be broadly categorized into two groups -- the studies on unbiased learning algorithms with logged data, namely \textit{offline} unbiased learning, and the studies on unbiased parameter estimation with real-time user interactions, namely \textit{online} learning to rank. While their definitions of \textit{unbiasedness} differ, these two types of ULTR algorithms share the same goal -- to find the best models that rank documents based on their intrinsic relevance or utility. However, most studies on offline and online unbiased learning to rank are carried out in parallel without detailed comparisons of their background theories and empirical performance. In this paper, we formalize the task of unbiased learning to rank and show that existing algorithms for offline unbiased learning and online learning to rank are two sides of the same coin. We evaluate six state-of-the-art ULTR algorithms and find that most of them can be used in both offline settings and online environments with or without minor modifications. Further, we analyze how different offline and online learning paradigms affect the theoretical foundation and empirical effectiveness of each algorithm on both synthetic and real search data. Our findings provide important insights and guidelines for choosing and deploying ULTR algorithms in practice.
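To make the offline side of the comparison concrete, below is a minimal sketch of inverse propensity scoring (IPS), the standard de-biasing technique in offline counterfactual learning to rank. The function name and the choice of a pointwise logistic loss are illustrative assumptions, not the exact formulation evaluated in the paper.

```python
import math

def ips_weighted_loss(clicks, propensities, scores):
    """Illustrative inverse-propensity-scored (IPS) pointwise loss.

    clicks:       1 if the document at that rank was clicked, else 0
    propensities: estimated examination probability for each rank
    scores:       the ranking model's predicted relevance scores

    Each click is re-weighted by 1/propensity so that, in expectation
    over the examination process, the loss matches the one computed
    from full (unbiased) relevance feedback.
    """
    loss = 0.0
    for c, p, s in zip(clicks, propensities, scores):
        if c:
            # logistic loss on clicked documents, de-biased by 1/p
            loss += (1.0 / p) * math.log(1.0 + math.exp(-s))
    return loss
```

Clicks at rarely examined (low-propensity) positions are up-weighted, which is what corrects position bias: a click observed at rank 10 counts more than one at rank 1, compensating for how seldom rank 10 is examined.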
Related papers
- Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank [26.69630281310365]
Unbiased Learning to Rank (ULTR) aims to leverage biased implicit user feedback (e.g., click) to optimize an unbiased ranking model.
We propose a Contextual Dual Learning Algorithm with Listwise Distillation (CDLA-LD) to address both position bias and contextual bias.
arXiv Detail & Related papers (2024-08-19T09:13:52Z)
- Understanding the performance gap between online and offline alignment algorithms [63.137832242488926]
We show that offline algorithms train policy to become good at pairwise classification, while online algorithms are good at generations.
This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process.
Our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
arXiv Detail & Related papers (2024-05-14T09:12:30Z)
- Whole Page Unbiased Learning to Rank [59.52040055543542]
Unbiased Learning to Rank (ULTR) algorithms are proposed to learn an unbiased ranking model from biased click data.
We propose a Bias Agnostic whole-page unbiased Learning to rank algorithm, named BAL, to automatically find the user behavior model.
Experimental results on a real-world dataset verify the effectiveness of BAL.
arXiv Detail & Related papers (2022-10-19T16:53:08Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- A Large Scale Search Dataset for Unbiased Learning to Rank [51.97967284268577]
We introduce the Baidu-ULTR dataset for unbiased learning to rank.
It comprises 1.2 billion randomly sampled search sessions and 7,008 expert-annotated queries.
It provides: (1) the original semantic features and a pre-trained language model for easy usage; (2) sufficient display information such as position, displayed height, and displayed abstract; and (3) rich user feedback on search result pages (SERPs) such as dwell time.
arXiv Detail & Related papers (2022-07-07T02:37:25Z)
- Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
- Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach [5.683591363967851]
We present an end-to-end Reinforcement Learning framework for deriving better matching policies based on trial-and-error on historical data.
We show that most of the learning approaches perform significantly better than classical greedy algorithms on four synthetic and real-world datasets.
arXiv Detail & Related papers (2021-09-21T18:04:19Z)
- ULTRA: An Unbiased Learning To Rank Algorithm Toolbox [13.296248894004652]
In this paper, we describe the general framework of unbiased learning to rank (ULTR).
We also briefly describe the algorithms in ULTRA and detail the structure and pipeline of the toolbox.
Our toolbox is an important resource for researchers to conduct experiments on ULTR algorithms with different configurations and to test their own algorithms with the supported features.
arXiv Detail & Related papers (2021-08-11T07:26:59Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer [35.199462901346706]
We propose to estimate a pairwise learning to rank model online.
In each round, candidate documents are partitioned and ranked according to the model's confidence on the estimated pairwise rank order.
A regret bound defined directly on the number of mis-ordered pairs is proven, connecting the online solution's theoretical convergence with its expected ranking performance.
arXiv Detail & Related papers (2021-02-28T01:16:55Z)
- Controlling Fairness and Bias in Dynamic Learning-to-Rank [31.41843594914603]
We propose a learning algorithm that ensures notions of amortized group fairness, while simultaneously learning the ranking function from implicit feedback data.
The algorithm takes the form of a controller that integrates unbiased estimators for both fairness and utility.
In addition to its rigorous theoretical foundation and convergence guarantees, we find empirically that the algorithm is highly practical and robust.
arXiv Detail & Related papers (2020-05-29T17:57:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.