Taking the Counterfactual Online: Efficient and Unbiased Online
Evaluation for Ranking
- URL: http://arxiv.org/abs/2007.12719v2
- Date: Tue, 28 Jul 2020 17:59:33 GMT
- Title: Taking the Counterfactual Online: Efficient and Unbiased Online
Evaluation for Ranking
- Authors: Harrie Oosterhuis and Maarten de Rijke
- Abstract summary: We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data.
LogOpt turns the counterfactual approach - which is indifferent to the logging policy - into an online approach, where the algorithm decides what rankings to display.
We prove that, as an online evaluation method, LogOpt is unbiased w.r.t. position and item-selection bias, unlike existing interleaving methods.
- Score: 74.46448041224247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Counterfactual evaluation can estimate Click-Through-Rate (CTR) differences
between ranking systems based on historical interaction data, while mitigating
the effect of position bias and item-selection bias. We introduce the novel
Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for
logging data so that the counterfactual estimate has minimal variance. As
minimizing variance leads to faster convergence, LogOpt increases the
data-efficiency of counterfactual estimation. LogOpt turns the counterfactual
approach - which is indifferent to the logging policy - into an online
approach, where the algorithm decides what rankings to display. We prove that,
as an online evaluation method, LogOpt is unbiased w.r.t. position and
item-selection bias, unlike existing interleaving methods. Furthermore, we
perform large-scale experiments by simulating comparisons between thousands of
rankers. Our results show that while interleaving methods make systematic
errors, LogOpt is as efficient as interleaving without being biased.
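To make the counterfactual setup concrete, below is a minimal sketch (not the authors' implementation) of an inverse-propensity-scored (IPS) estimate of the CTR difference between two rankers under a simulated position-bias model. The rank-based examination probabilities, the uniformly random logging policy, and all names are illustrative assumptions; LogOpt's contribution is precisely to replace such a fixed logging policy with one optimized to minimize the variance of this kind of estimate.

```python
# Hedged sketch: IPS-based counterfactual CTR-difference estimation under an
# assumed position-bias (examination) model. This is NOT the paper's code; the
# propensity model, logging policy, and toy simulation are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

n_items = 10          # candidate documents for a single query
n_logs = 20_000       # number of logged impressions
eta = 1.0             # assumed severity of position bias

def examine_prob(rank):
    """Assumed examination model: P(observed | rank k) = (1/k)**eta."""
    return (1.0 / rank) ** eta

# Two fixed rankers to compare (each is a permutation of the item ids).
ranker_a = rng.permutation(n_items)
ranker_b = rng.permutation(n_items)

# Ground-truth click probabilities, unknown to the estimator; used only to simulate logs.
true_relevance = rng.uniform(0.05, 0.6, size=n_items)

# Simulate logged impressions under a uniformly random logging policy.
# A click happens only if the item is examined (position bias) AND relevant.
logs = []
ranks = np.arange(1, n_items + 1)
for _ in range(n_logs):
    shown = rng.permutation(n_items)                     # logged ranking
    examined = rng.random(n_items) < examine_prob(ranks)
    clicks = examined & (rng.random(n_items) < true_relevance[shown])
    logs.append((shown, clicks))

def ips_ctr(ranker, logs):
    """IPS estimate of the expected clicks per query for `ranker`,
    computed from clicks logged under known examination propensities."""
    item_rank = np.empty(n_items, dtype=int)
    item_rank[ranker] = ranks                            # rank each item would get
    total = 0.0
    for shown, clicks in logs:
        for pos, item in enumerate(shown, start=1):
            if clicks[pos - 1]:
                # Each click is down-weighted by its logged examination propensity
                # and re-weighted by the propensity at the evaluated ranker's rank.
                total += examine_prob(item_rank[item]) / examine_prob(pos)
    return total / len(logs)

delta_hat = ips_ctr(ranker_a, logs) - ips_ctr(ranker_b, logs)
print(f"Estimated CTR difference (A - B): {delta_hat:+.4f}")
```

Under this simulation the estimate converges to the true CTR gap regardless of which rankings were logged, but its variance, and hence the number of impressions needed, depends heavily on the logging policy; that logging distribution is the quantity LogOpt optimizes.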
Related papers
- Exploring the Performance of Continuous-Time Dynamic Link Prediction Algorithms [14.82820088479196]
Dynamic Link Prediction (DLP) addresses the prediction of future links in evolving networks.
In this work, we contribute tools to perform such a comprehensive evaluation.
We describe an exhaustive taxonomy of negative sampling methods that can be used at evaluation time.
arXiv Detail & Related papers (2024-05-27T14:03:28Z)
- Understanding the performance gap between online and offline alignment algorithms [63.137832242488926]
We show that offline algorithms train the policy to become good at pairwise classification, while online algorithms are good at generation.
This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process.
Our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
arXiv Detail & Related papers (2024-05-14T09:12:30Z)
- Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose xOrder, a model-agnostic post-processing framework for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z)
- Pessimistic Off-Policy Optimization for Learning to Rank [13.733459243449634]
Off-policy learning is a framework for optimizing policies without deploying them.
In recommender systems, this is especially challenging due to the imbalance in logged data.
We study pessimistic off-policy optimization for learning to rank.
arXiv Detail & Related papers (2022-06-06T12:58:28Z)
- Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits [25.153656462604268]
Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems.
The counterfactual estimators that are commonly used in these methods can have large bias and large variance when the logging policy is very different from the target policy being evaluated.
This paper introduces Minimum Variance Augmentation Logging (MVAL), a method for constructing logging policies that minimize the variance of the downstream evaluation or learning problem.
arXiv Detail & Related papers (2022-02-03T17:37:11Z)
- Unbiased Pairwise Learning to Rank in Recommender Systems [4.058828240864671]
Unbiased learning to rank algorithms are appealing candidates and have already been applied in many applications with single categorical labels.
We propose a novel unbiased LTR algorithm to tackle these challenges, which models position bias in a pairwise fashion.
Experimental results on public benchmark datasets and internal live traffic demonstrate the superior performance of the proposed method for both categorical and continuous labels.
arXiv Detail & Related papers (2021-11-25T06:04:59Z)
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
- PiRank: Learning To Rank via Differentiable Sorting [85.28916333414145]
We propose PiRank, a new class of differentiable surrogates for ranking.
We show that PiRank exactly recovers the desired metrics in the limit of zero temperature.
arXiv Detail & Related papers (2020-12-12T05:07:36Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)