Taking the Counterfactual Online: Efficient and Unbiased Online
Evaluation for Ranking
- URL: http://arxiv.org/abs/2007.12719v2
- Date: Tue, 28 Jul 2020 17:59:33 GMT
- Title: Taking the Counterfactual Online: Efficient and Unbiased Online
Evaluation for Ranking
- Authors: Harrie Oosterhuis and Maarten de Rijke
- Abstract summary: We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data.
LogOpt turns the counterfactual approach - which is indifferent to the logging policy - into an online approach, where the algorithm decides what rankings to display.
We prove that, as an online evaluation method, LogOpt is unbiased w.r.t. position and item-selection bias, unlike existing interleaving methods.
- Score: 74.46448041224247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Counterfactual evaluation can estimate Click-Through-Rate (CTR) differences
between ranking systems based on historical interaction data, while mitigating
the effect of position bias and item-selection bias. We introduce the novel
Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for
logging data so that the counterfactual estimate has minimal variance. As
minimizing variance leads to faster convergence, LogOpt increases the
data-efficiency of counterfactual estimation. LogOpt turns the counterfactual
approach - which is indifferent to the logging policy - into an online
approach, where the algorithm decides what rankings to display. We prove that,
as an online evaluation method, LogOpt is unbiased w.r.t. position and
item-selection bias, unlike existing interleaving methods. Furthermore, we
perform large-scale experiments by simulating comparisons between thousands of
rankers. Our results show that while interleaving methods make systematic
errors, LogOpt is as efficient as interleaving without being biased.
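To make the counterfactual setup concrete, below is a minimal sketch (not the authors' implementation) of an inverse-propensity-scored (IPS) estimate of the CTR difference between two rankers under a simulated position-bias model. The rank-based examination probabilities, the uniformly random logging policy, and all names are illustrative assumptions; LogOpt's contribution is precisely to replace such a fixed logging policy with one optimized to minimize the variance of this kind of estimate.

```python
# Hedged sketch: IPS-based counterfactual CTR-difference estimation under an
# assumed position-bias (examination) model. This is NOT the paper's code; the
# propensity model, logging policy, and toy simulation are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

n_items = 10          # candidate documents for a single query
n_logs = 20_000       # number of logged impressions
eta = 1.0             # assumed severity of position bias

def examine_prob(rank):
    """Assumed examination model: P(observed | rank k) = (1/k)**eta."""
    return (1.0 / rank) ** eta

# Two fixed rankers to compare (each is a permutation of the item ids).
ranker_a = rng.permutation(n_items)
ranker_b = rng.permutation(n_items)

# Ground-truth click probabilities, unknown to the estimator; used only to simulate logs.
true_relevance = rng.uniform(0.05, 0.6, size=n_items)

# Simulate logged impressions under a uniformly random logging policy.
# A click happens only if the item is examined (position bias) AND relevant.
logs = []
ranks = np.arange(1, n_items + 1)
for _ in range(n_logs):
    shown = rng.permutation(n_items)                     # logged ranking
    examined = rng.random(n_items) < examine_prob(ranks)
    clicks = examined & (rng.random(n_items) < true_relevance[shown])
    logs.append((shown, clicks))

def ips_ctr(ranker, logs):
    """IPS estimate of the expected clicks per query for `ranker`,
    computed from clicks logged under known examination propensities."""
    item_rank = np.empty(n_items, dtype=int)
    item_rank[ranker] = ranks                            # rank each item would get
    total = 0.0
    for shown, clicks in logs:
        for pos, item in enumerate(shown, start=1):
            if clicks[pos - 1]:
                # Each click is down-weighted by its logged examination propensity
                # and re-weighted by the propensity at the evaluated ranker's rank.
                total += examine_prob(item_rank[item]) / examine_prob(pos)
    return total / len(logs)

delta_hat = ips_ctr(ranker_a, logs) - ips_ctr(ranker_b, logs)
print(f"Estimated CTR difference (A - B): {delta_hat:+.4f}")
```

Under this simulation the estimate converges to the true CTR gap regardless of which rankings were logged, but its variance, and hence the number of impressions needed, depends heavily on the logging policy; that logging distribution is the quantity LogOpt optimizes.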
Related papers
- Exploring the Performance of Continuous-Time Dynamic Link Prediction Algorithms [14.82820088479196]
Dynamic Link Prediction (DLP) addresses the prediction of future links in evolving networks.
In this work, we contribute tools to perform such a comprehensive evaluation.
We describe an exhaustive taxonomy of negative sampling methods that can be used at evaluation time.
arXiv Detail & Related papers (2024-05-27T14:03:28Z)
- Understanding the performance gap between online and offline alignment algorithms [63.137832242488926]
We show that offline algorithms train the policy to become good at pairwise classification, while online algorithms are good at generation.
This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process.
Our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
arXiv Detail & Related papers (2024-05-14T09:12:30Z)
- Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose xOrder, a model-agnostic post-processing framework for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z)
- Pessimistic Off-Policy Optimization for Learning to Rank [13.733459243449634]
Off-policy learning is a framework for optimizing policies without deploying them.
In recommender systems, this is especially challenging due to the imbalance in logged data.
We study pessimistic off-policy optimization for learning to rank.
arXiv Detail & Related papers (2022-06-06T12:58:28Z)
- Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits [25.153656462604268]
Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems.
The counterfactual estimators that are commonly used in these methods can have large bias and large variance when the logging policy is very different from the target policy being evaluated.
This paper introduces Minimum Variance Augmentation Logging (MVAL), a method for constructing logging policies that minimize the variance of the downstream evaluation or learning problem.
arXiv Detail & Related papers (2022-02-03T17:37:11Z)
- Unbiased Pairwise Learning to Rank in Recommender Systems [4.058828240864671]
Unbiased learning to rank algorithms are appealing candidates and have already been applied in many applications with single categorical labels.
We propose a novel unbiased LTR algorithm to tackle these challenges, which models position bias in a pairwise fashion.
Experimental results on public benchmark datasets and internal live traffic demonstrate the superior performance of the proposed method for both categorical and continuous labels.
arXiv Detail & Related papers (2021-11-25T06:04:59Z)
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
- PiRank: Learning To Rank via Differentiable Sorting [85.28916333414145]
We propose PiRank, a new class of differentiable surrogates for ranking.
We show that PiRank exactly recovers the desired metrics in the limit of zero temperature.
arXiv Detail & Related papers (2020-12-12T05:07:36Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)