Doubly-Robust Estimation for Unbiased Learning-to-Rank from
Position-Biased Click Feedback
- URL: http://arxiv.org/abs/2203.17118v1
- Date: Thu, 31 Mar 2022 15:38:25 GMT
- Title: Doubly-Robust Estimation for Unbiased Learning-to-Rank from
Position-Biased Click Feedback
- Authors: Harrie Oosterhuis
- Abstract summary: We introduce a novel DR estimator that uses the expectation of treatment per rank instead of IPS estimation.
Our results indicate it requires several orders of magnitude fewer datapoints to converge at optimal performance.
- Score: 13.579420996461439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clicks on rankings suffer from position bias: generally items on lower ranks
are less likely to be examined - and thus clicked - by users, in spite of their
actual preferences between items. The prevalent approach to unbiased
click-based Learning-to-Rank (LTR) is based on counterfactual
Inverse-Propensity-Scoring (IPS) estimation. Unique about LTR is the fact that
standard Doubly-Robust (DR) estimation - which combines IPS with regression
predictions - is inapplicable since the treatment variable - indicating whether
a user examined an item - cannot be observed in the data. In this paper, we
introduce a novel DR estimator that uses the expectation of treatment per rank
instead. Our novel DR estimator has more robust unbiasedness conditions than
the existing IPS approach, and in addition, provides enormous decreases in
variance: our experimental results indicate it requires several orders of
magnitude fewer datapoints to converge at optimal performance. For the unbiased
LTR field, our DR estimator contributes both increases in state-of-the-art
performance and the most robust theoretical guarantees of all known LTR
estimators.
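The abstract contrasts the naive click rate, counterfactual IPS, and DR estimation, and notes that standard DR needs the examination (treatment) variable that click logs do not contain. The sketch below is a minimal, self-contained simulation of that setting, not the paper's estimator: the item relevances, rank-wise examination probabilities, the flat regression predictions `rel_hat`, and the sample sizes are all invented for illustration. It shows the naive estimate being biased, IPS being unbiased but noisy, and standard DR working only because the simulator exposes the examination indicator; the paper's contribution is precisely to replace that unobservable indicator with its expectation per rank, which is not reproduced here.

```python
# Illustrative sketch of position-biased clicks and the naive / IPS / standard-DR
# estimators discussed in the abstract. All numbers are assumptions, not values
# from the paper; the paper's own DR estimator is NOT implemented here.
import numpy as np

rng = np.random.default_rng(0)

n_items = 5
true_rel = np.array([0.9, 0.7, 0.5, 0.3, 0.1])   # P(click | examined) per item (assumed)
theta = np.array([1.0, 0.6, 0.35, 0.2, 0.1])     # P(examined) per rank (assumed)
n_sessions = 20_000

rel_hat = np.full(n_items, 0.5)  # crude regression predictions used by DR (assumed)

naive_sum = np.zeros(n_items)
ips_sum = np.zeros(n_items)
dr_sum = np.zeros(n_items)
shown = np.zeros(n_items)

for _ in range(n_sessions):
    order = rng.permutation(n_items)            # item displayed at each rank
    examined = rng.random(n_items) < theta      # treatment per rank (hidden in real logs)
    clicked = examined & (rng.random(n_items) < true_rel[order])
    for rank, item in enumerate(order):
        shown[item] += 1
        naive_sum[item] += clicked[rank]                 # ignores position bias
        ips_sum[item] += clicked[rank] / theta[rank]     # counterfactual IPS
        # Standard DR: regression prediction plus an IPS-weighted residual,
        # gated by the *observed* examination -- only possible in simulation.
        dr_sum[item] += rel_hat[item] + examined[rank] / theta[rank] * (
            clicked[rank] - rel_hat[item])

print("true relevance :", true_rel)
print("naive estimate :", np.round(naive_sum / shown, 3))  # biased towards zero
print("IPS estimate   :", np.round(ips_sum / shown, 3))    # unbiased, high variance
print("DR estimate    :", np.round(dr_sum / shown, 3))     # unbiased, lower variance
```

With a regression guess closer to the true relevances than the flat 0.5 used here, the residual term shrinks and the DR column's variance drops further; that qualitative behaviour is what the abstract's variance claims refer to.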
Related papers
- Estimating the Hessian Matrix of Ranking Objectives for Stochastic Learning to Rank with Gradient Boosted Trees [63.18324983384337]
We introduce the first stochastic learning to rank method for Gradient Boosted Decision Trees (GBDTs).
Our main contribution is a novel estimator for the second-order derivatives, i.e., the Hessian matrix.
We incorporate our estimator into the existing PL-Rank framework, which was originally designed for first-order derivatives only.
arXiv Detail & Related papers (2024-04-18T13:53:32Z)
- Off-Policy Evaluation of Ranking Policies under Diverse User Behavior [25.226825574282937]
Inverse Propensity Scoring (IPS) becomes extremely inaccurate in the ranking setup due to its high variance under large action spaces.
This work explores a far more general formulation where user behavior is diverse and can vary depending on the user context.
We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior.
arXiv Detail & Related papers (2023-06-26T22:31:15Z)
- Safe Deployment for Counterfactual Learning to Rank with Exposure-Based Risk Minimization [63.93275508300137]
We introduce a novel risk-aware Counterfactual Learning To Rank method with theoretical guarantees for safe deployment.
Our experimental results demonstrate the efficacy of our proposed method, which is effective at avoiding initial periods of bad performance when little data is available.
arXiv Detail & Related papers (2023-04-26T15:54:23Z)
- Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework.
AUR consists of a new uncertainty estimator along with a normal recommender model.
As the chance of mislabeling reflects the potential of a user-item pair, AUR makes recommendations according to the estimated uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z)
- Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this approach offsets the influence of user/item propensity on learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z)
- Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning [59.02006924867438]
Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions.
Recent work proposed distributionally robust OPE/L (DROPE/L) to remedy the sensitivity of OPE/L to environment distribution shifts, but the proposal relies on inverse-propensity weighting.
We propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets.
arXiv Detail & Related papers (2022-02-19T20:00:44Z)
- Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model [11.101369123145588]
Off-policy evaluation for ranking policies enables performance estimation of new ranking policies using only logged data.
Previous studies introduce some assumptions on user behavior to make the item space tractable.
We propose the Cascade Doubly Robust estimator, which assumes that a user interacts with items sequentially from the top position in a ranking.
arXiv Detail & Related papers (2022-02-03T12:42:33Z)
- Enhanced Doubly Robust Learning for Debiasing Post-click Conversion Rate Estimation [29.27760413892272]
Post-click conversion, a strong signal of user preference, is valuable for building recommender systems.
Currently, most existing methods utilize counterfactual learning to debias recommender systems.
We propose a novel double learning approach for the More Robust Doubly Robust (MRDR) estimator, which converts the error imputation into general conversion rate (CVR) estimation.
arXiv Detail & Related papers (2021-05-28T06:59:49Z)
- Accelerated Convergence for Counterfactual Learning to Rank [65.63997193915257]
We show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from the large variance introduced by the IPS weights (a short variance sketch follows this list).
We propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods.
We prove that CounterSample converges faster and complement our theoretical findings with empirical results.
arXiv Detail & Related papers (2020-05-21T12:53:36Z)
- Non-Clicks Mean Irrelevant? Propensity Ratio Scoring As a Correction [40.98264176722163]
Propensity Ratio Scoring (PRS) applies bias corrections to both clicks and non-clicks.
Our empirical evaluations confirm that PRS ensures a more effective use of click data and improved performance in both synthetic data and the real-world large-scale data from GMail search.
arXiv Detail & Related papers (2020-05-18T06:31:12Z)
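Several entries above (notably the AIPS and CounterSample summaries) attribute the weakness of plain IPS to the variance of its weights. The snippet below is a small illustrative Monte Carlo, not code from any of the listed papers, with all numbers (relevance, propensities, impression counts) chosen arbitrarily: it shows that the variance of the per-impression IPS term c / theta grows roughly like rel / theta as the examination propensity shrinks, which is why deep ranks make IPS-weighted estimation and learning slow to converge.

```python
# Sketch of how the variance of per-impression IPS terms blows up for small
# examination propensities. All constants are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
rel = 0.5        # assumed P(click | examined) for a single item
n = 200_000      # logged impressions per propensity setting (assumed)

for theta in (0.5, 0.1, 0.02):                  # examination propensity at some rank
    examined = rng.random(n) < theta
    clicks = examined & (rng.random(n) < rel)
    ips_terms = clicks / theta                  # per-impression IPS estimate of rel
    print(f"theta={theta:<5}  mean={ips_terms.mean():.3f}  "
          f"var={ips_terms.var():7.2f}  (exact: rel/theta - rel^2 = {rel/theta - rel**2:.2f})")
```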
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.