Non-Clicks Mean Irrelevant? Propensity Ratio Scoring As a Correction
- URL: http://arxiv.org/abs/2005.08480v2
- Date: Sun, 14 Nov 2021 04:55:35 GMT
- Title: Non-Clicks Mean Irrelevant? Propensity Ratio Scoring As a Correction
- Authors: Nan Wang, Zhen Qin, Xuanhui Wang, Hongning Wang
- Abstract summary: Propensity Ratio Scoring (PRS) provides treatments on both clicks and non-clicks.
Our empirical evaluations confirm that PRS ensures a more effective use of click data and improved performance on both synthetic data and real-world large-scale data from GMail search.
- Score: 40.98264176722163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in unbiased learning to rank (LTR) count on Inverse
Propensity Scoring (IPS) to eliminate bias in implicit feedback. Though
theoretically sound in correcting the bias introduced by treating clicked
documents as relevant, IPS ignores the bias caused by (implicitly) treating
non-clicked ones as irrelevant. In this work, we first rigorously prove that
such use of click data leads to unnecessary pairwise comparisons between
relevant documents, which prevent unbiased ranker optimization. Based on the
proof, we derive a simple yet well justified new weighting scheme, called
Propensity Ratio Scoring (PRS), which provides treatments on both clicks and
non-clicks. Besides correcting the bias in clicks, PRS avoids relevant-relevant
document comparisons in LTR training and enjoys lower variability. Our
extensive empirical evaluations confirm that PRS ensures a more effective use
of click data and improved performance on both synthetic data from a set of
LTR benchmarks and real-world large-scale data from GMail search.
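The contrast between the two weighting schemes can be sketched in a few lines. This is a minimal illustration, not the paper's derivation: the ratio form used for PRS below is an assumption based on the method's name and the abstract's description, and the hinge loss is just one common pairwise LTR objective.

```python
def ips_weight(p_clicked):
    """Standard IPS: reweight a clicked document by the inverse of its
    examination propensity, correcting the bias of treating clicks as
    relevance labels."""
    return 1.0 / p_clicked

def prs_weight(p_clicked, p_unclicked):
    """Assumed PRS form: weight a (clicked, non-clicked) training pair by
    the ratio of examination propensities. A non-clicked document with low
    propensity was probably never examined, so its pair contributes little;
    bounded ratios also tend to vary less than raw inverse propensities."""
    return p_unclicked / p_clicked

def pairwise_hinge_loss(score_clicked, score_unclicked, weight):
    """Weighted pairwise hinge loss for one (clicked, non-clicked) pair."""
    return weight * max(0.0, 1.0 - (score_clicked - score_unclicked))
```

For example, with a clicked document at propensity 0.5 and a non-clicked one at propensity 0.1, IPS weighs the pair 2.0 while the ratio form weighs it 0.2, discounting the uninformative non-click.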
Related papers
- Preference Learning Algorithms Do Not Learn Preference Rankings [62.335733662381884]
We show that most preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets.
We attribute this discrepancy to the DPO objective, which is empirically and theoretically ill-suited to fix even mild ranking errors.
arXiv Detail & Related papers (2024-05-29T21:29:44Z)
- CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? [72.19502317793133]
We study the effectiveness of data-balancing for mitigating biases in contrastive language-image pretraining (CLIP).
We present a novel algorithm, called Multi-Modal Moment Matching (M4), designed to reduce both representation and association biases.
arXiv Detail & Related papers (2024-03-07T14:43:17Z)
- FACTS: First Amplify Correlations and Then Slice to Discover Bias [17.244153084361102]
Computer vision datasets frequently contain spurious correlations between task-relevant labels and (easy to learn) latent task-irrelevant attributes.
Models trained on such datasets learn "shortcuts" and underperform on bias-conflicting slices of data where the correlation does not hold.
We propose First Amplify Correlations and Then Slice to Discover Bias to inform downstream bias mitigation strategies.
arXiv Detail & Related papers (2023-09-29T17:41:26Z)
- Optimizing Group-Fair Plackett-Luce Ranking Models for Relevance and Ex-Post Fairness [5.349671569838342]
In learning-to-rank, optimizing only the relevance can cause representational harm to certain categories of items.
In this paper, we propose a novel algorithm that maximizes expected relevance over those rankings that satisfy given representation constraints.
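For context, the Plackett-Luce model referenced above assigns each full ranking a probability by repeatedly choosing the next item in proportion to the exponential of its score. A minimal sketch follows; the group-fairness representation constraints from the paper are not modeled here.

```python
import math
from itertools import permutations

def plackett_luce_prob(scores, ranking):
    # Probability of a complete ranking under the Plackett-Luce model:
    # at each position, the next item is chosen with probability
    # proportional to exp(score) among the items not yet placed.
    remaining = list(range(len(scores)))
    prob = 1.0
    for item in ranking:
        denom = sum(math.exp(scores[j]) for j in remaining)
        prob *= math.exp(scores[item]) / denom
        remaining.remove(item)
    return prob

# Sanity check: the probabilities of all rankings sum to one.
scores = [1.0, 0.5, 0.0]
total = sum(plackett_luce_prob(scores, r) for r in permutations(range(3)))
```

Expected relevance under such a model is an average over rankings weighted by these probabilities, which is the quantity the paper maximizes subject to its representation constraints.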
arXiv Detail & Related papers (2023-08-25T08:27:43Z)
- Towards Disentangling Relevance and Bias in Unbiased Learning to Rank [40.604145263955765]
Unbiased learning to rank (ULTR) studies the problem of mitigating various biases from implicit user feedback data such as clicks.
We propose three methods to mitigate the negative confounding effects by better disentangling relevance and bias.
arXiv Detail & Related papers (2022-12-28T16:29:52Z)
- Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model [24.66016187602343]
We propose an approach that can Jointly optimize the Ranking and Calibration abilities (JRC for short).
JRC improves the ranking ability by contrasting the logit value for the sample with different labels and constrains the predicted probability to be a function of the logit subtraction.
JRC has been deployed on the display advertising platform of Alibaba and has obtained significant performance improvements.
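The "function of the logit subtraction" constraint can be illustrated with a toy sketch. The two-logit setup and function name below are illustrative assumptions, not the paper's architecture.

```python
import math

def calibrated_prob(logit_pos, logit_neg):
    # Predicted probability constrained to depend only on the difference
    # of the two logits (a sigmoid over the subtraction), so the same
    # quantity that drives the pairwise ranking comparison also produces
    # the calibrated probability.
    return 1.0 / (1.0 + math.exp(-(logit_pos - logit_neg)))
```

Equal logits yield a probability of 0.5, and widening the logit gap moves the probability monotonically, which is what ties the ranking and calibration objectives together.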
arXiv Detail & Related papers (2022-08-12T08:32:13Z)
- Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z)
- Doubly-Robust Estimation for Unbiased Learning-to-Rank from Position-Biased Click Feedback [13.579420996461439]
We introduce a novel DR estimator that uses the expectation of treatment per rank instead of IPS estimation.
Our results indicate it requires several orders of magnitude fewer datapoints to converge at optimal performance.
arXiv Detail & Related papers (2022-03-31T15:38:25Z)
- Pointwise Binary Classification with Pairwise Confidence Comparisons [97.79518780631457]
We propose pairwise comparison (Pcomp) classification, where we have only pairs of unlabeled data that we know one is more likely to be positive than the other.
We link Pcomp classification to noisy-label learning to develop a progressive URE and improve it by imposing consistency regularization.
arXiv Detail & Related papers (2020-10-05T09:23:58Z)
- Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking [74.46448041224247]
We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data.
LogOpt turns the counterfactual approach - which is indifferent to the logging policy - into an online approach, where the algorithm decides what rankings to display.
We prove that, as an online evaluation method, LogOpt is unbiased w.r.t. position and item-selection bias, unlike existing interleaving methods.
arXiv Detail & Related papers (2020-07-24T18:05:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.