Non-Clicks Mean Irrelevant? Propensity Ratio Scoring As a Correction
- URL: http://arxiv.org/abs/2005.08480v2
- Date: Sun, 14 Nov 2021 04:55:35 GMT
- Title: Non-Clicks Mean Irrelevant? Propensity Ratio Scoring As a Correction
- Authors: Nan Wang, Zhen Qin, Xuanhui Wang, Hongning Wang
- Abstract summary: Propensity Ratio Scoring (PRS) provides treatments on both clicks and non-clicks.
Our empirical evaluations confirm that PRS makes more effective use of click data and improves performance on both synthetic data and real-world large-scale data from GMail search.
- Score: 40.98264176722163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in unbiased learning to rank (LTR) count on Inverse
Propensity Scoring (IPS) to eliminate bias in implicit feedback. Though
theoretically sound in correcting the bias introduced by treating clicked
documents as relevant, IPS ignores the bias caused by (implicitly) treating
non-clicked ones as irrelevant. In this work, we first rigorously prove that
such use of click data leads to unnecessary pairwise comparisons between
relevant documents, which prevent unbiased ranker optimization. Based on the
proof, we derive a simple yet well justified new weighting scheme, called
Propensity Ratio Scoring (PRS), which provides treatments on both clicks and
non-clicks. Besides correcting the bias in clicks, PRS avoids relevant-relevant
document comparisons in LTR training and enjoys a lower variability. Our
extensive empirical evaluations confirm that PRS ensures a more effective use
of click data and improved performance in both synthetic data from a set of LTR
benchmarks, as well as in the real-world large-scale data from GMail search.
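As a rough illustration of the two weighting schemes, here is a minimal NumPy sketch. It assumes per-position examination propensities are already estimated, and it reads "Propensity Ratio" as weighting each click/non-click pair by the ratio of the non-clicked document's propensity to the clicked one's; this is our reading of the abstract, not a formula quoted from the paper.

```python
import numpy as np

def pair_weights(clicks, propensities, scheme="prs"):
    """Weights for (clicked, non-clicked) document pairs in one query's log.

    clicks:       binary array, 1 = clicked, 0 = not clicked
    propensities: estimated examination propensity of each position

    "ips" corrects only the clicked side of a pair; "prs" additionally
    weights by the non-clicked document's propensity, so pairs whose
    non-click likely means "never examined" contribute little.
    """
    weights = {}
    for i in np.flatnonzero(clicks == 1):
        for j in np.flatnonzero(clicks == 0):
            if scheme == "ips":
                weights[(i, j)] = 1.0 / propensities[i]
            else:  # propensity *ratio*, as sketched here
                weights[(i, j)] = propensities[j] / propensities[i]
    return weights

# Example: a click at rank 3 paired with a non-click at rank 1 keeps a
# large weight (rank 1 was almost surely examined), while a non-click at
# a deep rank is heavily discounted.
w = pair_weights(np.array([0, 0, 1, 0]), np.array([1.0, 0.7, 0.5, 0.1]))
print(w)
```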
Related papers
- Mitigating Spurious Correlations via Disagreement Probability [4.8884049398279705]
Models trained with empirical risk minimization (ERM) are prone to rely on spurious correlations between target labels and bias attributes.
We introduce a training objective designed to robustly enhance model performance across all data samples.
We then derive a debiasing method, Disagreement Probability based Resampling for debiasing (DPR), which does not require bias labels.
arXiv Detail & Related papers (2024-11-04T02:44:04Z)
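A minimal sketch of the resampling idea as summarized above, assuming a bias-prone reference model trained with plain ERM is available; the sampling rule (probability proportional to that model's disagreement with the label) is our reading of the summary, not the authors' exact estimator.

```python
import numpy as np

def disagreement_resample(ref_probs, labels, seed=0):
    """Resample training indices with probability proportional to a
    bias-prone reference model's disagreement with the label, so
    bias-conflicting samples are drawn more often.

    ref_probs: (n, k) class probabilities from an ERM-trained model
    labels:    (n,) integer class labels
    """
    rng = np.random.default_rng(seed)
    # P(prediction != label) under the reference model's distribution.
    disagreement = 1.0 - ref_probs[np.arange(len(labels)), labels]
    p = disagreement / disagreement.sum()
    return rng.choice(len(labels), size=len(labels), replace=True, p=p)
```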
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human-written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
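The summary suggests a contrastive training signal for the retriever; below is a generic InfoNCE-style sketch (not CFR's exact objective), where the positive is the evidence passage annotated as answering one of the claim's subquestions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(claim_emb, evidence_embs, positive_idx, tau=0.05):
    """InfoNCE-style loss: score the claim against candidate evidence
    embeddings and treat the annotated answer passage as the positive.

    claim_emb:     (d,) embedding of the claim or subquestion
    evidence_embs: (n, d) embeddings of candidate evidence passages
    positive_idx:  index of the human-annotated positive passage
    """
    sims = F.cosine_similarity(claim_emb.unsqueeze(0), evidence_embs) / tau
    target = torch.tensor([positive_idx])
    return F.cross_entropy(sims.unsqueeze(0), target)
```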
- LLMs Can Patch Up Missing Relevance Judgments in Evaluation [56.51461892988846]
We use large language models (LLMs) to automatically label unjudged documents.
We simulate scenarios with varying degrees of holes by randomly dropping relevant documents from the relevance judgments in the TREC DL tracks.
Our method achieves Kendall tau correlations of 0.87 and 0.92 on average for Vicuna-7B and GPT-3.5 Turbo, respectively.
arXiv Detail & Related papers (2024-05-08T00:32:19Z)
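The reported numbers are rank correlations between system orderings; the snippet below shows that evaluation step with scipy, using made-up per-system nDCG scores purely as placeholders.

```python
from scipy.stats import kendalltau

# Placeholder per-system effectiveness scores: once under the full
# judgments, once with the dropped ("hole") judgments filled by LLM labels.
ndcg_full   = [0.71, 0.64, 0.58, 0.52, 0.49]
ndcg_filled = [0.69, 0.65, 0.57, 0.53, 0.47]

# Kendall tau asks: do the two judgment sets rank the systems the same way?
tau, p_value = kendalltau(ndcg_full, ndcg_filled)
print(f"Kendall tau = {tau:.2f} (p = {p_value:.3f})")
```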
- FACTS: First Amplify Correlations and Then Slice to Discover Bias [17.244153084361102]
Computer vision datasets frequently contain spurious correlations between task-relevant labels and (easy-to-learn) latent task-irrelevant attributes.
Models trained on such datasets learn "shortcuts" and underperform on bias-conflicting slices of data where the correlation does not hold.
We propose First Amplify Correlations and Then Slice to Discover Bias (FACTS) to surface such slices and inform downstream bias mitigation strategies.
arXiv Detail & Related papers (2023-09-29T17:41:26Z)
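As a loose illustration of the amplify-then-slice recipe (a generic sketch, not the paper's exact pipeline), one could train a deliberately bias-amplified model and then cluster the examples it misclassifies; each cluster is a candidate bias-conflicting slice.

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_bias_slices(embeddings, labels, amplified_preds, n_slices=5, seed=0):
    """Cluster the errors of a bias-amplified model into candidate slices.

    embeddings:      (n, d) features of all examples
    labels:          (n,) ground-truth labels
    amplified_preds: (n,) predictions of the bias-amplified model
    """
    wrong = np.flatnonzero(amplified_preds != labels)
    km = KMeans(n_clusters=n_slices, n_init=10, random_state=seed)
    slice_ids = km.fit_predict(embeddings[wrong])
    return wrong, slice_ids  # example indices and their slice assignments
```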
- Optimizing Group-Fair Plackett-Luce Ranking Models for Relevance and Ex-Post Fairness [5.349671569838342]
In learning-to-rank, optimizing only for relevance can cause representational harm to certain categories of items.
In this paper, we propose a novel algorithm that maximizes expected relevance over those rankings that satisfy given representation constraints.
arXiv Detail & Related papers (2023-08-25T08:27:43Z)
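To make the setup concrete, here is a naive sketch: sample rankings from a Plackett-Luce model and keep only those meeting a top-k representation constraint. The rejection loop is purely illustrative; the paper proposes an actual optimization algorithm over constrained rankings.

```python
import numpy as np

def sample_pl(scores, rng):
    """Draw one ranking from a Plackett-Luce model: repeatedly pick the
    next item with probability proportional to exp(score)."""
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        w = np.exp(np.asarray(scores)[remaining])
        pick = rng.choice(len(remaining), p=w / w.sum())
        ranking.append(remaining.pop(pick))
    return ranking

def sample_fair(scores, groups, min_per_group, k, seed=0):
    """Rejection sampling: redraw until every group g has at least
    min_per_group[g] items in the top k (an ex-post constraint)."""
    rng = np.random.default_rng(seed)
    while True:
        r = sample_pl(scores, rng)
        top_groups = [groups[i] for i in r[:k]]
        if all(top_groups.count(g) >= m for g, m in min_per_group.items()):
            return r
```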
- Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model [24.66016187602343]
We propose an approach that can Jointly optimize the Ranking and Calibration abilities (JRC for short).
JRC improves ranking ability by contrasting the logit values of samples with different labels, and constrains the predicted probability to be a function of the logit difference.
JRC has been deployed on the display advertising platform of Alibaba and has obtained significant performance improvements.
arXiv Detail & Related papers (2022-08-12T08:32:13Z)
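A simplified sketch of the two ingredients named above, under our reading of the summary: the model emits a logit per label state, calibration ties p(click) to the logit difference, and a pairwise term contrasts logits across samples with different labels. This is illustrative, not the deployed JRC model.

```python
import torch
import torch.nn.functional as F

def jrc_style_loss(logits, labels, alpha=1.0):
    """logits: (n, 2) per-sample logits for the non-click/click states;
    labels: (n,) binary click labels."""
    # Calibration: the predicted probability is a function of the logit
    # subtraction, trained with log loss.
    p_click = torch.sigmoid(logits[:, 1] - logits[:, 0])
    calibration = F.binary_cross_entropy(p_click, labels.float())
    # Ranking: click-state logits of positives should exceed those of
    # negatives (logistic pairwise loss over all cross pairs).
    pos = logits[labels == 1, 1]
    neg = logits[labels == 0, 1]
    ranking = F.softplus(neg.unsqueeze(0) - pos.unsqueeze(1)).mean()
    return calibration + alpha * ranking
```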
- Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this construction offsets the influence of user and item propensities on learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z)
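The debiasing mechanism is easy to see in code: for two observed interactions, CPR-style training scores the observed cross-sum against the swapped one, so any per-user or per-item exposure term that enters the scores additively appears on both sides and cancels. A minimal sketch (our simplification, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def cpr_style_loss(s_u1_i1, s_u2_i2, s_u1_i2, s_u2_i1):
    """For observed interactions (u1, i1) and (u2, i2), prefer the observed
    cross-sum over the swapped cross-sum. Additive per-user and per-item
    exposure terms cancel in the margin."""
    margin = (s_u1_i1 + s_u2_i2) - (s_u1_i2 + s_u2_i1)
    return F.softplus(-margin).mean()

# Usage: the s_* arguments are model scores for the four (user, item)
# combinations, gathered for a batch of sampled interaction pairs.
```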
- Doubly-Robust Estimation for Unbiased Learning-to-Rank from Position-Biased Click Feedback [13.579420996461439]
We introduce a novel DR estimator that uses the expectation of treatment per rank instead of IPS estimation.
Our results indicate that it requires several orders of magnitude fewer datapoints to converge to optimal performance.
arXiv Detail & Related papers (2022-03-31T15:38:25Z)
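For context, the textbook doubly-robust construction that such estimators build on looks as follows; this is a generic sketch, not the paper's rank-level variant.

```python
import numpy as np

def doubly_robust_value(observed, labels, propensity, label_hat):
    """Textbook DR estimate of a mean label: impute with a regression
    model everywhere, then add an IPS-weighted correction on the entries
    that were actually observed. Unbiased if either the imputation model
    or the propensities are correct.

    observed:   (n,) 1 if the label was observed, else 0
    labels:     (n,) observed labels (ignored where observed == 0)
    propensity: (n,) P(observed == 1)
    label_hat:  (n,) model-imputed labels
    """
    correction = observed * (labels - label_hat) / propensity
    return float(np.mean(label_hat + correction))
```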
- Pointwise Binary Classification with Pairwise Confidence Comparisons [97.79518780631457]
We propose pairwise comparison (Pcomp) classification, where we have only pairs of unlabeled data for which we know that one is more likely to be positive than the other.
We link Pcomp classification to noisy-label learning to develop a progressive unbiased risk estimator (URE), and improve it by imposing consistency regularization.
arXiv Detail & Related papers (2020-10-05T09:23:58Z)
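A naive surrogate for Pcomp supervision (not the paper's unbiased risk estimator): given a pair where x is known to be more likely positive than x', push the classifier's score for x above the score for x'.

```python
import torch
import torch.nn.functional as F

def pcomp_surrogate_loss(score_more, score_less):
    """score_more: scores f(x) for the items judged more likely positive;
    score_less:  scores f(x') for their counterparts. A logistic pairwise
    loss enforces f(x) > f(x')."""
    return F.softplus(score_less - score_more).mean()
```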
- Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking [74.46448041224247]
We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy used for logging data.
LogOpt turns the counterfactual approach, which is indifferent to the logging policy, into an online approach in which the algorithm decides which rankings to display.
We prove that, as an online evaluation method, LogOpt is unbiased w.r.t. position and item-selection bias, unlike existing interleaving methods.
arXiv Detail & Related papers (2020-07-24T18:05:58Z)