An Offline Metric for the Debiasedness of Click Models
- URL: http://arxiv.org/abs/2304.09560v3
- Date: Sat, 14 Dec 2024 14:14:26 GMT
- Title: An Offline Metric for the Debiasedness of Click Models
- Authors: Romain Deffayet, Philipp Hager, Jean-Michel Renders, Maarten de Rijke
- Abstract summary: Click models are a common method for extracting information from user clicks.
Recent work shows that the current evaluation practices in the community fail to guarantee that a well-performing click model generalizes well to downstream tasks.
We introduce the concept of debiasedness in click modeling and derive a metric for measuring it.
- Score: 52.25681483524383
- License:
- Abstract: A well-known problem when learning from user clicks is the presence of inherent biases in the data, such as position or trust bias. Click models are a common method for extracting information from user clicks, such as document relevance in web search, or for estimating click biases for downstream applications such as counterfactual learning-to-rank, ad placement, or fair ranking. Recent work shows that the current evaluation practices in the community fail to guarantee that a well-performing click model generalizes well to downstream tasks in which the ranking distribution differs from the training distribution, i.e., under covariate shift. In this work, we propose an evaluation metric based on conditional independence testing to detect a lack of robustness to covariate shift in click models. We introduce the concept of debiasedness in click modeling and derive a metric for measuring it. In extensive semi-synthetic experiments, we show that our proposed metric helps to predict the downstream performance of click models under covariate shift and is useful in an off-policy model selection setting.
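To make the abstract's idea concrete, the minimal sketch below shows one way such a conditional-independence check could be computed offline. It is an illustrative assumption, not the authors' implementation: debiasedness is approximated here by a near-zero plug-in estimate of the conditional mutual information between a candidate click model's relevance scores and the logging policy's scores, given true relevance. The binning estimator and all variable names are hypothetical.
```python
# Hypothetical sketch: measuring "debiasedness" as low conditional mutual
# information I(s_new ; s_log | y), where s_new are the candidate click
# model's scores, s_log the logging policy's scores, and y true relevance.
# This is an assumed estimator, not the paper's actual metric implementation.
import numpy as np

def discretize(x, bins=10):
    """Map continuous scores to quantile bins so joint frequencies can be counted."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(x, edges)

def conditional_mutual_information(s_new, s_log, y, bins=10):
    """Plug-in estimate of I(s_new; s_log | y) from discretized samples."""
    a, b, c = discretize(s_new, bins), discretize(s_log, bins), np.asarray(y)
    cmi = 0.0
    for cv in np.unique(c):
        mask = c == cv
        p_c = mask.mean()
        a_c, b_c = a[mask], b[mask]
        m = len(a_c)
        # Joint and marginal frequencies within this stratum of true relevance.
        joint = {}
        for ai, bi in zip(a_c, b_c):
            joint[(ai, bi)] = joint.get((ai, bi), 0) + 1
        pa = {ai: (a_c == ai).mean() for ai in np.unique(a_c)}
        pb = {bi: (b_c == bi).mean() for bi in np.unique(b_c)}
        for (ai, bi), cnt in joint.items():
            p_ab = cnt / m
            cmi += p_c * p_ab * np.log(p_ab / (pa[ai] * pb[bi]))
    return cmi  # values near 0 suggest conditional independence, i.e. debiasedness

# Toy usage: scores that depend on the logging policy only through true relevance
# should yield a small estimate.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=5000)            # true relevance labels
s_log = y + rng.normal(0, 1, size=5000)      # logging policy scores
s_new = y + rng.normal(0, 1, size=5000)      # candidate click model scores
print(conditional_mutual_information(s_new, s_log, y))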
Related papers
- Unbiased Learning to Rank with Query-Level Click Propensity Estimation: Beyond Pointwise Observation and Relevance [74.43264459255121]
In real-world scenarios, users often click only one or two results after examining multiple relevant options.
We propose a query-level click propensity model to capture the probability that users will click on different result lists.
Our method introduces a Dual Inverse Propensity Weighting mechanism to address both relevance saturation and position bias (a generic inverse-propensity-weighting sketch follows this list).
arXiv Detail & Related papers (2025-02-17T03:55:51Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [49.15931834209624]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under this robustness metric, a model is judged to be robust only if its performance is consistently accurate across each clique as a whole.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Echoes: Unsupervised Debiasing via Pseudo-bias Labeling in an Echo Chamber [17.034228910493056]
This paper presents experimental analyses revealing that the existing biased models overfit to bias-conflicting samples in the training data.
We propose a straightforward and effective method called Echoes, which trains a biased model and a target model with a different strategy.
Our approach achieves superior debiasing results compared to the existing baselines on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-06T13:13:18Z)
- Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases [62.54519787811138]
We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues.
We rank images within their classes based on spuriosity, proxied via deep neural features of an interpretable network.
Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.
arXiv Detail & Related papers (2022-12-05T23:15:43Z)
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries [53.222218035435006]
We use adversarial tools to optimize for queries that are discriminative and diverse.
Our improvements achieve significantly more accurate membership inference than existing methods.
arXiv Detail & Related papers (2022-10-19T17:46:50Z)
- Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data [0.0]
We propose metrics for general regression tasks using the Shifts Weather Prediction dataset.
We also present an evaluation of the baseline methods using these metrics.
arXiv Detail & Related papers (2021-11-08T17:32:10Z)
- Handling Position Bias for Unbiased Learning to Rank in Hotels Search [0.951828574518325]
We investigate the importance of properly handling position bias in an online test environment in Tripadvisor Hotels search.
We propose an empirically effective method of handling the position bias that fully leverages the user action data.
The online A/B test results show that this method leads to an improved search ranking model.
arXiv Detail & Related papers (2020-02-28T03:48:42Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
- Eliminating Search Intent Bias in Learning to Rank [0.32228025627337864]
We study how differences in user search intent can influence click activities and find that a bias exists between user search intent and document relevance.
We propose a search intent bias hypothesis that can be applied to most existing click models to improve their ability to learn unbiased relevance.
arXiv Detail & Related papers (2020-02-08T17:07:37Z)
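Several of the entries above (the query-level click propensity and hotel-search papers in particular) build on inverse propensity weighting for position bias. The sketch below shows only that generic, widely used correction: a pointwise click loss where each click is reweighted by the inverse of its estimated examination propensity. The propensity values, function names, and loss form are assumptions for illustration and are not taken from any of the listed papers.
```python
# Hypothetical sketch of a standard inverse-propensity-weighted (IPW) click loss.
# Not the Dual IPW mechanism itself; propensities and names are illustrative.
import numpy as np

def ipw_click_loss(scores, clicks, positions, propensities):
    """Pointwise sigmoid loss where clicks are reweighted by 1 / P(examined | position)."""
    p = propensities[positions]                  # estimated examination propensity
    weights = clicks / np.clip(p, 1e-6, 1.0)     # debiased click signal
    probs = 1.0 / (1.0 + np.exp(-scores))        # predicted click probability
    # Weighted binary cross-entropy; non-clicks keep weight 1 as a simplification.
    loss = -(weights * np.log(probs + 1e-9) + (1 - clicks) * np.log(1 - probs + 1e-9))
    return loss.mean()

# Toy usage with assumed propensities that decay with rank.
rng = np.random.default_rng(0)
propensities = 1.0 / (1.0 + np.arange(10))       # assumed examination probabilities
positions = rng.integers(0, 10, size=1000)
clicks = rng.integers(0, 2, size=1000).astype(float)
scores = rng.normal(size=1000)
print(ipw_click_loss(scores, clicks, positions, propensities))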