Related papers: LLMs for estimating positional bias in logged interaction data

URL: http://arxiv.org/abs/2509.03696v1
Date: Wed, 03 Sep 2025 20:26:06 GMT
Title: LLMs for estimating positional bias in logged interaction data
Authors: Aleksandr V. Petrov, Michael Murtagh, Karthik Nagesh
Abstract summary: We propose a novel method for estimating position bias using Large Language Models (LLMs). Our experiments show that propensities estimated with our LLM-as-a-judge approach are stable across score buckets. An IPS-weighted reranker trained with these propensities matches the production model on standard NDCG@10 while improving weighted NDCG@10 by roughly 2%.
Score: 44.839172857330674
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recommender and search systems commonly rely on Learning To Rank models trained on logged user interactions to order items by predicted relevance. However, such interaction data is often subject to position bias, as users are more likely to click on items that appear higher in the ranking, regardless of their actual relevance. As a result, newly trained models may inherit and reinforce the biases of prior ranking models rather than genuinely improving relevance. A standard approach to mitigate position bias is Inverse Propensity Scoring (IPS), where the model's loss is weighted by the inverse of a propensity function, an estimate of the probability that an item at a given position is examined. However, accurate propensity estimation is challenging, especially in interfaces with complex non-linear layouts. In this paper, we propose a novel method for estimating position bias using Large Language Models (LLMs) applied to logged user interaction data. This approach offers a cost-effective alternative to online experimentation. Our experiments show that propensities estimated with our LLM-as-a-judge approach are stable across score buckets and reveal the row-column effects of Viator's grid layout that simpler heuristics overlook. An IPS-weighted reranker trained with these propensities matches the production model on standard NDCG@10 while improving weighted NDCG@10 by roughly 2%. We will verify these offline gains in forthcoming live-traffic experiments.
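The IPS weighting described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `ips_weighted_log_loss`, the pointwise log loss over clicked items, the clipping threshold, and the propensity values are all assumptions chosen for illustration. The only idea taken from the abstract is that each loss term is weighted by the inverse of the examination propensity at the item's position.

```python
import math

def ips_weighted_log_loss(scores, clicks, positions, propensity, clip=0.05):
    """Pointwise log loss over clicked items, weighted by inverse propensity.

    scores:     model scores (logits), one per logged item
    clicks:     1 if the item was clicked, else 0
    positions:  display position of each item
    propensity: hypothetical mapping position -> estimated examination probability
    clip:       assumed lower bound on propensities, to keep weights bounded
    """
    total = 0.0
    for score, clicked, pos in zip(scores, clicks, positions):
        if not clicked:
            continue  # this sketch uses clicks only as positive signals
        p = max(propensity[pos], clip)            # clip very small propensities
        prob = 1.0 / (1.0 + math.exp(-score))     # sigmoid of the model score
        total += -math.log(prob) / p              # inverse-propensity weight
    return total / len(scores)

# Illustrative propensities only: lower positions are examined less often.
example_propensity = {1: 1.0, 2: 0.5, 3: 0.25}
loss = ips_weighted_log_loss(
    scores=[0.0, 0.0, 0.0],
    clicks=[1, 0, 1],
    positions=[1, 2, 3],
    propensity=example_propensity,
)
```

In this toy call, the click at position 3 contributes four times as much to the loss as the click at position 1, because it was assumed to be four times less likely to be examined.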

Related papers

A Causal Information-Flow Framework for Unbiased Learning-to-Rank [52.54102347581931]
In web search and recommendation systems, user clicks are widely used to train ranking models. We introduce a novel causal learning-based ranking framework that extends Unbiased Learning-to-Rank. Our method consistently reduces measured bias leakage and improves ranking performance.
arXiv Detail & Related papers (2026-01-09T07:19:35Z)
Correcting for Position Bias in Learning to Rank: A Control Function Approach [9.986244291715762]
We propose a novel control function-based method that accounts for position bias in a two-stage process. Unlike previous position bias correction methods, our method does not require knowledge of the click or propensity model. Experimental results demonstrate that our method outperforms state-of-the-art approaches in correcting for position bias.
arXiv Detail & Related papers (2025-06-08T04:10:14Z)
Variational Bayesian Personalized Ranking [39.24591060825056]
Variational BPR is a novel and easily implementable learning objective that integrates likelihood optimization, noise reduction, and popularity debiasing. We introduce an attention-based latent interest prototype contrastive mechanism, replacing instance-level contrastive learning, to effectively reduce noise from problematic samples. Empirically, we demonstrate the effectiveness of Variational BPR on popular backbone recommendation models.
arXiv Detail & Related papers (2025-03-14T04:22:01Z)
Unbiased Learning to Rank with Query-Level Click Propensity Estimation: Beyond Pointwise Observation and Relevance [74.43264459255121]
In real-world scenarios, users often click only one or two results after examining multiple relevant options. We propose a query-level click propensity model to capture the probability that users will click on different result lists. Our method introduces a Dual Inverse Propensity Weighting mechanism to address both relevance saturation and position bias.
arXiv Detail & Related papers (2025-02-17T03:55:51Z)
Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework. AUR consists of a new uncertainty estimator along with a normal recommender model. As the chance of mislabeling reflects the potential of a pair, AUR makes recommendations according to the uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z)
D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases. A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network. For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR). CPR achieves unbiased recommendation without knowing the exposure mechanism. We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z)
Unbiased Pairwise Learning to Rank in Recommender Systems [4.058828240864671]
Unbiased learning to rank algorithms are appealing candidates and have already been applied in many applications with single categorical labels. We propose a novel unbiased LTR algorithm to tackle the challenges, which innovatively models position bias in the pairwise fashion. Experiment results on public benchmark datasets and internal live traffic show the superior results of the proposed method for both categorical and continuous labels.
arXiv Detail & Related papers (2021-11-25T06:04:59Z)
Handling Position Bias for Unbiased Learning to Rank in Hotels Search [0.951828574518325]
We will investigate the importance of properly handling the position bias in an online test environment in Tripadvisor Hotels search. We propose an empirically effective method of handling the position bias that fully leverages the user action data. The online A/B test results show that this method leads to an improved search ranking model.
arXiv Detail & Related papers (2020-02-28T03:48:42Z)
