Inverse Propensity Score based offline estimator for deterministic ranking lists using position bias
- URL: http://arxiv.org/abs/2208.14980v1
- Date: Wed, 31 Aug 2022 17:32:04 GMT
- Title: Inverse Propensity Score based offline estimator for deterministic ranking lists using position bias
- Authors: Nick Wood and Sumit Sidana
- Abstract summary: We present a novel way of computing IPS using a position-bias model for deterministic logging policies.
We validate this technique using two different experiments on industry-scale data.
- Score: 0.1269104766024433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present a novel way of computing IPS using a position-bias model for deterministic logging policies. This technique significantly widens the set of policies on which OPE can be used. We validate this technique using two different experiments on industry-scale data. The OPE results are strongly correlated with the online results, up to a constant bias. The estimator requires the examination model to be a reasonably accurate approximation of real user behaviour.
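To make the idea concrete, below is a minimal illustrative sketch of how an IPS-style estimate might be formed for a deterministic ranking policy using a position-based examination model (PBM). The function name `ips_position_bias`, the log format, the placeholder examination probabilities, and the re-weighting by examination-probability ratios are all assumptions for illustration, not the authors' exact estimator.

```python
# Sketch under a position-based model (PBM): a user examines rank k
# with probability exam_prob[k], independent of the item shown there.
# The probabilities below are illustrative placeholders; in practice
# they would be estimated from click logs (e.g. via EM).
exam_prob = {1: 1.00, 2: 0.60, 3: 0.45, 4: 0.35, 5: 0.30}

def ips_position_bias(logs, target_rank):
    """Estimate the click reward a deterministic target ranking would collect.

    logs: iterable of (item, logged_pos, click) tuples produced by a
          deterministic logging policy, where click is 0 or 1.
    target_rank: dict mapping item -> rank under the target policy.

    Each logged click is re-weighted by the ratio of examination
    probabilities at the target vs. the logged rank, which plays the
    role of the propensity ratio in ordinary IPS even though the
    logging policy itself is deterministic.
    """
    total, n = 0.0, 0
    for item, logged_pos, click in logs:
        n += 1
        if item not in target_rank:
            continue  # item never shown by the target ranking
        weight = exam_prob[target_rank[item]] / exam_prob[logged_pos]
        total += weight * click
    return total / n if n else 0.0

# Example: the target policy swaps the items logged at ranks 1 and 2.
logs = [("a", 1, 1), ("b", 2, 0), ("c", 3, 1)]
target_rank = {"b": 1, "a": 2, "c": 3}
print(ips_position_bias(logs, target_rank))  # 0.533...
```

Note how the estimate discounts the click on item "a" once it is demoted to rank 2, since the PBM says it would be examined less often there.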
Related papers
- Automated Off-Policy Estimator Selection via Supervised Learning [7.476028372444458]
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of counterfactual policies using data collected by another policy (the logging policy).
To solve the OPE problem, we resort to estimators that aim to estimate, as accurately as possible, the performance the counterfactual policies would have achieved had they been deployed in place of the logging policy.
We propose an automated data-driven OPE estimator selection method based on supervised learning.
arXiv Detail & Related papers (2024-06-26T02:34:48Z)
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
- Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks [58.469818546042696]
We study the sample efficiency of OPE with human preference and establish a statistical guarantee for it.
By appropriately selecting the size of a ReLU network, we show that one can leverage any low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2023-10-16T16:27:06Z)
- Off-Policy Evaluation of Ranking Policies under Diverse User Behavior [25.226825574282937]
Inverse Propensity Scoring (IPS) becomes extremely inaccurate in the ranking setup due to its high variance under large action spaces.
This work explores a far more general formulation where user behavior is diverse and can vary depending on the user context.
We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior.
arXiv Detail & Related papers (2023-06-26T22:31:15Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experimental results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- Policy-Adaptive Estimator Selection for Off-Policy Evaluation [12.1655494876088]
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data.
This paper studies this challenging problem of estimator selection for OPE for the first time.
In particular, we enable an estimator selection that is adaptive to a given OPE task, by appropriately subsampling available logged data and constructing pseudo policies.
arXiv Detail & Related papers (2022-11-25T05:31:42Z)
- Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model [83.83064559894989]
A critical need for industrial recommender systems is the ability to evaluate recommendation policies offline, before deploying them to production.
We develop a new estimator that mitigates the problems of the two most popular off-policy estimators for rankings.
In particular, the new estimator, called INTERPOL, addresses the bias of a potentially misspecified position-based model.
arXiv Detail & Related papers (2022-10-15T17:22:30Z)
- Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service [11.986224119327387]
Off-policy evaluation is essential in domains such as healthcare, marketing or recommender systems.
Many OPE methods with theoretical backing have been proposed.
However, it is often unclear to practitioners which estimator to use for their specific applications and purposes.
arXiv Detail & Related papers (2021-09-17T15:53:53Z)
- Control Variates for Slate Off-Policy Evaluation [112.35528337130118]
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions.
We obtain new estimators with risk improvement guarantees over both the pseudoinverse (PI) and self-normalized PI estimators.
arXiv Detail & Related papers (2021-06-15T06:59:53Z)
- Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits [5.144809478361604]
We improve the doubly robust (DR) estimator by adaptively weighting observations to control its variance.
We provide empirical evidence for our estimator's improved accuracy and inferential properties relative to existing alternatives.
arXiv Detail & Related papers (2021-06-03T17:54:44Z)
- Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking [74.46448041224247]
We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy used for logging data.
LogOpt turns the counterfactual approach, which is indifferent to the logging policy, into an online approach in which the algorithm decides what rankings to display.
We prove that, as an online evaluation method, LogOpt is unbiased w.r.t. position and item-selection bias, unlike existing interleaving methods.
arXiv Detail & Related papers (2020-07-24T18:05:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.