Offline Recommender System Evaluation under Unobserved Confounding
- URL: http://arxiv.org/abs/2309.04222v1
- Date: Fri, 8 Sep 2023 09:11:26 GMT
- Title: Offline Recommender System Evaluation under Unobserved Confounding
- Authors: Olivier Jeunen and Ben London
- Abstract summary: Off-Policy Estimation methods allow us to learn and evaluate decision-making policies from logged data.
An important assumption underpinning these methods is the absence of unobserved confounders.
This work aims to highlight the problems that arise when performing off-policy estimation in the presence of unobserved confounders.
- Score: 5.4208903577329375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Off-Policy Estimation (OPE) methods allow us to learn and evaluate
decision-making policies from logged data. This makes them an attractive choice
for the offline evaluation of recommender systems, and several recent works
have reported successful adoption of OPE methods to this end. An important
assumption underpinning this approach is the absence of unobserved confounders:
random variables that influence both actions and rewards at data collection
time. Because the data collection policy is typically under the practitioner's
control, the unconfoundedness assumption is often left implicit, and its
violations are rarely dealt with in the existing literature.
This work aims to highlight the problems that arise when performing
off-policy estimation in the presence of unobserved confounders, specifically
focusing on a recommendation use-case. We focus on policy-based estimators,
where the logging propensities are learned from logged data. We characterise
the statistical bias that arises due to confounding, and show how existing
diagnostics are unable to uncover such cases. Because the bias depends directly
on the true and unobserved logging propensities, it is non-identifiable. As the
unconfoundedness assumption is famously untestable, this becomes especially
problematic. This paper emphasises this common, yet often overlooked issue.
Through synthetic data, we empirically show how naïve propensity estimation
under confounding can lead to severely biased metric estimates that are allowed
to fly under the radar. We aim to cultivate an awareness among researchers and
practitioners of this important problem, and touch upon potential research
directions towards mitigating its effects.
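To make the failure mode concrete, here is a minimal synthetic sketch (an illustrative assumption, not the paper's exact experimental setup): an unobserved confounder drives both the logged action and the reward, so propensities estimated from the observed data alone marginalise it out. The resulting IPS estimate is severely biased, yet the usual mean-importance-weight diagnostic still looks healthy.
```python
# Toy confounded logging process: all constants below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

u = rng.integers(0, 2, size=n)                    # unobserved confounder
x = rng.integers(0, 2, size=n)                    # observed context (carries no signal here)

# The logging policy depends on the *unobserved* u.
p_log_a1 = np.where(u == 1, 0.9, 0.1)             # pi0(a=1 | u)
a = rng.binomial(1, p_log_a1)

# The reward also depends on u: the best action is a == u.
r = rng.binomial(1, np.where(a == u, 0.8, 0.2))

# Target policy to evaluate: uniformly random, independent of u.
pi_target = np.full(n, 0.5)

# (1) IPS with the true, confounded logging propensities -> unbiased.
p_true = np.where(a == 1, p_log_a1, 1.0 - p_log_a1)
ips_true = np.mean(pi_target / p_true * r)

# (2) Propensities "learned" from observed data only: P(a=1 | x) = 0.5,
#     because u is marginalised out (a fitted classifier would converge
#     to the same value, so it is computed analytically here).
w_naive = pi_target / 0.5
ips_naive = np.mean(w_naive * r)

true_value = 0.5 * 0.8 + 0.5 * 0.2                # closed form for the uniform target: 0.5
print(f"true value of target policy  : {true_value:.3f}")
print(f"IPS with true propensities   : {ips_true:.3f}")       # ~0.50
print(f"IPS with learned propensities: {ips_naive:.3f}")       # ~0.74, badly biased
print(f"mean naive importance weight : {w_naive.mean():.3f}")  # 1.0, so the diagnostic passes
```
In expectation, the mean importance weight equals one whenever the propensity model matches the observed action distribution, which is why this common diagnostic cannot flag confounding-induced bias.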
Related papers
- Data Poisoning Attacks on Off-Policy Policy Evaluation Methods [38.68161633374251]
We make the first attempt at investigating the sensitivity of OPE methods to marginal adversarial perturbations to the data.
We design a generic data poisoning attack framework leveraging influence functions from robust statistics to carefully construct perturbations that maximize error in the policy value estimates.
Our results demonstrate that many existing OPE methods are highly prone to generating value estimates with large errors when subject to data poisoning attacks, even for small adversarial perturbations.
arXiv Detail & Related papers (2024-04-06T19:27:57Z)
- Too Good To Be True: performance overestimation in (re)current practices for Human Activity Recognition [49.1574468325115]
Sliding windows for data segmentation followed by standard random k-fold cross-validation produce biased results.
It is important to raise awareness in the scientific community about this problem, whose negative effects are being overlooked.
Experiments with several types of datasets and classification models demonstrate the problem and show that it persists regardless of the method or dataset used.
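As a hedged illustration of the leakage mechanism (a sketch on assumed synthetic data, not the paper's experiments): overlapping sliding windows from the same subject yield near-duplicate feature vectors, so a random k-fold split lets a nearest-neighbour classifier simply recall them, while a subject-wise split forces genuine generalisation.
```python
# Synthetic "activity" data: a weak cross-subject signal plus a strong
# subject-specific offset. All constants are illustrative assumptions.
import numpy as np
from sklearn.model_selection import KFold, GroupKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
activity_means = [0.0, 0.5, 1.0]
X, y, groups = [], [], []

for subject in range(10):
    offset = rng.normal(0.0, 2.0)                 # subject-specific component
    for activity, mu in enumerate(activity_means):
        signal = mu + offset + rng.normal(0.0, 0.5, size=1000)
        # 50%-overlapping sliding windows, summarised by (mean, std).
        for start in range(0, len(signal) - 100, 50):
            window = signal[start:start + 100]
            X.append([window.mean(), window.std()])
            y.append(activity)
            groups.append(subject)

X, y, groups = np.array(X), np.array(y), np.array(groups)
clf = KNeighborsClassifier(n_neighbors=1)

leaky = cross_val_score(clf, X, y, cv=KFold(5, shuffle=True, random_state=0))
honest = cross_val_score(clf, X, y, cv=GroupKFold(5), groups=groups)

print(f"random k-fold accuracy (leaky)   : {leaky.mean():.2f}")   # optimistically high
print(f"subject-wise GroupKFold accuracy : {honest.mean():.2f}")  # far lower
```
The gap between the two scores is the kind of performance overestimation the paper warns about.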
arXiv Detail & Related papers (2023-10-18T13:24:05Z)
- Delphic Offline Reinforcement Learning under Nonidentifiable Hidden Confounding [10.315867984674032]
We propose a definition of uncertainty due to hidden confounding bias, termed delphic uncertainty.
We derive a practical method for estimating delphic uncertainty alongside aleatoric and epistemic uncertainty, and construct a pessimistic offline RL algorithm that accounts for all three.
Our results suggest that nonidentifiable hidden confounding bias can be mitigated to improve offline RL solutions in practice.
arXiv Detail & Related papers (2023-06-01T21:27:22Z)
- A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations [108.89353070722497]
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.
We present a new algorithm called Causal-Adjusted Pessimistic (CAP) policy learning, which forms the reward function as the solution of an integral equation system.
arXiv Detail & Related papers (2023-03-20T15:17:31Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experiment results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- Debiasing Recommendation by Learning Identifiable Latent Confounders [49.16119112336605]
Confounding bias arises due to the presence of unmeasured variables that can affect both a user's exposure and feedback.
Existing methods either (1) make untenable assumptions about these unmeasured variables or (2) directly infer latent confounders from users' exposure.
We propose a novel method, the identifiable deconfounder (iDCF), which leverages a set of proxy variables to resolve the non-identification issue.
arXiv Detail & Related papers (2023-02-10T05:10:26Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Model-Free and Model-Based Policy Evaluation when Causality is Uncertain [7.858296711223292]
In off-policy evaluation, there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior policy.
We develop worst-case bounds to assess sensitivity to these unobserved confounders in finite horizons.
We show that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics.
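For intuition, here is a single-step (bandit) analogue of such worst-case bounds, not the paper's robust-MDP construction: if each true importance weight is assumed to lie within a multiplicative factor gamma of the nominal weight computed from learned propensities, the extremes of the self-normalised IPS estimate over that box can be found by sweeping a split point over rewards sorted in descending order. The data-generating process below is an illustrative assumption.
```python
# Worst-case sensitivity bounds for a self-normalised IPS estimate under a
# multiplicative bound on propensity misspecification. Illustrative sketch.
import numpy as np

def snips_sensitivity_bounds(rewards, w_nominal, gamma):
    """Range of sum(w*r)/sum(w) over w[i] in [w_nominal[i]/gamma, w_nominal[i]*gamma]."""
    r = np.asarray(rewards, dtype=float)
    w = np.asarray(w_nominal, dtype=float)
    order = np.argsort(-r)                         # sort rewards descending
    r, lo, hi = r[order], w[order] / gamma, w[order] * gamma

    def extreme(maximize):
        best = None
        for k in range(len(r) + 1):
            # The optimum gives the large weight to the highest rewards when
            # maximising and to the lowest rewards when minimising; sweep k.
            wk = np.concatenate([hi[:k], lo[k:]]) if maximize \
                 else np.concatenate([lo[:k], hi[k:]])
            val = float(np.dot(wk, r) / wk.sum())
            if best is None or (val > best if maximize else val < best):
                best = val
        return best

    return extreme(False), extreme(True)

# Usage on a toy confounded log: uniform target policy, learned propensity 0.5,
# so the nominal weight is 1, while the true weights are 0.56 and 5.0, i.e.
# within a factor gamma = 5 of nominal. The true policy value here is 0.5.
rng = np.random.default_rng(1)
n = 2_000
u = rng.integers(0, 2, size=n)                     # unobserved confounder
a = rng.binomial(1, np.where(u == 1, 0.9, 0.1))    # logging depends on u
r = rng.binomial(1, np.where(a == u, 0.8, 0.2)).astype(float)
w_naive = np.ones(n)

low, high = snips_sensitivity_bounds(r, w_naive, gamma=5.0)
print(f"naive SNIPS estimate: {np.dot(w_naive, r) / w_naive.sum():.3f}")  # ~0.74
print(f"gamma=5 bounds      : [{low:.3f}, {high:.3f}]")  # wide, but covers 0.5
```
Without further structure the interval is very loose, which is consistent with the point that domain knowledge about the dynamics is what makes such bounds sharp.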
arXiv Detail & Related papers (2022-04-02T23:40:15Z)
- Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds [21.520045697447372]
Off-policy evaluation (OPE) is the task of estimating the expected reward of a given policy based on offline data previously collected under different policies.
This work considers the problem of constructing non-asymptotic confidence intervals in infinite-horizon off-policy evaluation.
We develop a practical algorithm through a primal-dual optimization-based approach.
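For intuition only, here is a crude non-asymptotic baseline rather than the paper's primal-dual construction: with importance weights clipped at a constant M and rewards in [0, 1], every IPS term is bounded, so Hoeffding's inequality already yields a finite-sample confidence interval. The clipping level, policies, and data below are illustrative assumptions.
```python
# Hoeffding-style finite-sample CI for a clipped IPS estimate (a loose
# baseline; clipping adds bias, and purpose-built OPE bounds are tighter).
import numpy as np

def clipped_ips_hoeffding_ci(rewards, weights, clip_m=10.0, delta=0.05):
    """(1 - delta) two-sided CI for the clipped-IPS value, rewards in [0, 1]."""
    z = np.minimum(weights, clip_m) * rewards      # each term lies in [0, clip_m]
    n = len(z)
    point = z.mean()
    half_width = clip_m * np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    return point, max(point - half_width, 0.0), point + half_width

# Hypothetical logged bandit feedback.
rng = np.random.default_rng(2)
n = 10_000
p_log = rng.uniform(0.1, 0.9, size=n)              # logging propensities
a = rng.binomial(1, p_log)
r = rng.binomial(1, 0.3 + 0.4 * a).astype(float)   # reward depends on the action
pi_target = np.where(a == 1, 0.8, 0.2)             # target policy's probability of the logged action
w = pi_target / p_log

est, low, high = clipped_ips_hoeffding_ci(r, w, clip_m=10.0, delta=0.05)
print(f"clipped IPS estimate: {est:.3f}, 95% CI: [{low:.3f}, {high:.3f}]")
```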
arXiv Detail & Related papers (2021-03-09T22:31:20Z)
- Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
arXiv Detail & Related papers (2020-10-05T18:11:22Z)