Debiased Off-Policy Evaluation for Recommendation Systems
- URL: http://arxiv.org/abs/2002.08536v3
- Date: Mon, 2 Aug 2021 21:03:26 GMT
- Title: Debiased Off-Policy Evaluation for Recommendation Systems
- Authors: Yusuke Narita, Shota Yasui, Kohei Yata
- Abstract summary: A/B tests are reliable, but are time- and money-consuming, and entail a risk of failure.
We develop an alternative method, which predicts the performance of algorithms given historical data.
Our method produces smaller mean squared errors than state-of-the-art methods.
- Score: 8.63711086812655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient methods to evaluate new algorithms are critical for improving
interactive bandit and reinforcement learning systems such as recommendation
systems. A/B tests are reliable, but are time- and money-consuming, and entail
a risk of failure. In this paper, we develop an alternative method, which
predicts the performance of algorithms given historical data that may have been
generated by a different algorithm. Our estimator has the property that its
prediction converges in probability to the true performance of a counterfactual
algorithm at a rate of $\sqrt{N}$, as the sample size $N$ increases. We also
show a correct way to estimate the variance of our prediction, thus allowing
the analyst to quantify the uncertainty in the prediction. These properties
hold even when the analyst does not know which among a large number of
potentially important state variables are actually important. We validate our
method by a simulation experiment about reinforcement learning. We finally
apply it to improve advertisement design by a major advertisement company. We
find that our method produces smaller mean squared errors than state-of-the-art
methods.
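The abstract states the estimator's guarantees rather than its form. As background, the sketch below shows a standard inverse-propensity-score (IPS) off-policy value estimate with a plug-in standard error, assuming the logging policy's action probabilities are known; it is a minimal illustration of the general off-policy evaluation approach, not the paper's debiased estimator, and all names and numbers are hypothetical.
```python
import numpy as np

def ips_value(rewards, logging_probs, target_probs):
    """Inverse-propensity-score (IPS) estimate of a counterfactual policy's value.

    rewards       : observed rewards r_i under the logging policy, shape (N,)
    logging_probs : pi_log(a_i | x_i), probability the logger chose the logged action
    target_probs  : pi_new(a_i | x_i), probability the new policy would choose it
    """
    contributions = (target_probs / logging_probs) * rewards  # importance-weighted rewards
    value = contributions.mean()                               # estimated policy value
    # Plug-in standard error: by the CLT the IPS estimate is sqrt(N)-consistent
    # when the propensities are correct and the weights have finite variance.
    stderr = contributions.std(ddof=1) / np.sqrt(len(rewards))
    return value, stderr

# Toy logged data from a two-action logging policy (purely illustrative).
rng = np.random.default_rng(0)
N = 10_000
actions = rng.integers(0, 2, size=N)                 # logger picks each action with prob 0.5
rewards = rng.binomial(1, np.where(actions == 1, 0.6, 0.4))
logging_probs = np.full(N, 0.5)
target_probs = np.where(actions == 1, 0.8, 0.2)      # counterfactual policy favors action 1
value, stderr = ips_value(rewards, logging_probs, target_probs)
print(f"estimated value: {value:.3f} (95% CI half-width: {1.96 * stderr:.3f})")
```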
Related papers
- Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z) - Distribution-free risk assessment of regression-based machine learning
algorithms [6.507711025292814]
We focus on regression algorithms and the risk-assessment task of computing the probability of the true label lying inside an interval defined around the model's prediction.
We solve the risk-assessment problem using the conformal prediction approach, which provides prediction intervals that are guaranteed to contain the true label with a given probability.
arXiv Detail & Related papers (2023-10-05T13:57:24Z) - Uncertainty Estimation based on Geometric Separation [13.588210692213568]
In machine learning, accurately estimating the probability that a model's prediction for a specific input is correct is crucial for risk management.
We put forward a novel geometric-based approach for improving uncertainty estimations in machine learning models.
arXiv Detail & Related papers (2023-01-11T13:19:24Z) - A Geometric Method for Improved Uncertainty Estimation in Real-time [13.588210692213568]
Post-hoc model calibrations can improve models' uncertainty estimations without the need for retraining.
Our work puts forward a geometric-based approach for uncertainty estimation.
We show that our method yields better uncertainty estimations than recently proposed approaches.
arXiv Detail & Related papers (2022-06-23T09:18:05Z) - Risk Preferences of Learning Algorithms [0.0]
We show that a widely used learning algorithm, $\varepsilon$-Greedy, exhibits emergent risk aversion.
We discuss two methods to correct this bias.
arXiv Detail & Related papers (2022-05-10T01:30:24Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (a minimal sketch appears after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Robustification of Online Graph Exploration Methods [59.50307752165016]
We study a learning-augmented variant of the classical, notoriously hard online graph exploration problem.
We propose an algorithm that naturally integrates predictions into the well-known Nearest Neighbor (NN) algorithm.
arXiv Detail & Related papers (2021-12-10T10:02:31Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that our algorithm achieves, for any $k$, an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z) - Scalable Approximate Inference and Some Applications [2.6541211006790983]
In this thesis, we propose a new framework for approximate inference.
Our proposed four algorithms are motivated by the recent computational progress of Stein's method.
Results on simulated and real datasets indicate the statistical efficiency and wide applicability of our algorithm.
arXiv Detail & Related papers (2020-03-07T04:33:27Z)
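For concreteness, the sketch below illustrates the Average Thresholded Confidence idea referenced above: choose a confidence threshold on labeled source data so that the fraction of source examples above it matches the source accuracy, then predict target accuracy as the fraction of unlabeled target examples above that threshold. The function and variable names are illustrative assumptions, not the authors' reference implementation.
```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """Average Thresholded Confidence (ATC), sketched.

    source_conf    : model confidence (e.g., max softmax) on labeled source data
    source_correct : boolean array, whether the model was correct on each source example
    target_conf    : model confidence on unlabeled target data
    """
    source_acc = source_correct.mean()
    # Pick the threshold so that the fraction of source examples with
    # confidence above it equals the source accuracy.
    threshold = np.quantile(source_conf, 1.0 - source_acc)
    # Predicted target accuracy: fraction of target examples above the threshold.
    return (target_conf >= threshold).mean()

# Toy usage with synthetic confidences (illustrative only).
rng = np.random.default_rng(1)
source_conf = rng.uniform(0.5, 1.0, size=5000)
source_correct = rng.random(5000) < source_conf       # roughly calibrated source model
target_conf = rng.uniform(0.4, 1.0, size=5000)        # shifted target distribution
print(f"predicted target accuracy: {atc_predict_accuracy(source_conf, source_correct, target_conf):.3f}")
```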
This list is automatically generated from the titles and abstracts of the papers on this site.