Related papers: Uncertainty Calibration for Counterfactual Propensity Estimation in Recommendation

URL: http://arxiv.org/abs/2303.12973v2
Date: Mon, 15 Jul 2024 01:57:30 GMT
Title: Uncertainty Calibration for Counterfactual Propensity Estimation in Recommendation
Authors: Wenbo Hu, Xin Sun, Qiang Liu, Le Wu, Liang Wang
Abstract summary: The inverse propensity score (IPS) is employed to weight the prediction error of each observed instance. IPS-based recommendations are hampered by miscalibration in propensity estimation. We introduce a model-agnostic calibration framework for propensity-based debiasing of CVR predictions.
Score: 22.67361489565711
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Post-click conversion rate (CVR) is a reliable indicator of online customers' preferences, making it crucial for developing recommender systems. A major challenge in predicting CVR is severe selection bias, arising from users' inherent self-selection behavior and the system's item selection process. To mitigate this issue, the inverse propensity score (IPS) is employed to weight the prediction error of each observed instance. However, current propensity score estimations are unreliable due to the lack of a quality measure. To address this, we evaluate the quality of propensity scores from the perspective of uncertainty calibration, proposing the use of expected calibration error (ECE) as a measure of propensity-score quality. We argue that the performance of IPS-based recommendations is hampered by miscalibration in propensity estimation. We introduce a model-agnostic calibration framework for propensity-based debiasing of CVR predictions. Theoretical analysis on bias and generalization bounds demonstrates the superiority of calibrated propensity estimates over uncalibrated ones. Experiments conducted on the Coat, Yahoo and KuaiRand datasets show improved uncertainty calibration, as evidenced by lower ECE values, leading to enhanced CVR prediction outcomes.
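The two quantities named in the abstract, IPS weighting and expected calibration error, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal-width binning scheme, the bin count, and the clipping constant `eps` are all assumptions made for illustration. Here `propensities` stands for the estimated probability that a user-item pair is observed, and `observed` is the 0/1 observation indicator.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width binned ECE (assumed variant): the weighted mean, over bins,
    of |observed positive rate - mean predicted probability|."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Assign each prediction to one of n_bins equal-width bins on [0, 1].
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(labels[mask].mean() - probs[mask].mean())
            ece += mask.mean() * gap  # mask.mean() is the fraction of samples in the bin
    return float(ece)

def ips_loss(errors, observed, propensities, eps=1e-6):
    """IPS estimate of the average prediction error over all user-item pairs:
    each observed error is weighted by 1 / propensity."""
    errors = np.asarray(errors, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Clip propensities away from zero to avoid division blow-ups (assumed safeguard).
    p = np.clip(np.asarray(propensities, dtype=float), eps, 1.0)
    return float(np.sum(observed * errors / p) / errors.size)
```

Under these assumptions, the quality of a propensity model would be checked by calling `expected_calibration_error(propensities, observed)`; a lower value indicates that the estimated propensities track the empirical observation rates more closely before they are used as weights in `ips_loss`.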

Related papers

When Can We Reuse a Calibration Set for Multiple Conformal Predictions? [0.0]
We show how e-conformal prediction, in conjunction with Hoeffding's inequality, can enable the repeated use of a single calibration set. We train a deep neural network and utilise a calibration set to estimate a Hoeffding correction. This correction allows us to apply a modified Markov's inequality, leading to the construction of prediction sets with quantifiable confidence.
arXiv Detail & Related papers (2025-06-24T14:57:25Z)
Calibrated Probabilistic Forecasts for Arbitrary Sequences [58.54729945445505]
Real-world data streams can change unpredictably due to distribution shifts, feedback loops and adversarial actors. We present a forecasting framework ensuring valid uncertainty estimates regardless of how data evolves.
arXiv Detail & Related papers (2024-09-27T21:46:42Z)
A Confidence Interval for the $\ell_2$ Expected Calibration Error [35.88784957918326]
We develop confidence intervals for the $\ell_2$ Expected Calibration Error (ECE). We consider top-1-to-$k$ calibration, which includes both the popular notion of confidence calibration as well as full calibration. For a debiased estimator of the ECE, we show normality, but with different convergence rates and variances for calibrated and miscalibrated models.
arXiv Detail & Related papers (2024-08-16T20:00:08Z)
Probabilistic Scores of Classifiers, Calibration is not Enough [0.32985979395737786]
In binary classification tasks, accurate representation of probabilistic predictions is essential for various real-world applications. In this study, we highlight approaches that prioritize the alignment between predicted scores and true probability distributions. Our findings reveal limitations in traditional calibration metrics, which could undermine the reliability of predictive models for critical decision-making.
arXiv Detail & Related papers (2024-08-06T19:53:00Z)
Towards Certification of Uncertainty Calibration under Adversarial Attacks [96.48317453951418]
We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations. We propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.
arXiv Detail & Related papers (2024-05-22T18:52:09Z)
Doubly Calibrated Estimator for Recommendation on Data Missing Not At Random [20.889464448762176]
We argue that existing estimators rely on miscalibrated imputed errors and propensity scores. We propose a Doubly Calibrated Estimator that involves the calibration of both the imputation and propensity models.
arXiv Detail & Related papers (2024-02-26T05:08:52Z)
Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression. These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization. We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
U-Calibration: Forecasting for an Unknown Agent [29.3181385170725]
We show that optimizing forecasts for a single scoring rule cannot guarantee low regret for all possible agents. We present a new metric for evaluating forecasts that we call U-calibration, equal to the maximal regret of the sequence of forecasts.
arXiv Detail & Related papers (2023-06-30T23:05:26Z)
Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance. The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z)
Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties. We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model. We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z)
Better Uncertainty Calibration via Proper Scores for Classification and Beyond [15.981380319863527]
We introduce the framework of proper calibration errors, which relates every calibration error to a proper score. This relationship can be used to reliably quantify the model calibration improvement.
arXiv Detail & Related papers (2022-03-15T12:46:08Z)
T-Cal: An optimal test for the calibration of predictive models [49.11538724574202]
We consider detecting mis-calibration of predictive models using a finite validation dataset as a hypothesis testing problem. Detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions. We propose T-Cal, a minimax test for calibration based on a de-biased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE).
arXiv Detail & Related papers (2022-03-03T16:58:54Z)
Evaluating probabilistic classifiers: Reliability diagrams and score decompositions revisited [68.8204255655161]
We introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way. CORP is based on non-parametric isotonic regression and implemented via the pool-adjacent-violators (PAV) algorithm.
arXiv Detail & Related papers (2020-08-07T08:22:26Z)
Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized. We design a training objective to enforce individual calibration and use it to train randomized regression functions.
arXiv Detail & Related papers (2020-06-18T05:53:10Z)
Understanding and Mitigating the Tradeoff Between Robustness and Accuracy [88.51943635427709]
Adversarial training augments the training set with perturbations to improve the robust error. We show that the standard error could increase even when the augmented perturbations have noiseless observations from the optimal linear predictor.
arXiv Detail & Related papers (2020-02-25T08:03:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.