Related papers: Are Recommenders Self-Aware? Label-Free Recommendation Performance Estimation via Model Uncertainty


URL: http://arxiv.org/abs/2507.23208v1
Date: Thu, 31 Jul 2025 03:04:34 GMT
Title: Are Recommenders Self-Aware? Label-Free Recommendation Performance Estimation via Model Uncertainty
Authors: Jiayu Li, Ziyi Ye, Guohao Jian, Zhiqiang Guo, Weizhi Ma, Qingyao Ai, Min Zhang,
Abstract summary: This paper investigates the recommender's self-awareness by quantifying its uncertainty. We propose a method, probability-based List Distribution uncertainty (LiDu). LiDu measures uncertainty by determining the probability that a recommender will generate a certain ranking list.
Score: 27.396301623717072
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Can a recommendation model be self-aware? This paper investigates the recommender's self-awareness by quantifying its uncertainty, which provides a label-free estimation of its performance. Such self-assessment can enable more informed understanding and decision-making before the recommender engages with any users. To this end, we propose an intuitive and effective method, probability-based List Distribution uncertainty (LiDu). LiDu measures uncertainty by determining the probability that a recommender will generate a certain ranking list based on the prediction distributions of individual items. We validate LiDu's ability to represent model self-awareness in two settings: (1) with a matrix factorization model on a synthetic dataset, and (2) with popular recommendation algorithms on real-world datasets. Experimental results show that LiDu is more correlated with recommendation performance than a series of label-free performance estimators. Additionally, LiDu provides valuable insights into the dynamic inner states of models throughout training and inference. This work establishes an empirical connection between recommendation uncertainty and performance, framing it as a step towards more transparent and self-evaluating recommender systems.
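The abstract describes LiDu only at a high level: uncertainty is derived from the probability that the recommender produces a particular ranking list, given the prediction distributions of individual items. As a rough illustration of that idea, and not the paper's actual estimator, the sketch below assumes that each item's predicted score is an independent Gaussian with a hypothetical mean `mu[i]` and standard deviation `sigma[i]`. It estimates by Monte Carlo sampling how often a sampled top-k list matches the list ranked by the mean scores, then reports one minus that probability as an uncertainty score. The names `top_k`, `list_probability_mc` and `list_uncertainty` are illustrative only.

```python
import random

def top_k(scores, k):
    """Indices of the k highest scores, ordered from highest to lowest."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

def list_probability_mc(mu, sigma, k, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(sampled top-k list == mean-score top-k list).

    ASSUMPTION (illustrative only): item i's score ~ N(mu[i], sigma[i]**2),
    independently across items. This is not necessarily how LiDu models
    per-item prediction distributions.
    """
    rng = random.Random(seed)
    # The ranking list the recommender would output from its mean scores.
    target = top_k(mu, k)
    hits = 0
    for _ in range(n_samples):
        # Draw one plausible score vector from the per-item distributions.
        sampled = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
        # Count samples that reproduce exactly the same ordered top-k list.
        if top_k(sampled, k) == target:
            hits += 1
    return hits / n_samples

def list_uncertainty(mu, sigma, k, **kwargs):
    """Hypothetical uncertainty score: 1 minus the estimated list probability."""
    return 1.0 - list_probability_mc(mu, sigma, k, **kwargs)

if __name__ == "__main__":
    mu = [2.0, 1.0, 0.0, -1.0]
    print("confident:", list_uncertainty(mu, [0.05] * 4, k=3))
    print("uncertain:", list_uncertainty(mu, [2.0] * 4, k=3))
```

With the same mean scores, a small `sigma` reproduces the mean-score list in essentially every sample, so the uncertainty is near 0. A large `sigma` frequently reorders items and yields a higher uncertainty. Monte Carlo sampling is used here only because it is simple and requires no further assumptions about how list probabilities are computed.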

Related papers

A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs). Namely, we propose novel metrics with high-probability guarantees concerning the output distribution of a model. Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization [9.618391485742968]
Iterative preference optimization has recently become one of the de facto training paradigms for large language models (LLMs). We present an uncertainty-enhanced Preference Optimization framework to make the LLM self-evolve with reliable feedback. Our framework substantially alleviates the noise problem and improves the performance of iterative preference optimization.
arXiv Detail & Related papers (2024-09-17T14:05:58Z)
Doubly Calibrated Estimator for Recommendation on Data Missing Not At Random [20.889464448762176]
We argue that existing estimators rely on miscalibrated imputed errors and propensity scores. We propose a Doubly Calibrated Estimator that involves the calibration of both the imputation and propensity models.
arXiv Detail & Related papers (2024-02-26T05:08:52Z)
Restricted Bernoulli Matrix Factorization: Balancing the trade-off between prediction accuracy and coverage in classification based collaborative filtering [45.335821132209766]
We propose Restricted Bernoulli Matrix Factorization (ResBeMF) to enhance the performance of classification-based collaborative filtering. The proposed model provides a good balance in terms of the quality measures used compared to other recommendation models.
arXiv Detail & Related papers (2022-10-05T13:48:19Z)
Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework. AUR consists of a new uncertainty estimator along with a normal recommender model. As the chance of mislabeling reflects the potential of a pair, AUR makes recommendations according to the uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z)
Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items. Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate. We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
arXiv Detail & Related papers (2022-07-04T17:49:25Z)
Debiasing Learning for Membership Inference Attacks Against Recommender Systems [79.48353547307887]
Learned recommender systems may inadvertently leak information about their training data, leading to privacy violations. We investigate privacy threats faced by recommender systems through the lens of membership inference. We propose a Debiasing Learning for Membership Inference Attacks against recommender systems (DL-MIA) framework that has four main components.
arXiv Detail & Related papers (2022-06-24T17:57:34Z)
Quantifying Availability and Discovery in Recommender Systems via Stochastic Reachability [27.21058243752746]
We propose an evaluation procedure based on reachability to quantify the maximum probability of recommending a target piece of content to a user. Reachability can be used to detect biases in the availability of content and diagnose limitations in the opportunities for discovery granted to users. We demonstrate evaluations of recommendation algorithms trained on large datasets of explicit and implicit ratings.
arXiv Detail & Related papers (2021-06-30T16:18:12Z)
Towards Open-World Recommendation: An Inductive Model-based Collaborative Filtering Approach [115.76667128325361]
Recommendation models can effectively estimate underlying user interests and predict one's future behaviors. We propose an inductive collaborative filtering framework that contains two representation models. Our model achieves promising results for recommendation on few-shot users with limited training ratings and new unseen users.
arXiv Detail & Related papers (2020-07-09T14:31:25Z)
Binary Classification from Positive Data with Skewed Confidence [85.18941440826309]
Positive-confidence (Pconf) classification is a promising weakly-supervised learning method. In practice, the confidence may be skewed by bias arising in an annotation process. We introduce a parameterized model of the skewed confidence and propose a method for selecting the hyperparameter.
arXiv Detail & Related papers (2020-01-29T00:04:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.