A Theoretical Analysis of Recommendation Loss Functions under Negative Sampling
- URL: http://arxiv.org/abs/2411.07770v1
- Date: Tue, 12 Nov 2024 13:06:16 GMT
- Title: A Theoretical Analysis of Recommendation Loss Functions under Negative Sampling
- Authors: Giulia Di Teodoro, Federico Siciliano, Nicola Tonellotto, Fabrizio Silvestri
- Abstract summary: This paper conducts a comparative analysis of prevalent loss functions in Recommender Systems (RSs)
We show that Bayesian Personalized Ranking (BPR) and Categorical Cross-Entropy (CCE) are equivalent when one negative sample is used, and that Binary Cross-Entropy (BCE), CCE, and BPR share a common global minimum.
- Score: 13.180345241212423
- License:
- Abstract: Recommender Systems (RSs) are pivotal in diverse domains such as e-commerce, music streaming, and social media. This paper conducts a comparative analysis of prevalent loss functions in RSs: Binary Cross-Entropy (BCE), Categorical Cross-Entropy (CCE), and Bayesian Personalized Ranking (BPR). Exploring the behaviour of these loss functions across varying negative sampling settings, we reveal that BPR and CCE are equivalent when one negative sample is used. Additionally, we demonstrate that all losses share a common global minimum. Evaluation of RSs mainly relies on ranking metrics such as Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR). We derive bounds on the different losses under negative sampling to establish a probabilistic lower bound for NDCG. We show that the BPR bound on NDCG is weaker than that of BCE, contradicting the common assumption that BPR is superior to BCE in RSs training. Experiments on five datasets and four models empirically support these theoretical findings. Our code is available at https://anonymous.4open.science/r/recsys_losses.
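To make the compared losses concrete, the following is a minimal numpy sketch of BCE, CCE (a sampled softmax over the positive and the sampled negatives), and BPR for one positive score and K sampled negative scores; with K = 1, CCE and BPR coincide, as stated above. Score values and function names are illustrative, not taken from the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_loss(pos, negs):
    """Binary Cross-Entropy: the positive is labeled 1, sampled negatives 0."""
    return -np.log(sigmoid(pos)) - np.sum(np.log(1.0 - sigmoid(negs)))

def cce_loss(pos, negs):
    """Categorical Cross-Entropy (sampled softmax) over the positive plus negatives."""
    logits = np.concatenate(([pos], negs))
    return -(pos - np.log(np.sum(np.exp(logits))))

def bpr_loss(pos, negs):
    """Bayesian Personalized Ranking: the positive should outscore each negative."""
    return -np.sum(np.log(sigmoid(pos - negs)))

pos, one_neg = 2.0, np.array([0.5])                    # K = 1 negative sample
print(cce_loss(pos, one_neg), bpr_loss(pos, one_neg))  # identical values
many_negs = np.array([0.5, -1.0, 0.3])                 # K = 3: the losses now differ
print(bce_loss(pos, many_negs), cce_loss(pos, many_negs), bpr_loss(pos, many_negs))
```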
Related papers
- SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering [47.81610130269399]
We propose a Sampled Softmax Cross-Entropy loss (SSM) that compares one positive sample with multiple negative samples, leading to better performance.
We also introduce a Simplified Sampled Softmax Cross-Entropy loss (SimCE), which simplifies the SSM using its upper bound.
Our validation on 12 benchmark datasets, using both MF and LightGCN backbones, shows that SimCE significantly outperforms both BPR and SSM.
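For reference, the SSM loss described here can be sketched as a sampled softmax over one positive and several negatives (a minimal illustration; SimCE's upper-bound simplification is not reproduced because its exact form is not given in this summary):

```python
import numpy as np

def ssm_loss(pos_score, neg_scores):
    """Sampled Softmax Cross-Entropy: one positive vs. several sampled negatives.
    Equals -log softmax probability of the positive; log1p form is a
    numerically stable rewrite of log(1 + sum_j exp(neg_j - pos))."""
    return np.log1p(np.sum(np.exp(neg_scores - pos_score)))

print(ssm_loss(1.2, np.array([0.3, -0.5, 0.9, 0.1])))
```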
arXiv Detail & Related papers (2024-06-23T17:24:07Z)
- Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation Learning [25.664174172917345]
Negative sampling (NS) loss plays an important role in learning knowledge graph embedding (KGE) to handle a huge number of entities.
We theoretically analyzed the NS loss to assist hyperparameter tuning and to better understand how to use the NS loss in KGE learning.
Our empirical analysis on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that the results of actually trained models agree with our theoretical findings.
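A minimal sketch of the generic negative-sampling (NS) loss in the word2vec/KGE style this analysis concerns; the margin term and the mean over negatives are common choices shown here as assumptions, and self-adversarial weighting used by some KGE models is omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ns_loss(pos_score, neg_scores, margin=1.0):
    """Generic NS loss: pull the positive triple's score above the margin,
    push sampled negatives below it. `margin` and the number of negatives
    are the hyperparameters whose interaction such analyses study."""
    pos_term = -np.log(sigmoid(margin + pos_score))
    neg_term = -np.mean(np.log(sigmoid(-(margin + neg_scores))))
    return pos_term + neg_term

print(ns_loss(0.8, np.array([-1.2, -0.4, -2.0])))
```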
arXiv Detail & Related papers (2022-06-21T06:51:33Z)
- Do More Negative Samples Necessarily Hurt in Contrastive Learning? [25.234544066205547]
We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation does not degrade with the number of negative samples.
We also give a structural characterization of the optimal representation in our framework.
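As a concrete instance of this setting, here is a minimal InfoNCE-style contrastive loss in which the number of negatives k is a free parameter (the embeddings, temperature, and k below are illustrative):

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE: similarity of (anchor, positive) against k negatives.
    Increasing k enlarges the softmax denominator; per the paper's setting,
    this need not degrade downstream performance."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos_sim = cos(anchor, positive) / tau
    neg_sims = np.array([cos(anchor, n) for n in negatives]) / tau
    logits = np.concatenate(([pos_sim], neg_sims))
    return -(pos_sim - np.log(np.sum(np.exp(logits))))

rng = np.random.default_rng(0)
anchor, positive = rng.normal(size=8), rng.normal(size=8)
negatives = rng.normal(size=(16, 8))   # k = 16 negatives
print(info_nce(anchor, positive, negatives))
```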
arXiv Detail & Related papers (2022-05-03T21:29:59Z)
- Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this way offsets the influence of user/item propensity on the learning.
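A hypothetical minimal sketch of the cross-pairwise idea, based only on this summary: scores of two observed (user, item) pairs are compared against their swapped, unobserved combinations, so additive user/item propensity terms cancel in the difference. The paper's exact loss and sampling scheme may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cpr_term(s_u1_i1, s_u2_i2, s_u1_i2, s_u2_i1):
    """Cross-pairwise term: two observed (user, item) scores vs. the swapped
    combinations. Additive per-user/per-item propensity effects cancel
    inside the difference, which is how exposure bias is offset."""
    return -np.log(sigmoid(s_u1_i1 + s_u2_i2 - s_u1_i2 - s_u2_i1))

print(cpr_term(2.1, 1.8, 0.4, 0.2))
```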
arXiv Detail & Related papers (2022-04-26T09:20:27Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
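A minimal sketch of the "advantage" computation described above: the Q-value of the observed (positive) action minus the average Q-value over sampled negative actions (array contents and names are illustrative):

```python
import numpy as np

def advantage(q_values, positive_action, negative_actions):
    """Advantage of the observed (positive) action over the average Q-value
    of sampled negative actions, used to reweight the supervised update."""
    baseline = np.mean(q_values[negative_actions])
    return q_values[positive_action] - baseline

q = np.array([0.2, 1.5, -0.3, 0.7, 0.1])   # Q(s, a) for 5 candidate items
print(advantage(q, positive_action=1, negative_actions=[0, 2, 4]))
```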
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
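A minimal sketch of training with a bias constraint as described: the empirical squared error is penalized by the squared empirical bias, estimated by averaging repeated estimates of the same unknowns. The penalty weight and names are illustrative assumptions.

```python
import numpy as np

def bias_constrained_loss(estimates, true_params, lam=1.0):
    """MSE plus a squared-bias penalty. `estimates` holds repeated estimates
    of the same unknowns; averaging over repeats exposes the empirical bias."""
    mse = np.mean((estimates - true_params) ** 2)
    bias = np.mean(estimates - true_params, axis=0)   # average over repeats
    return mse + lam * np.sum(bias ** 2)

rng = np.random.default_rng(1)
theta = np.array([1.0, -2.0])
est = theta + 0.3 + 0.1 * rng.normal(size=(50, 2))    # deliberately biased
print(bias_constrained_loss(est, theta))
```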
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Cramér-Rao bound-informed training of neural networks for quantitative MRI [11.964144201247198]
Neural networks are increasingly used to estimate parameters in quantitative MRI, in particular in magnetic resonance fingerprinting.
Their advantages are their superior speed and their ability to dominate non-efficient unbiased estimators.
We find, however, that heterogeneous parameters are hard to estimate.
We propose a well-founded Cramér-Rao bound (CRB) loss function, which normalizes the squared error with the respective CRB.
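A minimal sketch of the CRB-normalized loss just described: each parameter's squared error is divided by its Cramér-Rao bound, putting heterogeneous parameters on a comparable scale (all values below are illustrative):

```python
import numpy as np

def crb_loss(theta_hat, theta_true, crb):
    """Squared error normalized by each parameter's Cramér-Rao bound, so
    heterogeneous parameters contribute comparably; an efficient unbiased
    estimator would score about 1 per parameter on average."""
    return np.mean((theta_hat - theta_true) ** 2 / crb)

theta_true = np.array([1.0, 50.0])   # e.g. two tissue parameters
crb = np.array([0.01, 4.0])          # per-parameter CRB (illustrative)
theta_hat = np.array([1.05, 52.0])
print(crb_loss(theta_hat, theta_true, crb))
```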
arXiv Detail & Related papers (2021-09-22T06:38:03Z)
- Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression [20.00435452480056]
We develop a novel response-adaptive partition strategy to overcome the limitation of the divide-and-conquer method.
We show the proposed estimate has a smaller asymptotic mean squared error (AMSE) than that of the classical dacKRR estimate under mild conditions.
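For context, a minimal sketch of the divide-and-conquer KRR (dacKRR) baseline being improved on: fit kernel ridge regression per data block and average the blocks' predictions. The paper's response-adaptive, oversampling partition is not reproduced; a plain random partition is shown and all names are illustrative.

```python
import numpy as np

def krr_fit(X, y, lam=0.1, gamma=10.0):
    """Kernel ridge regression with an RBF kernel: alpha = (K + lam*n*I)^-1 y."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    alpha = np.linalg.solve(np.exp(-gamma * sq) + lam * len(X) * np.eye(len(X)), y)
    return X, alpha

def krr_predict(model, Xq, gamma=10.0):
    X, alpha = model
    sq = np.sum((Xq[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq) @ alpha

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 1))
y = np.sin(6 * X[:, 0]) + 0.1 * rng.normal(size=200)
blocks = np.array_split(rng.permutation(200), 4)      # plain random partition
models = [krr_fit(X[b], y[b]) for b in blocks]
Xq = np.linspace(0.0, 1.0, 5)[:, None]
print(np.mean([krr_predict(m, Xq) for m in models], axis=0))  # averaged dacKRR
```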
arXiv Detail & Related papers (2021-07-13T04:01:04Z)
- Focal and Efficient IOU Loss for Accurate Bounding Box Regression [63.14659624634066]
In object detection, bounding box regression (BBR) is a crucial step that determines the object localization performance.
Most previous loss functions for BBR have two main drawbacks: (i) Both $\ell_n$-norm and IOU-based loss functions are inefficient at capturing the objective of BBR, which leads to slow convergence and inaccurate regression results.
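For reference, a minimal sketch of the plain IoU loss underlying this line of work, with boxes given as (x1, y1, x2, y2); the focal and efficient-IoU refinements the paper proposes add penalty and focusing terms not reproduced here:

```python
def iou_loss(box_a, box_b):
    """1 - IoU for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return 1.0 - inter / union

print(iou_loss((0, 0, 2, 2), (1, 1, 3, 3)))   # boxes overlap in a 1x1 square
```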
arXiv Detail & Related papers (2021-01-20T14:33:58Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)