A Theoretical Analysis of Recommendation Loss Functions under Negative Sampling
- URL: http://arxiv.org/abs/2411.07770v2
- Date: Tue, 04 Feb 2025 21:30:16 GMT
- Title: A Theoretical Analysis of Recommendation Loss Functions under Negative Sampling
- Authors: Giulia Di Teodoro, Federico Siciliano, Nicola Tonellotto, Fabrizio Silvestri
- Abstract summary: Loss functions like Categorical Cross Entropy (CCE), Binary Cross Entropy (BCE), and Bayesian Personalized Ranking (BPR) are commonly used in Recommender Systems (RSs) to differentiate positive items - those interacted with by users - from negative items.
We show that CCE offers the tightest lower bound on ranking metrics like Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR).
Under negative sampling, we reveal that BPR and CCE are equivalent when a single negative sample is drawn, and all three losses converge to the same global minimum.
- Score: 13.180345241212423
- Abstract: Loss functions like Categorical Cross Entropy (CCE), Binary Cross Entropy (BCE), and Bayesian Personalized Ranking (BPR) are commonly used in training Recommender Systems (RSs) to differentiate positive items - those interacted with by users - from negative items. While prior works empirically showed that CCE outperforms BCE and BPR when using the full set of negative items, we provide a theoretical explanation for this by proving that CCE offers the tightest lower bound on ranking metrics like Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR), followed by BPR and BCE. However, using the full set of negative items is computationally infeasible for large-scale RSs, prompting the use of negative sampling techniques. Under negative sampling, we reveal that BPR and CCE are equivalent when a single negative sample is drawn, and all three losses converge to the same global minimum. We further demonstrate that the sampled losses remain lower bounds for NDCG (MRR), albeit in a probabilistic sense. Our worst-case analysis shows that BCE offers the strongest bound on NDCG (MRR). Experiments on five datasets and four models empirically support these theoretical findings. Our code is available at https://anonymous.4open.science/r/recsys_losses.
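For concreteness, here is a minimal PyTorch sketch (not the authors' code) of the three losses computed on one positive score and K sampled negative scores per user. With K = 1 the sampled CCE reduces algebraically to -log sigma(s_pos - s_neg), i.e. the BPR loss, matching the equivalence stated above.

```python
import torch
import torch.nn.functional as F

def cce_loss(s_pos, s_neg):
    """Sampled softmax / categorical cross entropy over 1 positive and K negatives."""
    logits = torch.cat([s_pos.unsqueeze(-1), s_neg], dim=-1)  # [B, 1 + K]
    return -F.log_softmax(logits, dim=-1)[:, 0].mean()        # positive sits at index 0

def bce_loss(s_pos, s_neg):
    """Binary cross entropy: positive labelled 1, each sampled negative labelled 0."""
    return -(F.logsigmoid(s_pos).mean() + F.logsigmoid(-s_neg).mean())

def bpr_loss(s_pos, s_neg):
    """Pairwise BPR: the positive should outscore every sampled negative."""
    return -F.logsigmoid(s_pos.unsqueeze(-1) - s_neg).mean()

B, K = 4, 1                                    # one sampled negative per positive
s_pos, s_neg = torch.randn(B), torch.randn(B, K)
print(cce_loss(s_pos, s_neg).item(), bpr_loss(s_pos, s_neg).item())  # match when K == 1
```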
Related papers
- SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering [47.81610130269399]
We propose a Sampled Softmax Cross-Entropy (SSM) loss that compares one positive sample with multiple negative samples, leading to better performance.
We also introduce a Simplified Sampled Softmax Cross-Entropy Loss (SimCE), which simplifies the SSM using its upper bound.
Our validation on 12 benchmark datasets, using both MF and LightGCN backbones, shows that SimCE significantly outperforms both BPR and SSM.
arXiv Detail & Related papers (2024-06-23T17:24:07Z) - gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling [67.71952251641545]
We show that models trained with negative sampling tend to overestimate the probabilities of positive interactions.
We propose a novel Generalised Binary Cross-Entropy Loss function (gBCE) and theoretically prove that it can mitigate overconfidence.
We show through detailed experiments on three datasets that gSASRec does not exhibit the overconfidence problem.
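A minimal sketch of a generalised binary cross entropy in the spirit of gBCE, assuming the commonly described form in which the positive probability is raised to a calibration power beta (beta = 1 recovers plain BCE); the parameter value is illustrative.

```python
import torch
import torch.nn.functional as F

def gbce_loss(s_pos, s_neg, beta=0.9):
    # log sigma(s_pos)**beta = beta * log sigma(s_pos); smaller beta tempers the
    # positive term to counteract sampling-induced overconfidence (assumed form)
    pos_term = beta * F.logsigmoid(s_pos)            # [B]
    neg_term = F.logsigmoid(-s_neg).sum(dim=-1)      # [B], summed over K sampled negatives
    return -(pos_term + neg_term).mean()
```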
arXiv Detail & Related papers (2023-08-14T14:56:40Z) - On the Theories Behind Hard Negative Sampling for Recommendation [51.64626293229085]
We offer two insightful guidelines for effective usage of Hard Negative Sampling (HNS).
We prove that employing HNS on the Bayesian Personalized Ranking (BPR) learner is equivalent to optimizing One-way Partial AUC (OPAUC).
These analyses establish the theoretical foundation of HNS in optimizing Top-K recommendation performance for the first time.
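The paper's contribution is the OPAUC analysis; as context, here is a minimal sketch (a generic recipe, not the paper's code) of hard negative sampling plugged into BPR: score a pool of candidate negatives and train against the hardest one.

```python
import torch
import torch.nn.functional as F

def hns_bpr_loss(s_pos, s_neg_pool):
    # s_pos: [B] positive scores; s_neg_pool: [B, C] scores of C candidate negatives
    s_hard, _ = s_neg_pool.max(dim=-1)               # pick the highest-scored (hardest) negative
    return -F.logsigmoid(s_pos - s_hard).mean()      # standard BPR against that negative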
arXiv Detail & Related papers (2023-02-07T13:57:03Z) - Positive-Negative Equal Contrastive Loss for Semantic Segmentation [8.664491798389662]
Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.
We propose Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embedding on the anchor and treats the positive as well as negative sample pairs equally.
We conduct comprehensive experiments and achieve state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2022-07-04T13:51:29Z) - Comprehensive Analysis of Negative Sampling in Knowledge Graph
Representation Learning [25.664174172917345]
Negative sampling (NS) loss plays an important role in learning knowledge graph embedding (KGE) to handle a huge number of entities.
We theoretically analyzed the NS loss to assist hyperparameter tuning and to better understand how the NS loss is used in KGE learning.
Our empirical analysis on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that the results of actually trained models agree with our theoretical findings.
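For reference, a minimal sketch of the basic (unweighted) negative sampling loss as used in KGE, assuming a scoring function has already produced scores for one observed triple and K corrupted triples; commonly used self-adversarial weighting is omitted.

```python
import torch
import torch.nn.functional as F

def kge_ns_loss(score_pos, score_neg):
    # score_pos: [B] scores f(h, r, t) of observed triples
    # score_neg: [B, K] scores of triples with a corrupted head or tail
    pos_term = F.logsigmoid(score_pos)                 # push observed triples up
    neg_term = F.logsigmoid(-score_neg).mean(dim=-1)   # push corrupted triples down
    return -(pos_term + neg_term).mean()
```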
arXiv Detail & Related papers (2022-06-21T06:51:33Z) - Do More Negative Samples Necessarily Hurt in Contrastive Learning? [25.234544066205547]
We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation does not degrade with the number of negative samples.
We also give a structural characterization of the optimal representation in our framework.
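A minimal sketch of a generic InfoNCE-style objective (not the paper's theoretical framework) that makes explicit where the number of negatives K enters; embeddings are assumed L2-normalised and the temperature tau is illustrative.

```python
import torch
import torch.nn.functional as F

def info_nce(z_anchor, z_pos, z_neg, tau=0.1):
    # z_anchor, z_pos: [B, D]; z_neg: [B, K, D]; all assumed L2-normalised
    s_pos = (z_anchor * z_pos).sum(-1, keepdim=True) / tau      # [B, 1]
    s_neg = torch.einsum('bd,bkd->bk', z_anchor, z_neg) / tau   # [B, K]
    logits = torch.cat([s_pos, s_neg], dim=-1)                  # positive at index 0
    return -F.log_softmax(logits, dim=-1)[:, 0].mean()
```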
arXiv Detail & Related papers (2022-05-03T21:29:59Z) - Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this way offsets the influence of user/item propensity on the learning.
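A minimal sketch of a cross pairwise objective in the simplest two-interaction case (an assumed form, not the authors' implementation): for observed interactions (u1, i1) and (u2, i2), the sum of the observed scores should beat the sum of the swapped, unobserved pairings.

```python
import torch
import torch.nn.functional as F

def cpr_pair_loss(s_u1_i1, s_u2_i2, s_u1_i2, s_u2_i1):
    # observed pairings should jointly outscore the swapped (cross) pairings
    margin = (s_u1_i1 + s_u2_i2) - (s_u1_i2 + s_u2_i1)
    return -F.logsigmoid(margin).mean()
```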
arXiv Detail & Related papers (2022-04-26T09:20:27Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (abbreviated BCE in that paper, distinct from Binary Cross Entropy above) is in applications where multiple estimates of the same unknown are averaged for improved performance.
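A minimal sketch of one way to impose a bias constraint during training (an assumed penalty form, not the paper's exact objective): the usual MSE plus a penalty on the squared norm of the batch-averaged estimation error, which drives the empirical bias towards zero.

```python
import torch

def bias_constrained_mse(theta_hat, theta, lam=1.0):
    # theta_hat, theta: [B, D] estimates and ground-truth parameters
    err = theta_hat - theta
    mse = (err ** 2).sum(dim=-1).mean()          # standard mean squared error
    bias_pen = (err.mean(dim=0) ** 2).sum()      # squared norm of the empirical bias
    return mse + lam * bias_pen                  # lam is an illustrative weight
```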
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Contrastive Attraction and Contrastive Repulsion for Representation Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervision manner, where the encoder contrasts each positive sample over multiple negative samples.
Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet.
We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences.