A Theoretical Analysis of Recommendation Loss Functions under Negative Sampling
- URL: http://arxiv.org/abs/2411.07770v2
- Date: Tue, 04 Feb 2025 21:30:16 GMT
- Title: A Theoretical Analysis of Recommendation Loss Functions under Negative Sampling
- Authors: Giulia Di Teodoro, Federico Siciliano, Nicola Tonellotto, Fabrizio Silvestri
- Abstract summary: Loss functions like Categorical Cross Entropy (CCE), Binary Cross Entropy (BCE), and Bayesian Personalized Ranking (BPR) are commonly used in Recommender Systems (RSs) to differentiate positive items - those interacted with by users - from negative items.
We show that CCE offers the tightest lower bound on ranking metrics like Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR).
Under negative sampling, we reveal that BPR and CCE are equivalent when a single negative sample is drawn, and all three losses converge to the same global minimum.
- Score: 13.180345241212423
- Abstract: Loss functions like Categorical Cross Entropy (CCE), Binary Cross Entropy (BCE), and Bayesian Personalized Ranking (BPR) are commonly used in training Recommender Systems (RSs) to differentiate positive items - those interacted with by users - from negative items. While prior works empirically showed that CCE outperforms BCE and BPR when using the full set of negative items, we provide a theoretical explanation for this by proving that CCE offers the tightest lower bound on ranking metrics like Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR), followed by BPR and BCE. However, using the full set of negative items is computationally infeasible for large-scale RSs, prompting the use of negative sampling techniques. Under negative sampling, we reveal that BPR and CCE are equivalent when a single negative sample is drawn, and all three losses converge to the same global minimum. We further demonstrate that the sampled losses remain lower bounds for NDCG (MRR), albeit in a probabilistic sense. Our worst-case analysis shows that BCE offers the strongest bound on NDCG (MRR). Experiments on five datasets and four models empirically support these theoretical findings. Our code is available at https://anonymous.4open.science/r/recsys_losses.
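For concreteness, here is a minimal PyTorch sketch (not the authors' code) of the three losses computed on one positive score and K sampled negative scores per user. With K = 1 the sampled CCE reduces algebraically to -log sigma(s_pos - s_neg), i.e. the BPR loss, matching the equivalence stated above.

```python
import torch
import torch.nn.functional as F

def cce_loss(s_pos, s_neg):
    """Sampled softmax / categorical cross entropy over 1 positive and K negatives."""
    logits = torch.cat([s_pos.unsqueeze(-1), s_neg], dim=-1)  # [B, 1 + K]
    return -F.log_softmax(logits, dim=-1)[:, 0].mean()        # positive sits at index 0

def bce_loss(s_pos, s_neg):
    """Binary cross entropy: positive labelled 1, each sampled negative labelled 0."""
    return -(F.logsigmoid(s_pos).mean() + F.logsigmoid(-s_neg).mean())

def bpr_loss(s_pos, s_neg):
    """Pairwise BPR: the positive should outscore every sampled negative."""
    return -F.logsigmoid(s_pos.unsqueeze(-1) - s_neg).mean()

B, K = 4, 1                                    # one sampled negative per positive
s_pos, s_neg = torch.randn(B), torch.randn(B, K)
print(cce_loss(s_pos, s_neg).item(), bpr_loss(s_pos, s_neg).item())  # match when K == 1
```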
Related papers
- SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering [47.81610130269399]
We propose a Sampled Softmax Cross-Entropy (SSM) loss that compares one positive sample with multiple negative samples, leading to better performance.
We also introduce a Simplified Sampled Softmax Cross-Entropy Loss (SimCE), which simplifies the SSM using its upper bound.
Our validation on 12 benchmark datasets, using both MF and LightGCN backbones, shows that SimCE significantly outperforms both BPR and SSM.
arXiv Detail & Related papers (2024-06-23T17:24:07Z) - gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling [67.71952251641545]
We show that models trained with negative sampling tend to overestimate the probabilities of positive interactions.
We propose a novel Generalised Binary Cross-Entropy Loss function (gBCE) and theoretically prove that it can mitigate overconfidence.
We show through detailed experiments on three datasets that gSASRec does not exhibit the overconfidence problem.
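A minimal sketch of a generalised binary cross entropy in the spirit of gBCE, assuming the commonly described form in which the positive probability is raised to a calibration power beta (beta = 1 recovers plain BCE); the parameter value is illustrative.

```python
import torch
import torch.nn.functional as F

def gbce_loss(s_pos, s_neg, beta=0.9):
    # log sigma(s_pos)**beta = beta * log sigma(s_pos); smaller beta tempers the
    # positive term to counteract sampling-induced overconfidence (assumed form)
    pos_term = beta * F.logsigmoid(s_pos)            # [B]
    neg_term = F.logsigmoid(-s_neg).sum(dim=-1)      # [B], summed over K sampled negatives
    return -(pos_term + neg_term).mean()
```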
arXiv Detail & Related papers (2023-08-14T14:56:40Z) - On the Theories Behind Hard Negative Sampling for Recommendation [51.64626293229085]
We offer two insightful guidelines for effective usage of Hard Negative Sampling (HNS).
We prove that employing HNS on the Bayesian Personalized Ranking (BPR) learner is equivalent to optimizing One-way Partial AUC (OPAUC).
These analyses establish the theoretical foundation of HNS in optimizing Top-K recommendation performance for the first time.
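The paper's contribution is the OPAUC analysis; as context, here is a minimal sketch (a generic recipe, not the paper's code) of hard negative sampling plugged into BPR: score a pool of candidate negatives and train against the hardest one.

```python
import torch
import torch.nn.functional as F

def hns_bpr_loss(s_pos, s_neg_pool):
    # s_pos: [B] positive scores; s_neg_pool: [B, C] scores of C candidate negatives
    s_hard, _ = s_neg_pool.max(dim=-1)               # pick the highest-scored (hardest) negative
    return -F.logsigmoid(s_pos - s_hard).mean()      # standard BPR against that negative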
arXiv Detail & Related papers (2023-02-07T13:57:03Z) - Positive-Negative Equal Contrastive Loss for Semantic Segmentation [8.664491798389662]
Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.
We propose Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embedding on the anchor and treats the positive as well as negative sample pairs equally.
We conduct comprehensive experiments and achieve state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2022-07-04T13:51:29Z) - Comprehensive Analysis of Negative Sampling in Knowledge Graph
Representation Learning [25.664174172917345]
Negative sampling (NS) loss plays an important role in learning knowledge graph embedding (KGE) to handle a huge number of entities.
We theoretically analyzed the NS loss to assist hyperparameter tuning and to better understand how the NS loss is used in KGE learning.
Our empirical analysis on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that the results of actually trained models agree with our theoretical findings.
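For reference, a minimal sketch of the basic (unweighted) negative sampling loss as used in KGE, assuming a scoring function has already produced scores for one observed triple and K corrupted triples; commonly used self-adversarial weighting is omitted.

```python
import torch
import torch.nn.functional as F

def kge_ns_loss(score_pos, score_neg):
    # score_pos: [B] scores f(h, r, t) of observed triples
    # score_neg: [B, K] scores of triples with a corrupted head or tail
    pos_term = F.logsigmoid(score_pos)                 # push observed triples up
    neg_term = F.logsigmoid(-score_neg).mean(dim=-1)   # push corrupted triples down
    return -(pos_term + neg_term).mean()
```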
arXiv Detail & Related papers (2022-06-21T06:51:33Z) - Do More Negative Samples Necessarily Hurt in Contrastive Learning? [25.234544066205547]
We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation does not degrade with the number of negative samples.
We also give a structural characterization of the optimal representation in our framework.
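A minimal sketch of a generic InfoNCE-style objective (not the paper's theoretical framework) that makes explicit where the number of negatives K enters; embeddings are assumed L2-normalised and the temperature tau is illustrative.

```python
import torch
import torch.nn.functional as F

def info_nce(z_anchor, z_pos, z_neg, tau=0.1):
    # z_anchor, z_pos: [B, D]; z_neg: [B, K, D]; all assumed L2-normalised
    s_pos = (z_anchor * z_pos).sum(-1, keepdim=True) / tau      # [B, 1]
    s_neg = torch.einsum('bd,bkd->bk', z_anchor, z_neg) / tau   # [B, K]
    logits = torch.cat([s_pos, s_neg], dim=-1)                  # positive at index 0
    return -F.log_softmax(logits, dim=-1)[:, 0].mean()
```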
arXiv Detail & Related papers (2022-05-03T21:29:59Z) - Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this way offsets the influence of user/item propensity on the learning.
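A minimal sketch of a cross pairwise objective in the simplest two-interaction case (an assumed form, not the authors' implementation): for observed interactions (u1, i1) and (u2, i2), the sum of the observed scores should beat the sum of the swapped, unobserved pairings.

```python
import torch
import torch.nn.functional as F

def cpr_pair_loss(s_u1_i1, s_u2_i2, s_u1_i2, s_u2_i1):
    # observed pairings should jointly outscore the swapped (cross) pairings
    margin = (s_u1_i1 + s_u2_i2) - (s_u1_i2 + s_u2_i1)
    return -F.logsigmoid(margin).mean()
```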
arXiv Detail & Related papers (2022-04-26T09:20:27Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (abbreviated BCE in that paper, distinct from Binary Cross Entropy above) is in applications where multiple estimates of the same unknown are averaged for improved performance.
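A minimal sketch of one way to impose a bias constraint during training (an assumed penalty form, not the paper's exact objective): the usual MSE plus a penalty on the squared norm of the batch-averaged estimation error, which drives the empirical bias towards zero.

```python
import torch

def bias_constrained_mse(theta_hat, theta, lam=1.0):
    # theta_hat, theta: [B, D] estimates and ground-truth parameters
    err = theta_hat - theta
    mse = (err ** 2).sum(dim=-1).mean()          # standard mean squared error
    bias_pen = (err.mean(dim=0) ** 2).sum()      # squared norm of the empirical bias
    return mse + lam * bias_pen                  # lam is an illustrative weight
```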
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Contrastive Attraction and Contrastive Repulsion for Representation Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervision manner, where the encoder contrasts each positive sample over multiple negative samples.
Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet.
We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences.