A Theoretical Analysis of Recommendation Loss Functions under Negative Sampling
- URL: http://arxiv.org/abs/2411.07770v1
- Date: Tue, 12 Nov 2024 13:06:16 GMT
- Title: A Theoretical Analysis of Recommendation Loss Functions under Negative Sampling
- Authors: Giulia Di Teodoro, Federico Siciliano, Nicola Tonellotto, Fabrizio Silvestri
- Abstract summary: This paper conducts a comparative analysis of prevalent loss functions in Recommender Systems (RSs): Binary Cross-Entropy (BCE), Categorical Cross-Entropy (CCE), and Bayesian Personalized Ranking (BPR).
We show that CCE and BPR are equivalent when one negative sample is used, and that all three losses share a common global minimum.
- Score: 13.180345241212423
- Abstract: Recommender Systems (RSs) are pivotal in diverse domains such as e-commerce, music streaming, and social media. This paper conducts a comparative analysis of prevalent loss functions in RSs: Binary Cross-Entropy (BCE), Categorical Cross-Entropy (CCE), and Bayesian Personalized Ranking (BPR). Exploring the behaviour of these loss functions across varying negative sampling settings, we reveal that BPR and CCE are equivalent when one negative sample is used. Additionally, we demonstrate that all losses share a common global minimum. Evaluation of RSs mainly relies on ranking metrics known as Normalized Discounted Cumulative Gain (NDCG) and Mean Reciprocal Rank (MRR). We produce bounds of the different losses for negative sampling settings to establish a probabilistic lower bound for NDCG. We show that the BPR bound on NDCG is weaker than that of BCE, contradicting the common assumption that BPR is superior to BCE in RSs training. Experiments on five datasets and four models empirically support these theoretical findings. Our code is available at \url{https://anonymous.4open.science/r/recsys_losses} .
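To make the equivalence claim concrete, here is a short worked derivation in our own notation (not taken from the paper): let $s^+$ be the model score of the positive item and $s^-$ the score of the single sampled negative.

```latex
% With one sampled negative, sampled-softmax CCE reduces exactly to BPR:
\mathcal{L}_{\mathrm{CCE}}
  = -\log \frac{e^{s^+}}{e^{s^+} + e^{s^-}}
  = \log\bigl(1 + e^{-(s^+ - s^-)}\bigr)
  = -\log \sigma(s^+ - s^-)
  = \mathcal{L}_{\mathrm{BPR}},
\qquad
\mathcal{L}_{\mathrm{BCE}} = -\log \sigma(s^+) - \log\bigl(1 - \sigma(s^-)\bigr).
```

BCE keeps a different functional form, but all three losses are minimized by pushing $s^+ \to +\infty$ and $s^- \to -\infty$, which is consistent with the shared-global-minimum statement above. A minimal numeric check of the same identity, written as a standalone NumPy sketch rather than the paper's released code:

```python
# Sketch: with one sampled negative per positive, sampled-softmax CCE equals BPR,
# while BCE is a different function of the same two scores.
import numpy as np

rng = np.random.default_rng(0)
s_pos = rng.normal(size=1000)   # scores of the positive items
s_neg = rng.normal(size=1000)   # scores of one sampled negative per positive

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

bpr = -np.log(sigmoid(s_pos - s_neg))                               # -log sigmoid(s+ - s-)
cce = -np.log(np.exp(s_pos) / (np.exp(s_pos) + np.exp(s_neg)))      # softmax over {s+, s-}
bce = -np.log(sigmoid(s_pos)) - np.log(1.0 - sigmoid(s_neg))        # pointwise BCE

print(np.allclose(bpr, cce))   # True: CCE with one negative coincides with BPR
print(np.allclose(bpr, bce))   # False: BCE differs, though it shares the global minimum
```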
Related papers
- SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering [47.81610130269399]
We propose a Sampled Softmax Cross-Entropy (SSM) loss that compares one positive sample with multiple negative samples, leading to better performance (a generic sketch of this sampled-softmax form is given after the list below).
We also introduce a Simplified Sampled Softmax Cross-Entropy Loss (SimCE), which simplifies the SSM using its upper bound.
Our validation on 12 benchmark datasets, using both MF and LightGCN backbones, shows that SimCE significantly outperforms both BPR and SSM.
arXiv Detail & Related papers (2024-06-23T17:24:07Z) - gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling [67.71952251641545]
We show that models trained with negative sampling tend to overestimate the probabilities of positive interactions.
We propose a novel Generalised Binary Cross-Entropy Loss function (gBCE) and theoretically prove that it can mitigate overconfidence.
We show through detailed experiments on three datasets that gSASRec does not exhibit the overconfidence problem.
arXiv Detail & Related papers (2023-08-14T14:56:40Z) - On the Theories Behind Hard Negative Sampling for Recommendation [51.64626293229085]
We offer two insightful guidelines for the effective usage of Hard Negative Sampling (HNS).
We prove that employing HNS on the Bayesian Personalized Ranking (BPR) learner is equivalent to optimizing One-way Partial AUC (OPAUC).
These analyses establish the theoretical foundation of HNS in optimizing Top-K recommendation performance for the first time.
arXiv Detail & Related papers (2023-02-07T13:57:03Z) - Positive-Negative Equal Contrastive Loss for Semantic Segmentation [8.664491798389662]
Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.
We propose Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embedding on the anchor and treats the positive as well as negative sample pairs equally.
We conduct comprehensive experiments and achieve state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2022-07-04T13:51:29Z) - Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation Learning [25.664174172917345]
Negative sampling (NS) loss plays an important role in learning knowledge graph embedding (KGE) to handle a huge number of entities.
We theoretically analyzed the NS loss to assist hyperparameter tuning and to better understand the use of the NS loss in KGE learning.
Our empirical analysis on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that the results of actually trained models agree with our theoretical findings.
arXiv Detail & Related papers (2022-06-21T06:51:33Z) - Do More Negative Samples Necessarily Hurt in Contrastive Learning? [25.234544066205547]
We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation does not degrade with the number of negative samples.
We also give a structural characterization of the optimal representation in our framework.
arXiv Detail & Related papers (2022-05-03T21:29:59Z) - Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum-variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for bias-constrained estimation (the BCE of that paper, distinct from Binary Cross-Entropy) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Contrastive Attraction and Contrastive Repulsion for Representation Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervised manner, where the encoder contrasts each positive sample against multiple negative samples.
Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet.
We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
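As referenced in the SimCE entry above, the sampled-softmax formulation scores one positive against several sampled negatives at once. The sketch below is a generic NumPy illustration of that standard loss, not the SimCE authors' implementation or their upper-bound simplification; the batch size and number of negatives are arbitrary placeholders.

```python
# Generic sampled-softmax cross-entropy: for each positive, K sampled negatives
# compete in a softmax, and the loss is the negative log-probability of the positive.
import numpy as np

def sampled_softmax_ce(s_pos, s_neg):
    """s_pos: (B,) positive scores; s_neg: (B, K) scores of K sampled negatives."""
    logits = np.concatenate([s_pos[:, None], s_neg], axis=1)        # (B, K+1)
    logits -= logits.max(axis=1, keepdims=True)                     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[:, 0].mean()                                  # positive sits in column 0

rng = np.random.default_rng(0)
loss = sampled_softmax_ce(rng.normal(size=64), rng.normal(size=(64, 10)))
print(loss)
```

With K = 1 this reduces to the single-negative case derived after the abstract, where CCE and BPR coincide.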