Evaluating Performance and Bias of Negative Sampling in Large-Scale Sequential Recommendation Models
- URL: http://arxiv.org/abs/2410.17276v2
- Date: Tue, 29 Oct 2024 04:32:17 GMT
- Title: Evaluating Performance and Bias of Negative Sampling in Large-Scale Sequential Recommendation Models
- Authors: Arushi Prakash, Dimitrios Bermperidis, Srivas Chennu
- Abstract summary: Large-scale industrial recommendation models predict the most relevant items from catalogs containing millions or billions of options.
To train these models efficiently, a small set of irrelevant items (negative samples) is selected from the vast catalog for each relevant item.
Our study serves as a practical guide to the trade-offs in selecting a negative sampling method for large-scale sequential recommendation models.
- Abstract: Large-scale industrial recommendation models predict the most relevant items from catalogs containing millions or billions of options. To train these models efficiently, a small set of irrelevant items (negative samples) is selected from the vast catalog for each relevant item (positive example), helping the model distinguish between relevant and irrelevant items. Choosing the right negative sampling method is a common challenge. We address this by implementing and comparing various negative sampling methods - random, popularity-based, in-batch, mixed, adaptive, and adaptive with mixed variants - on modern sequential recommendation models. Our experiments, including hyperparameter optimization and 20x repeats on three benchmark datasets with varying popularity biases, show how the choice of method and dataset characteristics impact key model performance metrics. We also reveal that average performance metrics often hide imbalances across popularity bands (head, mid, tail). We find that commonly used random negative sampling reinforces popularity bias and performs best for head items. Popularity-based methods (in-batch and global popularity negative sampling) can offer balanced performance across popularity bands at the cost of lower overall model performance. Our study serves as a practical guide to the trade-offs in selecting a negative sampling method for large-scale sequential recommendation models. Code, datasets, experimental results and hyperparameters are available at: https://github.com/apple/ml-negative-sampling.
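The sampling strategies the abstract compares can be illustrated compactly. The sketch below is not the authors' implementation (that lives in the linked ml-negative-sampling repository); it is a minimal NumPy rendering of four of the compared strategies, with all function and variable names assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_negatives(catalog_size, positive, k):
    # Uniform sampling: every catalog item is equally likely. Cheap,
    # but the paper finds it reinforces popularity bias and favors
    # head items.
    cands = rng.integers(0, catalog_size, size=2 * k)
    return [c for c in cands if c != positive][:k]

def popularity_negatives(item_counts, positive, k):
    # Global popularity sampling: draw items proportionally to their
    # interaction counts, so the model must learn to rank popular
    # items below the true positive.
    p = item_counts / item_counts.sum()
    cands = rng.choice(len(item_counts), size=2 * k, p=p)
    return [c for c in cands if c != positive][:k]

def in_batch_negatives(batch_positives, row):
    # In-batch sampling: the other examples' positives in the same
    # mini-batch act as negatives; implicitly popularity-weighted,
    # since positives come from real interactions.
    return [it for i, it in enumerate(batch_positives) if i != row]

def adaptive_negatives(scores, positive, k):
    # Adaptive (hard-negative) sampling: prefer the items the current
    # model scores highest, excluding the positive itself.
    order = np.argsort(-scores)
    return [int(i) for i in order if i != positive][:k]
```

The mixed variants from the abstract can be read as interleaving draws from two of these samplers within one training step.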
Related papers
- Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback [87.37721254914476]
We introduce a routing framework that combines inputs from humans and LMs to achieve better annotation quality.
We train a performance prediction model to predict a reward model's performance on an arbitrary combination of human and LM annotations.
We show that the selected hybrid mixture achieves better reward model performance compared to using either one exclusively.
arXiv Detail & Related papers (2024-10-24T20:04:15Z) - Out-of-sample scoring and automatic selection of causal estimators [0.0]
We propose novel scoring approaches for both the CATE case and an important subset of instrumental variable problems.
We implement that in an open source package that relies on DoWhy and EconML libraries.
arXiv Detail & Related papers (2022-12-20T08:29:18Z) - Generating Negative Samples for Sequential Recommendation [83.60655196391855]
We propose to Generate Negative Samples (items) for Sequential Recommendation (SR)
A negative item is sampled at each time step based on the current SR model's learned user preferences toward items.
Experiments on four public datasets verify the importance of providing high-quality negative samples for SR.
arXiv Detail & Related papers (2022-08-07T05:44:13Z) - A Case Study on Sampling Strategies for Evaluating Neural Sequential
Item Recommendation Models [69.32128532935403]
Two well-known strategies to sample negative items are uniform random sampling and sampling by popularity.
We re-evaluate current state-of-the-art sequential recommender models from this point of view.
We find that both sampling strategies can produce inconsistent rankings compared with the full ranking of the models.
arXiv Detail & Related papers (2021-07-27T19:06:03Z) - Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function (a generic form of the loss is sketched after this list).
arXiv Detail & Related papers (2021-05-27T08:38:29Z) - Set2setRank: Collaborative Set to Set Ranking for Implicit Feedback
based Recommendation [59.183016033308014]
In this paper, we explore the unique characteristics of the implicit feedback and propose Set2setRank framework for recommendation.
Our proposed framework is model-agnostic and can be easily applied to most recommendation prediction approaches.
arXiv Detail & Related papers (2021-05-16T08:06:22Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the one-class nature of the problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - Addressing Class-Imbalance Problem in Personalized Ranking [47.11372043636176]
We propose an efficient Vital Negative Sampler (VINS) to alleviate the class-imbalance issue for pairwise ranking models.
VINS is a biased sampler with a rejection probability that tends to accept negative candidates whose degree weight is larger than that of the given positive item (see the rejection-sampling sketch after this list).
arXiv Detail & Related papers (2020-05-19T08:11:26Z)