A Case Study on Sampling Strategies for Evaluating Neural Sequential
Item Recommendation Models
- URL: http://arxiv.org/abs/2107.13045v1
- Date: Tue, 27 Jul 2021 19:06:03 GMT
- Title: A Case Study on Sampling Strategies for Evaluating Neural Sequential
Item Recommendation Models
- Authors: Alexander Dallmann, Daniel Zoller, Andreas Hotho
- Abstract summary: Two well-known strategies to sample negative items are uniform random sampling and sampling by popularity.
We re-evaluate current state-of-the-art sequential recommender models with respect to whether these sampling strategies affect the final model ranking.
We find that both sampling strategies can produce inconsistent rankings compared with the full ranking of the models.
- Score: 69.32128532935403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: At the present time, sequential item recommendation models are compared by
calculating metrics on a small item subset (target set) to speed up
computation. The target set contains the relevant item and a set of negative
items that are sampled from the full item set. Two well-known strategies to
sample negative items are uniform random sampling and sampling by popularity to
better approximate the item frequency distribution in the dataset. Most
recently published papers on sequential item recommendation rely on sampling by
popularity to compare the evaluated models. However, recent work has already
shown that an evaluation with uniform random sampling may not be consistent
with the full ranking, that is, the model ranking obtained by evaluating a
metric using the full item set as target set, which raises the question whether
the ranking obtained by sampling by popularity is equal to the full ranking. In
this work, we re-evaluate current state-of-the-art sequential recommender
models with respect to whether these sampling strategies have an impact
on the final ranking of the models. We therefore train four recently proposed
sequential recommendation models on five widely known datasets. For each
dataset and model, we employ three evaluation strategies. First, we compute the
full model ranking. Then we evaluate all models on a target set sampled by the
two different sampling strategies, uniform random sampling and sampling by
popularity with the commonly used target set size of 100, compute the model
ranking for each strategy and compare them with each other. Additionally, we
vary the size of the sampled target set. Overall, we find that both sampling
strategies can produce inconsistent rankings compared with the full ranking of
the models. Furthermore, both sampling by popularity and uniform random
sampling do not consistently produce the same ranking ...
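The evaluation protocol the abstract describes can be sketched in a few lines: for each test case, build a target set consisting of the relevant item plus negatives drawn either uniformly or in proportion to item popularity, then compute a ranking metric (here hit rate at k) over that target set only. This is a minimal illustrative sketch, not the authors' code; the function names, the use of NumPy, and the hit-rate metric are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_target_set(relevant_item, item_counts, n_negatives=100,
                      by_popularity=False):
    """Build a target set: the relevant item plus sampled negative items.

    item_counts: per-item interaction counts (index = item id), used as
    sampling weights when by_popularity=True.
    """
    candidates = np.delete(np.arange(len(item_counts)), relevant_item)
    if by_popularity:
        weights = item_counts[candidates].astype(float)
        probs = weights / weights.sum()
    else:
        probs = None  # uniform random sampling
    negatives = rng.choice(candidates, size=n_negatives,
                           replace=False, p=probs)
    return np.concatenate(([relevant_item], negatives))

def hit_rate_at_k(scores, target_sets, k=10):
    """Fraction of cases where the relevant item (index 0 of each target
    set) ranks in the top-k of its target set by model score."""
    hits = 0
    for ts in target_sets:
        ts_scores = scores[ts]
        # number of negatives scored strictly higher than the relevant item
        rank = int((ts_scores > ts_scores[0]).sum())
        hits += rank < k
    return hits / len(target_sets)
```

Repeating this with `by_popularity=False` and `by_popularity=True` (and with the full item set as target set) yields the three model rankings the paper compares; the full-ranking case corresponds to passing every non-relevant item as a negative.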
Related papers
- Evaluating Performance and Bias of Negative Sampling in Large-Scale Sequential Recommendation Models [0.0]
Large-scale industrial recommendation models predict the most relevant items from catalogs containing millions or billions of options.
To train these models efficiently, a small set of irrelevant items (negative samples) is selected from the vast catalog for each relevant item.
Our study serves as a practical guide to the trade-offs in selecting a negative sampling method for large-scale sequential recommendation models.
arXiv Detail & Related papers (2024-10-08T00:23:17Z) - Efficient Failure Pattern Identification of Predictive Algorithms [15.02620042972929]
We propose a human-machine collaborative framework that consists of a team of human annotators and a sequential recommendation algorithm.
The results empirically demonstrate the competitive performance of our framework on multiple datasets at various signal-to-noise ratios.
arXiv Detail & Related papers (2023-06-01T14:54:42Z) - BRIO: Bringing Order to Abstractive Summarization [107.97378285293507]
We propose a novel training paradigm which assumes a non-deterministic distribution.
Our method achieves a new state-of-the-art result on the CNN/DailyMail (47.78 ROUGE-1) and XSum (49.07 ROUGE-1) datasets.
arXiv Detail & Related papers (2022-03-31T05:19:38Z) - A Unified Statistical Learning Model for Rankings and Scores with
Application to Grant Panel Review [1.240096657086732]
Rankings and scores are two common data types used by judges to express preferences and/or perceptions of quality in a collection of objects.
Numerous models exist to study data of each type separately, but no unified statistical model captures both data types simultaneously.
We propose the Mallows-Binomial model to close this gap, which combines a Mallows' $\phi$ ranking model with Binomial score models.
arXiv Detail & Related papers (2022-01-07T16:56:52Z) - Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise
Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Set2setRank: Collaborative Set to Set Ranking for Implicit Feedback
based Recommendation [59.183016033308014]
In this paper, we explore the unique characteristics of the implicit feedback and propose Set2setRank framework for recommendation.
Our proposed framework is model-agnostic and can be easily applied to most recommendation prediction approaches.
arXiv Detail & Related papers (2021-05-16T08:06:22Z) - One for More: Selecting Generalizable Samples for Generalizable ReID
Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z) - More Informed Random Sample Consensus [1.827510863075184]
We propose a method that samples data with a Lévy distribution together with a data sorting algorithm.
In the hypothesis sampling step of the proposed method, data is sorted with a sorting algorithm we propose, which orders data points by their likelihood of belonging to the inlier set.
Then, hypotheses are sampled from the sorted data with a Lévy distribution.
arXiv Detail & Related papers (2020-11-18T06:43:50Z)