Sampler Design for Implicit Feedback Data by Noisy-label Robust Learning
- URL: http://arxiv.org/abs/2007.07204v1
- Date: Sun, 28 Jun 2020 05:31:53 GMT
- Title: Sampler Design for Implicit Feedback Data by Noisy-label Robust Learning
- Authors: Wenhui Yu and Zheng Qin
- Abstract summary: We design an adaptive sampler based on noisy-label robust learning for implicit feedback data.
We predict users' preferences with the model and learn it by maximizing the likelihood of the observed labels.
We then consider the risk of noisy labels and propose a Noisy-label Robust BPO (NBPO).
- Score: 32.76804332450971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implicit feedback data is extensively explored in recommendation as it is
easy to collect and generally applicable. However, predicting users' preference
on implicit feedback data is a challenging task since we can only observe
positive (voted) samples and unvoted samples. It is difficult to distinguish
true negative samples from unlabeled positive samples among the unvoted ones.
Existing works, such as Bayesian Personalized Ranking (BPR), sample unvoted
items uniformly as negatives and therefore suffer from a critical noisy-label
issue. To address this gap, we design an adaptive sampler based on
noisy-label robust learning for implicit feedback data.
To formulate the issue, we first introduce Bayesian Point-wise Optimization
(BPO) to learn a model, e.g., Matrix Factorization (MF), by maximum likelihood
estimation. We predict users' preferences with the model and learn it by
maximizing the likelihood of the observed labels, i.e., a user prefers her
positive samples and has no interest in her unvoted samples. However, in
reality, a user may have interests in some of her unvoted samples, which are
indeed positive samples mislabeled as negative ones. We then consider the risk
of these noisy labels, and propose a Noisy-label Robust BPO (NBPO). NBPO also
maximizes the observation likelihood, but connects users' preferences to the
observed labels through the likelihood of label flipping, based on Bayes'
theorem. In NBPO, a user prefers her true positive samples and shows no
interest in her true negative samples, hence the optimization quality is
dramatically improved. Extensive experiments on two public real-world datasets
show the significant improvement of our proposed optimization methods.
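The two objectives above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the factor matrices, the sigmoid link for MF scores, and the fixed scalar flip probability `flip` are assumptions made here for clarity (NBPO itself learns the label-flipping likelihoods rather than fixing them).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpo_loss(P, Q, R):
    """Point-wise negative log-likelihood (BPO-style): every unvoted
    entry (R == 0) is treated as a true negative."""
    S = sigmoid(P @ Q.T)          # predicted preference probabilities
    eps = 1e-12                   # guard against log(0)
    return -np.sum(R * np.log(S + eps) + (1 - R) * np.log(1 - S + eps))

def nbpo_loss(P, Q, R, flip=0.1):
    """NBPO-style negative log-likelihood: an observed 0 may be a
    mislabeled positive. `flip` is an assumed probability that a true
    positive is observed as unvoted; the observation likelihood is
    connected to the true preference through this flipping probability."""
    S = sigmoid(P @ Q.T)
    eps = 1e-12
    # P(observe 1) = P(prefer) * P(not flipped); P(observe 0) = the rest.
    p_obs1 = S * (1.0 - flip)
    return -np.sum(R * np.log(p_obs1 + eps)
                   + (1 - R) * np.log(1 - p_obs1 + eps))

rng = np.random.default_rng(0)
P = rng.normal(size=(4, 3))                         # user latent factors
Q = rng.normal(size=(5, 3))                         # item latent factors
R = rng.integers(0, 2, size=(4, 5)).astype(float)   # observed 0/1 labels
print(bpo_loss(P, Q, R), nbpo_loss(P, Q, R))
```

With `flip=0` the two losses coincide, which makes the relationship explicit: NBPO reduces to plain point-wise likelihood maximization when labels are assumed noise-free.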
Related papers
- Double Correction Framework for Denoising Recommendation [45.98207284259792]
In implicit feedback, noisy samples can affect precise user preference learning.
A popular solution is based on dropping noisy samples in the model training phase.
We propose a Double Correction Framework for Denoising Recommendation.
arXiv Detail & Related papers (2024-05-18T12:15:10Z)
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Data Pruning via Moving-one-Sample-out [61.45441981346064]
We propose a novel data-pruning approach called moving-one-sample-out (MoSo).
MoSo aims to identify and remove the least informative samples from the training set.
Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios.
arXiv Detail & Related papers (2023-10-23T08:00:03Z)
- Better Sampling of Negatives for Distantly Supervised Named Entity Recognition [39.264878763160766]
We propose a simple and straightforward approach for selecting the top negative samples that have high similarities with all the positive samples for training.
Our method achieves consistent performance improvements on four distantly supervised NER datasets.
arXiv Detail & Related papers (2023-05-22T15:35:39Z)
- Bayesian Self-Supervised Contrastive Learning [16.903874675729952]
This paper proposes a new self-supervised contrastive loss called the BCL loss.
The key idea is to design the desired sampling distribution for sampling hard true negative samples under the Bayesian framework.
Experiments validate the effectiveness and superiority of the BCL loss.
arXiv Detail & Related papers (2023-01-27T12:13:06Z)
- Learning with Noisy Labels over Imbalanced Subpopulations [13.477553187049462]
Learning with noisy labels (LNL) has attracted significant attention from the research community.
We propose a novel LNL method to simultaneously deal with noisy labels and imbalanced subpopulations.
We introduce a feature-based metric that takes the sample correlation into account for estimating samples' clean probabilities.
arXiv Detail & Related papers (2022-11-16T07:25:24Z)
- Generating Negative Samples for Sequential Recommendation [83.60655196391855]
We propose to Generate Negative Samples (items) for Sequential Recommendation (SR).
A negative item is sampled at each time step based on the current SR model's learned user preferences toward items.
Experiments on four public datasets verify the importance of providing high-quality negative samples for SR.
arXiv Detail & Related papers (2022-08-07T05:44:13Z)
- FedCL: Federated Contrastive Learning for Privacy-Preserving Recommendation [98.5705258907774]
FedCL can exploit high-quality negative samples for effective model training with privacy well protected.
We first infer user embeddings from local user data through the local model on each client, and then perturb them with local differential privacy (LDP).
Since individual user embedding contains heavy noise due to LDP, we propose to cluster user embeddings on the server to mitigate the influence of noise.
arXiv Detail & Related papers (2022-04-21T02:37:10Z)
- Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
- Probabilistic and Variational Recommendation Denoising [56.879165033014026]
Learning from implicit feedback is one of the most common cases in the application of recommender systems.
We propose probabilistic and variational recommendation denoising for implicit feedback.
We employ the proposed DPI and DVAE on four state-of-the-art recommendation models and conduct experiments on three datasets.
arXiv Detail & Related papers (2021-05-20T08:59:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.