Rethinking Collaborative Metric Learning: Toward an Efficient
Alternative without Negative Sampling
- URL: http://arxiv.org/abs/2206.11549v1
- Date: Thu, 23 Jun 2022 08:50:22 GMT
- Title: Rethinking Collaborative Metric Learning: Toward an Efficient
Alternative without Negative Sampling
- Authors: Shilong Bao, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang
- Abstract summary: The Collaborative Metric Learning (CML) paradigm has aroused wide interest in the area of recommendation systems (RS).
We find that negative sampling leads to a biased estimation of the generalization error.
Motivated by this, we propose an efficient alternative to negative sampling for CML, named Sampling-Free Collaborative Metric Learning (SFCML).
- Score: 156.7248383178991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recently proposed Collaborative Metric Learning (CML) paradigm has
aroused wide interest in the area of recommendation systems (RS) owing to its
simplicity and effectiveness. Typically, the existing literature of CML depends
largely on the \textit{negative sampling} strategy to alleviate the
time-consuming burden of pairwise computation. However, in this work, by taking
a theoretical analysis, we find that negative sampling would lead to a biased
estimation of the generalization error. Specifically, we show that the
sampling-based CML would introduce a bias term in the generalization bound,
which is quantified by the per-user \textit{Total Variance} (TV) between the
distribution induced by negative sampling and the ground truth distribution.
This suggests that optimizing the sampling-based CML loss function does not
ensure a small generalization error even with sufficiently large training data.
Moreover, we show that the bias term will vanish without the negative sampling
strategy. Motivated by this, we propose an efficient alternative without
negative sampling for CML named \textit{Sampling-Free Collaborative Metric
Learning} (SFCML), to get rid of the sampling bias in a practical sense.
Finally, comprehensive experiments over seven benchmark datasets speak to the
superiority of the proposed algorithm.
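The bias term described in the abstract is quantified by the per-user Total Variation between the distribution induced by negative sampling and the ground-truth distribution. As a rough illustration (not the paper's actual bound), the snippet below computes the TV distance between a hypothetical ground-truth negative-item distribution and a popularity-biased sampling distribution for one user; the specific probability values are made up for illustration.

```python
import numpy as np

def total_variation(p, q):
    """TV distance between two discrete distributions: 0.5 * sum |p_i - q_i|."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Hypothetical ground-truth negative-item distribution for one user
# (uniform over 5 candidate items)
p_true = np.ones(5) / 5

# Distribution induced by a popularity-biased negative sampler (assumed values)
p_sampled = np.array([0.4, 0.3, 0.15, 0.1, 0.05])

# Larger TV distance -> larger bias term in the generalization bound,
# per the paper's analysis; it vanishes only when the two distributions match.
bias_proxy = total_variation(p_true, p_sampled)
```

When the sampler matches the ground truth exactly, the TV distance is zero, which mirrors the paper's claim that the bias term vanishes without (biased) negative sampling.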
Related papers
- Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
- Test-Time Distribution Normalization for Contrastively Learned Vision-language Models [39.66329310098645]
One of the most representative approaches proposed recently, known as CLIP, has garnered widespread adoption due to its effectiveness.
This paper reveals that the common downstream practice of taking a dot product is only a zeroth-order approximation of the optimization goal, resulting in a loss of information during test-time.
We propose Distribution Normalization (DN), where we approximate the mean representation of a batch of test samples and use such a mean to represent what would be analogous to negative samples in the InfoNCE loss.
arXiv Detail & Related papers (2023-02-22T01:14:30Z)
- Information Theoretical Importance Sampling Clustering [18.248246885248733]
A current assumption of most clustering methods is that the training data and future data are taken from the same distribution.
We propose an information theoretical importance sampling based approach for clustering problems (ITISC).
Experiment results on synthetic datasets and a real-world load forecasting problem validate the effectiveness of the proposed model.
arXiv Detail & Related papers (2023-02-09T03:18:53Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier over restricted ranges of the ROC curve.
Most existing methods can only optimize PAUC approximately, leading to inevitable, uncontrollable biases.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- Hard Negative Sampling via Regularized Optimal Transport for Contrastive Representation Learning [13.474603286270836]
We study the problem of designing hard negative sampling distributions for unsupervised contrastive representation learning.
We propose and analyze a novel min-max framework that seeks a representation which minimizes the maximum (worst-case) generalized contrastive learning loss.
arXiv Detail & Related papers (2021-11-04T21:25:24Z)
- Nonuniform Negative Sampling and Log Odds Correction with Rare Events Data [15.696653979226113]
We investigate the issue of parameter estimation with nonuniform negative sampling for imbalanced data.
We derive a general inverse probability weighted (IPW) estimator and obtain the optimal sampling probability that minimizes its variance.
Both theoretical and empirical results demonstrate the effectiveness of our method.
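To make the IPW idea above concrete, here is a minimal sketch (not the paper's actual estimator) of inverse probability weighting under nonuniform negative sampling: rare positives are always kept, negatives are subsampled, and each retained example is reweighted by the inverse of its sampling probability. All dataset sizes and probabilities are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Imbalanced data: 10 rare positives, 990 negatives (true positive rate 0.01)
labels = np.array([1] * 10 + [0] * 990)

# Nonuniform sampling: keep every positive, keep each negative with prob 0.1
pi = np.where(labels == 1, 1.0, 0.1)
keep = rng.random(labels.size) < pi

# Hajek-style IPW estimate of the positive rate from the subsample:
# each kept example is weighted by 1 / (its sampling probability)
ipw_estimate = np.sum(labels[keep] / pi[keep]) / np.sum(1.0 / pi[keep])

# The unweighted subsample mean is biased upward, since negatives were thinned
naive_estimate = labels[keep].mean()
```

The IPW estimate stays close to the true rate of 0.01, while the naive subsample mean overestimates it by roughly an order of magnitude, which is the bias the weighting corrects.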
arXiv Detail & Related papers (2021-10-25T15:37:22Z)
- Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
- Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
- Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering [42.832851785261894]
In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning.
We then tackle the untouched false negative problem by favouring high-variance samples stored in memory.
Empirical results on two synthetic datasets and three real-world datasets demonstrate both the robustness and the superiority of our negative sampling method.
arXiv Detail & Related papers (2020-09-07T19:08:26Z)
- Understanding Negative Sampling in Graph Representation Learning [87.35038268508414]
We show that negative sampling is as important as positive sampling in determining both the optimization objective and the resulting variance.
We propose MCNS, which approximates the positive distribution with a self-contrast approximation and accelerates negative sampling via Metropolis-Hastings.
We evaluate our method on 5 datasets that cover extensive downstream graph learning tasks, including link prediction, node classification and personalized recommendation.
arXiv Detail & Related papers (2020-05-20T06:25:21Z)
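The Metropolis-Hastings acceleration mentioned in the MCNS entry can be illustrated with a toy sampler (a generic MH chain, not the paper's MCNS algorithm): a chain over item indices with a uniform proposal converges to sampling items in proportion to exp(score), so higher-scored items are drawn as negatives more often. The scores below are made-up values.

```python
import numpy as np

rng = np.random.default_rng(1)

def mh_negative_sampler(scores, n_steps=5000):
    """Metropolis-Hastings chain over item indices whose stationary
    distribution is proportional to exp(scores[i])."""
    n = len(scores)
    state = int(rng.integers(n))
    samples = []
    for _ in range(n_steps):
        proposal = int(rng.integers(n))         # symmetric uniform proposal
        # Acceptance ratio for a symmetric proposal: target(new)/target(old)
        accept = np.exp(scores[proposal] - scores[state])
        if rng.random() < min(1.0, accept):
            state = proposal
        samples.append(state)
    return np.array(samples)

# Hypothetical model scores for 4 candidate items
scores = np.array([2.0, 1.0, 0.0, -1.0])
negs = mh_negative_sampler(scores)
```

After burn-in, the empirical frequency of item 0 dominates that of item 3, matching the exp-score target distribution; each step costs only one score lookup, which is what makes MH-style samplers cheap relative to normalizing over the full item catalog.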
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.