Rethinking Neural vs. Matrix-Factorization Collaborative Filtering: the
Theoretical Perspectives
- URL: http://arxiv.org/abs/2110.12141v1
- Date: Sat, 23 Oct 2021 04:55:21 GMT
- Title: Rethinking Neural vs. Matrix-Factorization Collaborative Filtering: the
Theoretical Perspectives
- Authors: Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan
- Abstract summary: Recent work argues that matrix-factorization collaborative filtering (MCF) compares favorably to neural collaborative filtering (NCF).
In this paper, we address the comparison rigorously by answering the following questions.
- Score: 18.204325860752768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent work by Rendle et al. (2020), based on empirical observations,
argues that matrix-factorization collaborative filtering (MCF) compares
favorably to neural collaborative filtering (NCF), and conjectures the dot
product's superiority over the feed-forward neural network as similarity
function. In this paper, we address the comparison rigorously by answering the
following questions: 1. what is the limiting expressivity of each model; 2.
under practical gradient descent, to which solution does each optimization
path converge; 3. how would the models generalize under the inductive and
transductive learning setting. Our results highlight the similar expressivity
for the overparameterized NCF and MCF as kernelized predictors, and reveal the
relation between their optimization paths. We further show their different
generalization behaviors, where MCF and NCF exhibit a specific tradeoff and
compare differently in the transductive and inductive collaborative filtering
settings.
Lastly, by showing a novel generalization result, we reveal the critical role
of correcting exposure bias for model evaluation in the inductive setting. Our
results explain some of the previously observed conflicts, and we provide
synthetic and real-data experiments to shed further light on this topic.
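The dot-product vs. feed-forward similarity question at the center of this abstract can be made concrete with a small sketch. The following is a minimal PyTorch-style illustration, not the authors' code: `MFScorer` uses the MCF dot product over user/item embeddings, while `NCFScorer` swaps in a feed-forward network over the concatenated embeddings. The class names, embedding sizes, and toy interaction data are assumptions made purely for illustration.

```python
# Minimal sketch (not the paper's code) contrasting the two similarity
# functions discussed above: MCF scores a (user, item) pair with a dot
# product of embeddings, NCF with a feed-forward network over their
# concatenation. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class MFScorer(nn.Module):
    """Matrix-factorization CF: s(u, i) = <p_u, q_i>."""

    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def forward(self, users, items):
        # Dot product as the similarity function.
        return (self.user_emb(users) * self.item_emb(items)).sum(-1)


class NCFScorer(nn.Module):
    """Neural CF: s(u, i) = MLP([p_u; q_i])."""

    def __init__(self, n_users, n_items, dim=32, hidden=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, users, items):
        # Feed-forward network replaces the dot product.
        x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
        return self.mlp(x).squeeze(-1)


if __name__ == "__main__":
    users = torch.tensor([0, 1, 2])
    items = torch.tensor([3, 4, 5])
    labels = torch.tensor([1.0, 0.0, 1.0])  # toy implicit-feedback labels
    for model in (MFScorer(10, 10), NCFScorer(10, 10)):
        # Both scorers are trained with the same plain gradient descent,
        # the regime the paper's optimization-path analysis concerns.
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        for _ in range(100):
            opt.zero_grad()
            loss = nn.functional.binary_cross_entropy_with_logits(
                model(users, items), labels
            )
            loss.backward()
            opt.step()
        print(type(model).__name__, loss.item())
```

Note that this sketch makes no attempt at the exposure-bias correction (e.g., inverse-propensity reweighting of held-out interactions) that the abstract highlights as critical for evaluation in the inductive setting.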
Related papers
- An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models [32.04194224236952]
We propose an information-theoretic objective function called Sparse Rate Reduction (SRR).
We show that SRR correlates positively with generalization and outperforms other baseline complexity measures, such as path-norm and sharpness-based ones.
We show that generalization can be improved using SRR as regularization on benchmark image classification datasets.
arXiv Detail & Related papers (2024-11-26T07:44:57Z) - Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases [76.9127853906115]
Bridging the gap between diffusion models and human preferences is crucial for their integration into practical generative workflows.
We propose Temporal Diffusion Policy Optimization with critic active neuron Reset (TDPO-R), a policy gradient algorithm that exploits the temporal inductive bias of diffusion models.
Empirical results demonstrate the superior efficacy of our methods in mitigating reward overoptimization.
arXiv Detail & Related papers (2024-02-13T15:55:41Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - CausalLM is not optimal for in-context learning [21.591451511589693]
Recent empirical evidence indicates that transformer-based in-context learning performs better when using a prefix language model (LM).
While this result is intuitive, it is not understood from a theoretical perspective.
We take a theoretical approach and analyze the convergence behavior of prefixLM and causalLM under a certain parameter construction.
arXiv Detail & Related papers (2023-08-14T03:14:38Z) - Optimal regularizations for data generation with probabilistic graphical
models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of L2 and L1 regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z) - Reenvisioning Collaborative Filtering vs Matrix Factorization [65.74881520196762]
Collaborative filtering models based on matrix factorization and learned similarities using Artificial Neural Networks (ANNs) have gained significant attention in recent years.
The rise of ANNs within the recommendation ecosystem has recently been questioned, prompting several comparisons in terms of efficiency and effectiveness.
We show the potential these techniques may have for beyond-accuracy evaluation while analyzing their effect on complementary evaluation dimensions.
arXiv Detail & Related papers (2021-07-28T16:29:38Z) - Loss function based second-order Jensen inequality and its application
to particle variational inference [112.58907653042317]
Particle variational inference (PVI) uses an ensemble of models as an empirical approximation for the posterior distribution.
PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models.
We derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models.
arXiv Detail & Related papers (2021-06-09T12:13:51Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.