Understanding Embedding Scaling in Collaborative Filtering
- URL: http://arxiv.org/abs/2509.15709v2
- Date: Mon, 27 Oct 2025 08:48:20 GMT
- Title: Understanding Embedding Scaling in Collaborative Filtering
- Authors: Yicheng He, Zhou Kaiyu, Haoyue Bai, Fengbin Zhu, Yonghui Yang,
- Abstract summary: We conduct large-scale experiments across 10 datasets with varying sparsity levels and scales. We observe two novel phenomena: double-peak and logarithmic. We gain an understanding of the underlying causes of the double-peak phenomenon.
- Score: 12.221835332469228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scaling recommendation models into large recommendation models has become one of the most widely discussed topics. Recent efforts focus on components beyond the embedding dimension, as it is believed that scaling embeddings may lead to performance degradation. Although there have been some initial observations on embeddings, the root cause of their non-scalability remains unclear. Moreover, whether performance degradation occurs across different types of models and datasets is still an unexplored question. To study the effect of embedding dimension on performance, we conduct large-scale experiments across 10 datasets with varying sparsity levels and scales, using 4 representative classical architectures. Surprisingly, we observe two novel phenomena: double-peak and logarithmic. In the former, as the embedding dimension increases, performance first improves, then declines, rises again, and eventually drops. In the latter, performance follows a near-perfect logarithmic curve. Our contributions are threefold. First, we discover two novel phenomena when scaling collaborative filtering models. Second, we gain an understanding of the underlying causes of the double-peak phenomenon. Lastly, we theoretically analyze the noise robustness of collaborative filtering models, with results matching empirical observations.
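The kind of dimension sweep the abstract describes can be sketched in a few lines. The snippet below is a minimal, illustrative version only: it fits a dense matrix-factorisation model by alternating least squares on synthetic low-rank data and records training RMSE at several embedding dimensions. All names, data sizes, and hyperparameters here are assumptions for the sketch, not the paper's actual experimental protocol or datasets.

```python
import numpy as np

def make_synthetic_ratings(n_users=60, n_items=80, true_rank=8, noise=0.1, seed=0):
    """Low-rank user-item matrix plus Gaussian noise (toy stand-in for a CF dataset)."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n_users, true_rank))
    V = rng.normal(size=(n_items, true_rank))
    return U @ V.T + noise * rng.normal(size=(n_users, n_items))

def mf_rmse(R, dim, iters=20, reg=0.1, seed=0):
    """Fit dense matrix factorisation by alternating least squares; return train RMSE."""
    rng = np.random.default_rng(seed)
    Q = rng.normal(size=(R.shape[1], dim))
    reg_eye = reg * np.eye(dim)
    for _ in range(iters):
        P = R @ Q @ np.linalg.inv(Q.T @ Q + reg_eye)    # update user factors
        Q = R.T @ P @ np.linalg.inv(P.T @ P + reg_eye)  # update item factors
    return float(np.sqrt(np.mean((R - P @ Q.T) ** 2)))

R = make_synthetic_ratings()
curve = {d: mf_rmse(R, d) for d in (2, 4, 8, 16, 32, 64)}
for d, err in curve.items():
    print(f"dim={d:3d}  rmse={err:.4f}")
```

On real, sparse interaction data one would of course hold out a test set and use a ranking metric; plotting such a curve over a wide range of dimensions is what would reveal a double-peak or logarithmic shape.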
Related papers
- Deep Exploration of Epoch-wise Double Descent in Noisy Data: Signal Separation, Large Activation, and Benign Overfitting [0.0]
"Deep double descent" is one of the key phenomena underlying the generalization capability of deep learning models. In this study, epoch-wise double descent was investigated by focusing on the evolution of internal structures. Results: The model achieved strong re-generalization on test data even after perfectly fitting noisy training data during the double descent phase.
arXiv Detail & Related papers (2026-01-13T08:13:15Z) - Unidentified and Confounded? Understanding Two-Tower Models for Unbiased Learning to Rank [50.9530591265324]
Training two-tower models on clicks collected by well-performing production systems leads to decreased ranking performance. We theoretically analyze the identifiability conditions of two-tower models, showing that either document swaps across positions or overlapping feature distributions are required to recover model parameters from clicks. We also investigate the effect of logging policies on two-tower models, finding that they introduce no bias when models perfectly capture user behavior.
arXiv Detail & Related papers (2025-06-25T14:47:43Z) - Towards Scalable and Deep Graph Neural Networks via Noise Masking [59.058558158296265]
Graph Neural Networks (GNNs) have achieved remarkable success in many graph mining tasks. Scaling them to large graphs is challenging due to the high computational and storage costs. We present random walk with noise masking (RMask), a plug-and-play module compatible with existing model-simplification works.
arXiv Detail & Related papers (2024-12-19T07:48:14Z) - MixRec: Heterogeneous Graph Collaborative Filtering [21.96510707666373]
We present MixRec, a graph collaborative filtering model that disentangles users' multi-behavior interaction patterns. Our model achieves this by incorporating intent disentanglement and multi-behavior modeling. We also introduce a novel contrastive learning paradigm that adaptively explores the advantages of self-supervised data augmentation.
arXiv Detail & Related papers (2024-12-18T13:12:36Z) - Unveiling Multiple Descents in Unsupervised Autoencoders [25.244065166421517]
We show for the first time that double and triple descent can be observed with nonlinear unsupervised autoencoders. Through extensive experiments on both synthetic and real datasets, we uncover model-wise, epoch-wise, and sample-wise double descent.
arXiv Detail & Related papers (2024-06-17T16:24:23Z) - Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115]
This tutorial sets out the classical statistical learning framework and introduces the double descent phenomenon.
Through a number of examples, Section 2 introduces the inductive biases that appear to play a key role in double descent.
Section 3 explores double descent with two linear models and offers further perspectives from recent related work.
arXiv Detail & Related papers (2024-03-15T16:51:24Z) - On the Embedding Collapse when Scaling up Recommendation Models [53.66285358088788]
We identify the embedding collapse phenomenon as the inhibition of scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace.
We propose a simple yet effective multi-embedding design incorporating embedding-set-specific interaction modules to learn embedding sets with large diversity.
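The collapse this related paper describes, an embedding matrix occupying a low-dimensional subspace, can be probed with its singular value spectrum. Below is a small, self-contained sketch (not code from the paper) using the entropy-based effective rank; the matrices `full` and `low` are synthetic stand-ins for healthy and collapsed embedding tables.

```python
import numpy as np

def effective_rank(E, eps=1e-12):
    """Entropy-based effective rank of an embedding matrix (Roy & Vetterli, 2007)."""
    s = np.linalg.svd(E, compute_uv=False)
    p = s / (s.sum() + eps)                # normalise singular values to a distribution
    p = p[p > eps]
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
full = rng.normal(size=(1000, 64))                             # well-spread embeddings
low = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 64))    # collapsed to rank 4

print(effective_rank(full))   # near the ambient dimension
print(effective_rank(low))    # far below it
```

An effective rank far below the embedding dimension is the symptom the multi-embedding design in this related paper aims to avoid.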
arXiv Detail & Related papers (2023-10-06T17:50:38Z) - Predicting and Enhancing the Fairness of DNNs with the Curvature of Perceptual Manifolds [44.79535333220044]
Recent studies have shown that tail classes are not always hard to learn, and model bias has been observed on sample-balanced datasets. In this work, we first establish a geometric perspective for analyzing model fairness and then systematically propose a series of geometric measurements.
arXiv Detail & Related papers (2023-03-22T04:49:23Z) - On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We derive an excess risk bound for the gradient descent solution of the least squares objective.
We find that, in the case of noiseless regression, double descent is explained solely by optimization-related quantities.
We empirically explore if our predictions hold for neural networks.
arXiv Detail & Related papers (2021-07-27T09:13:11Z) - Hard-label Manifolds: Unexpected Advantages of Query Efficiency for Finding On-manifold Adversarial Examples [67.23103682776049]
Recent zeroth order hard-label attacks on image classification models have shown comparable performance to their first-order, gradient-level alternatives.
It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors.
We propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate.
arXiv Detail & Related papers (2021-03-04T20:53:06Z) - Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data [63.15776078733762]
We propose Amortized Causal Discovery, a novel framework to learn to infer causal relations from time-series data.
We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance.
arXiv Detail & Related papers (2020-06-18T19:59:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.