Related papers: On the Embedding Collapse when Scaling up Recommendation Models

On the Embedding Collapse when Scaling up Recommendation Models

URL: http://arxiv.org/abs/2310.04400v2
Date: Thu, 6 Jun 2024 08:36:57 GMT
Title: On the Embedding Collapse when Scaling up Recommendation Models
Authors: Xingzhuo Guo, Junwei Pan, Ximei Wang, Baixu Chen, Jie Jiang, Mingsheng Long,
Abstract summary: We identify the embedding collapse phenomenon as the inhibition of scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace. We propose a simple yet effective multi-embedding design incorporating embedding-set-specific interaction modules to learn embedding sets with large diversity.
Score: 53.66285358088788
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in foundation models have led to a promising trend of developing large recommendation models to leverage vast amounts of available data. Still, mainstream models remain embarrassingly small in size and na\"ive enlarging does not lead to sufficient performance gain, suggesting a deficiency in the model scalability. In this paper, we identify the embedding collapse phenomenon as the inhibition of scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace. Through empirical and theoretical analysis, we demonstrate a \emph{two-sided effect} of feature interaction specific to recommendation models. On the one hand, interacting with collapsed embeddings restricts embedding learning and exacerbates the collapse issue. On the other hand, interaction is crucial in mitigating the fitting of spurious features as a scalability guarantee. Based on our analysis, we propose a simple yet effective multi-embedding design incorporating embedding-set-specific interaction modules to learn embedding sets with large diversity and thus reduce collapse. Extensive experiments demonstrate that this proposed design provides consistent scalability and effective collapse mitigation for various recommendation models. Code is available at this repository: https://github.com/thuml/Multi-Embedding.

Related papers

Why Do More Experts Fail? A Theoretical Analysis of Model Merging [51.18155031364046]
Model merging dramatically reduces storage and computational resources by combining multiple expert models into a single multi-task model.<n>Recent model merging methods have shown promising results, but struggle to maintain performance gains as the number of merged models increases.<n>We show that the limited effective parameter space imposes a strict constraint on the number of models that can be successfully merged.
arXiv Detail & Related papers (2025-05-27T14:10:46Z)
MixRec: Heterogeneous Graph Collaborative Filtering [21.96510707666373]
We present a graph collaborative filtering model MixRec to disentangling users' multi-behavior interaction patterns. Our model achieves this by incorporating intent disentanglement and multi-behavior modeling. We also introduce a novel contrastive learning paradigm that adaptively explores the advantages of self-supervised data augmentation.
arXiv Detail & Related papers (2024-12-18T13:12:36Z)
A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models. Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning. We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
Beyond the Kolmogorov Barrier: A Learnable Weighted Hybrid Autoencoder for Model Order Reduction [1.0742675209112622]
We propose a learnable weighted hybrid autoencoder to overcome the Kolmogorov barrier. We empirically find that our trained model has a sharpness thousands of times smaller compared to other models.
arXiv Detail & Related papers (2024-10-23T00:04:26Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
FFHFlow: A Flow-based Variational Approach for Learning Diverse Dexterous Grasps with Shape-Aware Introspection [19.308304984645684]
We introduce a novel model that can generate diverse grasps for a multi-fingered hand. The proposed idea gains superior performance and higher run-time efficiency against strong baselines. We also demonstrate substantial benefits of greater diversity for grasping objects in clutter and a confined workspace in the real world.
arXiv Detail & Related papers (2024-07-21T13:33:08Z)
Scalable Ensembling For Mitigating Reward Overoptimisation [24.58937616758007]
Reinforcement Learning from Human Feedback has enabled significant advancements within language modeling for powerful, instruction-following models. The alignment of these models remains a pressing challenge as the policy tends to overfit the learned proxy" reward model past an inflection point of utility.
arXiv Detail & Related papers (2024-06-03T05:46:53Z)
Uplift Modeling Under Limited Supervision [11.548203301440179]
Estimating causal effects in e-commerce tends to involve costly treatment assignments which can be impractical in large-scale settings. We propose a graph neural network to diminish the required training set size, relying on graphs that are common in e-commerce data.
arXiv Detail & Related papers (2024-03-28T10:19:36Z)
The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model. We introduce three robustness indicators and conduct experiments across diverse robust datasets. Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can be rather from some biases in data acquisition. We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training. We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
MEIM: Multi-partition Embedding Interaction Beyond Block Term Format for Efficient and Expressive Link Prediction [3.718476964451589]
We introduce the Multi- Partition Embedding Interaction iMproved beyond block term format (MEIM) model. MEIM improves expressiveness while still being highly efficient, helping it to outperform strong baselines and achieve state-of-the-art results.
arXiv Detail & Related papers (2022-09-30T17:20:03Z)
Revisiting Design Choices in Model-Based Offline Reinforcement Learning [39.01805509055988]
Offline reinforcement learning enables agents to leverage large pre-collected datasets of environment transitions to learn control policies. This paper compares and designs novel protocols to investigate their interaction with other hyper parameters, such as the number of models, or imaginary rollout horizon.
arXiv Detail & Related papers (2021-10-08T13:51:34Z)
Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors. We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method. Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.