Enhancing Cross-Category Learning in Recommendation Systems with
Multi-Layer Embedding Training
- URL: http://arxiv.org/abs/2309.15881v1
- Date: Wed, 27 Sep 2023 09:32:10 GMT
- Title: Enhancing Cross-Category Learning in Recommendation Systems with
Multi-Layer Embedding Training
- Authors: Zihao Deng, Benjamin Ghaemmaghami, Ashish Kumar Singh, Benjamin Cho,
Leo Orshansky, Mattan Erez, Michael Orshansky
- Abstract summary: Multi-layer embeddings training (MLET) trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension.
MLET consistently produces better models, especially for rare items.
At constant model quality, MLET allows the embedding dimension, and with it the model size, to be reduced by up to 16x, and by 5.8x on average.
- Score: 2.4862527485819186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern DNN-based recommendation systems rely on training-derived embeddings
of sparse features. Input sparsity makes obtaining high-quality embeddings for
rarely-occurring categories harder as their representations are updated
infrequently. We demonstrate a training-time technique to produce superior
embeddings via effective cross-category learning and theoretically explain its
surprising effectiveness. The scheme, termed multi-layer embedding training (MLET), trains embeddings using a factorization of the embedding layer with an inner dimension higher than the target embedding dimension. For inference efficiency, MLET converts the trained two-layer embedding into a single-layer one, keeping the inference-time model size unchanged.
The empirical superiority of MLET is puzzling, as its search space is not larger
than that of the single-layer embedding. The strong dependence of MLET on the
inner dimension is even more surprising. We develop a theory that explains both
of these behaviors by showing that MLET creates an adaptive update mechanism
modulated by the singular vectors of embeddings. When tested on multiple
state-of-the-art recommendation models for click-through rate (CTR) prediction
tasks, MLET consistently produces better models, especially for rare items. At constant model quality, MLET allows the embedding dimension, and with it the model size, to be reduced by up to 16x, and by 5.8x on average, across the models.
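
To make the scheme concrete, below is a minimal sketch of the factorized embedding described in the abstract, written in PyTorch. The class and parameter names (MLETEmbedding, inner_dim, collapse) are illustrative assumptions, not the authors' reference implementation; the only properties taken from the abstract are that the inner dimension exceeds the target dimension during training and that the two factors are folded into a single lookup table for inference.

```python
# Illustrative sketch of multi-layer embedding training (MLET), assuming PyTorch.
# Names (MLETEmbedding, inner_dim, collapse) are hypothetical, not from the paper's code.
import torch
import torch.nn as nn


class MLETEmbedding(nn.Module):
    """Two-layer (factorized) embedding: a lookup into an inner space of width
    inner_dim > target_dim, followed by a linear projection down to target_dim."""

    def __init__(self, num_items: int, target_dim: int, inner_dim: int):
        super().__init__()
        assert inner_dim > target_dim, "MLET uses an inner dimension above the target"
        self.inner = nn.Embedding(num_items, inner_dim)            # E1: num_items x inner_dim
        self.proj = nn.Linear(inner_dim, target_dim, bias=False)   # weight is E2^T: target_dim x inner_dim

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        # Training-time path: both factors receive gradient updates on every lookup.
        return self.proj(self.inner(item_ids))

    def collapse(self) -> nn.Embedding:
        """Fold the two factors into one d-dimensional lookup table for inference,
        so the served model is the same size as a single-layer embedding."""
        num_items, _ = self.inner.weight.shape
        target_dim = self.proj.weight.shape[0]
        single = nn.Embedding(num_items, target_dim)
        with torch.no_grad():
            single.weight.copy_(self.inner.weight @ self.proj.weight.t())
        return single


# Usage sketch: train with the factorized table, then serve the collapsed one.
emb = MLETEmbedding(num_items=100_000, target_dim=16, inner_dim=64)
ids = torch.randint(0, 100_000, (32,))
vectors = emb(ids)               # shape (32, 16) during training
serving_table = emb.collapse()   # plain nn.Embedding(100_000, 16) for inference
```

Because the product of the two factors has rank at most the target dimension, the collapsed table is mathematically just a d-dimensional embedding, which matches the abstract's point that the search space is no larger than a single-layer embedding's. One way to read the "adaptive update mechanism modulated by the singular vectors" claim (a standard one-step analysis of a factorized linear map, not a reproduction of the paper's derivation): writing the effective table as E = E1 E2 and letting G = ∂L/∂E, a single SGD step changes E by approximately ΔE ≈ -η (G E2ᵀE2 + E1E1ᵀ G), so the raw gradient G is preconditioned by E2ᵀE2 and E1E1ᵀ, whose eigenvectors are singular vectors of the two factors.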
Related papers
- Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models [19.163639128631534] (2024-10-15)
  Importance-aware Sparse Tuning (IST) is a plug-and-play technique compatible with various PEFT methods that operate on a per-layer basis. IST dynamically updates selected layers in PEFT modules, leading to reduced memory demands.
- Dynamic Feature Learning and Matching for Class-Incremental Learning [20.432575325147894] (2024-05-14)
  Class-incremental learning (CIL) has emerged as a means to learn new classes without catastrophic forgetting of previous classes. The paper proposes the Dynamic Feature Learning and Matching (DFLM) model, which achieves significant performance improvements over existing methods.
- Inheritune: Training Smaller Yet More Attentive Language Models [61.363259848264725] (2024-04-12)
  Inheritune is a simple yet effective training recipe for developing smaller, high-performing language models. The authors demonstrate that Inheritune enables the training of GPT-2 models of various sizes on datasets such as OpenWebText-9B and FineWeb_edu.
- Fine-Grained Embedding Dimension Optimization During Training for Recommender Systems [17.602059421895856] (2024-01-09)
  FIITED is a system that automatically reduces the memory footprint via FIne-grained In-Training Embedding Dimension pruning. FIITED can reduce DLRM embedding size by more than 65% while preserving model quality; on public datasets, it reduces the size of embedding tables by 2.1x to 800x with negligible accuracy drop.
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664] (2023-10-08)
  This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning, and introduces Multi-Modal Low-Rank Adaptation learning (MMLoRA).
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658] (2023-07-05)
  Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. The paper proposes a concise and effective approach to CL with pre-trained models.
- Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks [12.293294756969477] (2023-04-14)
  The strength of machine learning models stems from their ability to learn complex function approximations from data, but complex models tend to memorize the training data, which results in poor generalization on test data. The paper presents a novel approach to regularizing models by leveraging information-rich latent embeddings and their high intra-class correlation.
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494] (2023-03-20)
  This paper focuses on fine-tuning an adversarially pre-trained model for various classification tasks. It proposes a novel statistics-based fine-tuning framework, Two-WIng NormliSation (TWINS), which is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468] (2022-05-31)
  The paper proposes Model-Agnostic Counterfactual Explanation (MACE), a framework that combines an RL-based method for finding good counterfactual examples with a gradient-less descent method for improving proximity. Experiments on public datasets validate its effectiveness, with better validity, sparsity and proximity.
- LV-BERT: Exploiting Layer Variety for BERT [85.27287501885807] (2021-06-22)
  The authors introduce convolution into the layer type set, which is experimentally found to be beneficial to pre-trained models, and then adopt an evolutionary algorithm guided by pre-training accuracy to find the optimal architecture. The resulting LV-BERT model outperforms BERT and its variants on various downstream tasks.
- Training with Multi-Layer Embeddings for Model Reduction [0.9046327456472286] (2020-06-10)
  This earlier work introduces a multi-layer embedding training architecture that trains embeddings via a sequence of linear layers, and shows that it allows reducing the embedding dimension d by 4-8x, with a corresponding improvement in memory footprint, at a given model accuracy.
This list is automatically generated from the titles and abstracts of the papers on this site.