Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy
- URL: http://arxiv.org/abs/2412.00430v6
- Date: Wed, 19 Feb 2025 09:23:03 GMT
- Title: Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy
- Authors: Tingjia Shen, Hao Wang, Chuhan Wu, Jin Yao Chin, Wei Guo, Yong Liu, Huifeng Guo, Defu Lian, Ruiming Tang, Enhong Chen
- Abstract summary: The Performance Law for SR models theoretically investigates and models the relationship between model performance and data quality.
We propose Approximate Entropy (ApEn) to assess data quality, a more nuanced measure than traditional data-quantity metrics.
- Score: 104.48511402784763
- Abstract: Scaling Laws have emerged as a powerful framework for understanding how model performance evolves as models increase in size, providing valuable insights for optimizing computational resources. In the realm of Sequential Recommendation (SR), which is pivotal for predicting users' sequential preferences, these laws offer a lens through which to address the challenges posed by the scalability of SR models. However, structural and collaborative issues in recommender systems prevent the direct application of the Scaling Law (SL) to these systems. In response, we introduce the Performance Law for SR models, which theoretically investigates and models the relationship between model performance and data quality. Specifically, we first fit the HR and NDCG metrics to transformer-based SR models. Subsequently, we propose Approximate Entropy (ApEn) to assess data quality, a more nuanced approach than traditional data-quantity metrics. Our method enables accurate predictions across various dataset scales and model sizes, demonstrating a strong correlation in large SR models and offering insights into achieving optimal performance for any given model configuration.
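The central data-quality measure named in the abstract, Approximate Entropy (ApEn), is a standard regularity statistic from time-series analysis. Below is a minimal sketch of the textbook ApEn computation applied to an interaction sequence; it is not the authors' released code, and the window length m and tolerance r are illustrative defaults rather than the settings used in the paper.

```python
import numpy as np

def approximate_entropy(seq, m=2, r=0.2):
    """Textbook Approximate Entropy of a 1-D sequence.

    seq : 1-D array of numeric values (e.g., item IDs of a user's history).
    m   : embedding (window) length.
    r   : tolerance, as a fraction of the sequence's standard deviation.
    """
    x = np.asarray(seq, dtype=float)
    n = len(x)
    tol = r * np.std(x)

    def phi(m):
        # All overlapping windows of length m.
        windows = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of windows.
        dist = np.max(np.abs(windows[:, None, :] - windows[None, :, :]), axis=2)
        # Fraction of windows within tolerance of each window (self-matches included).
        counts = np.mean(dist <= tol, axis=1)
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

# Toy usage: a periodic sequence yields a lower ApEn than a random one.
rng = np.random.default_rng(0)
regular = [1, 2, 3] * 40
irregular = rng.integers(1, 10, size=120)
print(approximate_entropy(regular), approximate_entropy(irregular))
```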
Related papers
- Exploring Patterns Behind Sports [3.2838877620203935]
This paper presents a comprehensive framework for time series prediction using a hybrid model that combines ARIMA and LSTM.
The model incorporates feature engineering techniques, including embedding and PCA, to transform raw data into a lower-dimensional representation.
arXiv Detail & Related papers (2025-02-11T11:51:07Z)
- Scaling New Frontiers: Insights into Large Recommendation Models [74.77410470984168]
Meta's generative recommendation model HSTU illustrates the scaling laws of recommendation systems by expanding parameters into the trillions.
We conduct comprehensive ablation studies to explore the origins of these scaling laws.
We offer insights into future directions for large recommendation models.
arXiv Detail & Related papers (2024-12-01T07:27:20Z)
- Enhancing Few-Shot Learning with Integrated Data and GAN Model Approaches [35.431340001608476]
This paper presents an innovative approach to enhancing few-shot learning by integrating data augmentation with model fine-tuning.
It aims to tackle the challenges posed by small-sample data in fields such as drug discovery, target recognition, and malicious traffic detection.
Results confirm that the MhERGAN algorithm developed in this research is highly effective for few-shot learning.
arXiv Detail & Related papers (2024-11-25T16:51:11Z)
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
- More Compute Is What You Need [3.184416958830696]
We propose a new scaling law suggesting that, for transformer-based models, performance depends mostly on the total amount of compute spent.
We predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.
arXiv Detail & Related papers (2024-04-30T12:05:48Z)
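To illustrate the kind of compute-centric relationship the entry above describes, here is a minimal sketch that fits a saturating power law of loss against training compute. The data points are synthetic, the compute estimate C ≈ 6·N·D is the common rule of thumb for transformer training FLOPs, and the functional form is a generic power law rather than the exact law proposed in that paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic (model_params, tokens, eval_loss) points -- illustrative only,
# not measurements from any of the papers above.
runs = [
    (125e6, 2e9, 3.95), (350e6, 5e9, 3.52), (760e6, 10e9, 3.21),
    (1.3e9, 20e9, 2.98), (2.7e9, 40e9, 2.77), (6.7e9, 80e9, 2.58),
]
# Common approximation: training compute C ~ 6 * N * D FLOPs for transformers.
compute = np.array([6 * n * d for n, d, _ in runs])
loss = np.array([l for _, _, l in runs])

def power_law(c, a, b, l_inf):
    # loss(C) = a * (C / 1e18)^(-b) + irreducible loss (compute normalized for stability)
    return a * np.power(c / 1e18, -b) + l_inf

params, _ = curve_fit(power_law, compute, loss, p0=(2.0, 0.3, 2.0), maxfev=10000)
a, b, l_inf = params
print(f"fit: loss(C) ~ {a:.3g} * (C/1e18)^(-{b:.3g}) + {l_inf:.3g}")
print("predicted loss at 1e23 FLOPs:", power_law(1e23, *params))
```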
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
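For context on the entry above, the sketch below is the classical RANSAC loop on a toy 2-D line-fitting problem. It is the vanilla sample-score-select baseline, not the consensus-adaptive, attention-based variant the paper proposes; tolerances and iteration counts are illustrative.

```python
import numpy as np

def ransac_line(points, n_iters=200, inlier_tol=0.05, rng=None):
    """Classical RANSAC for fitting a 2-D line y = m*x + b."""
    rng = rng or np.random.default_rng(0)
    best_model, best_inliers = None, 0
    for _ in range(n_iters):
        # 1. Sample a minimal set (two points define a line).
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if np.isclose(x1, x2):
            continue
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # 2. Score the hypothesis by counting inliers (residual below tolerance).
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + b))
        inliers = int(np.sum(residuals < inlier_tol))
        # 3. Keep the hypothesis with the largest consensus set.
        if inliers > best_inliers:
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers

# Toy data: points on y = 2x + 1 with a few gross outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 50)
pts = np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.01, 50)])
pts[:5, 1] += 5.0  # outliers
print(ransac_line(pts))
```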
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
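As a rough illustration of the cross-layer parameter sharing described above, the sketch below rebuilds each layer's weight from one shared central factor plus small per-layer factors. It is a simplified low-rank stand-in rather than an actual MPO decomposition, and every module and parameter name is invented for the example.

```python
import torch
import torch.nn as nn

class SharedCoreFFN(nn.Module):
    """Simplified illustration of cross-layer parameter sharing.

    Each layer's weight is reconstructed as U_l @ C @ V_l, where the central
    factor C is shared by all layers and only the small per-layer factors
    U_l, V_l are layer-specific. This is a low-rank stand-in for the MPO
    decomposition used in the paper, not a reimplementation of it.
    """
    def __init__(self, d_model=512, rank=64, n_layers=12):
        super().__init__()
        self.core = nn.Parameter(torch.randn(rank, rank) / rank ** 0.5)  # shared across layers
        self.u = nn.ParameterList(
            nn.Parameter(torch.randn(d_model, rank) / d_model ** 0.5) for _ in range(n_layers)
        )
        self.v = nn.ParameterList(
            nn.Parameter(torch.randn(rank, d_model) / rank ** 0.5) for _ in range(n_layers)
        )

    def forward(self, x):
        for u_l, v_l in zip(self.u, self.v):
            weight = u_l @ self.core @ v_l          # (d_model, d_model), mostly shared parameters
            x = torch.relu(x @ weight.T)
        return x

model = SharedCoreFFN()
out = model(torch.randn(4, 512))
# Far fewer parameters than 12 independent d_model x d_model matrices.
print(out.shape, sum(p.numel() for p in model.parameters()))
```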
- Learning Distributionally Robust Models at Scale via Composite Optimization [45.47760229170775]
We show how different variants of DRO are simply instances of a finite-sum composite optimization for which we provide scalable methods.
We also provide empirical results demonstrating the effectiveness of our proposed algorithm relative to prior art for learning robust models from very large datasets.
arXiv Detail & Related papers (2022-03-17T20:47:42Z)
- Learning to Refit for Convex Learning Problems [11.464758257681197]
We propose a framework to learn to estimate optimized model parameters for different training sets using neural networks.
We rigorously characterize the power of neural networks to approximate convex problems.
arXiv Detail & Related papers (2021-11-24T15:28:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.