Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy
- URL: http://arxiv.org/abs/2412.00430v6
- Date: Wed, 19 Feb 2025 09:23:03 GMT
- Title: Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy
- Authors: Tingjia Shen, Hao Wang, Chuhan Wu, Jin Yao Chin, Wei Guo, Yong Liu, Huifeng Guo, Defu Lian, Ruiming Tang, Enhong Chen
- Abstract summary: The Performance Law for SR models theoretically investigates and models the relationship between model performance and data quality.
We propose Approximate Entropy (ApEn) to assess data quality, a more nuanced measure than traditional data-quantity metrics.
- Score: 104.48511402784763
- Abstract: Scaling Laws have emerged as a powerful framework for understanding how model performance evolves as models increase in size, providing valuable insights for optimizing computational resources. In the realm of Sequential Recommendation (SR), which is pivotal for predicting users' sequential preferences, these laws offer a lens through which to address the challenges posed by the scalability of SR models. However, structural and collaborative issues in recommender systems prevent the direct application of the Scaling Law (SL) to these systems. In response, we introduce the Performance Law for SR models, which theoretically investigates and models the relationship between model performance and data quality. Specifically, we first fit the HR and NDCG metrics to transformer-based SR models. Subsequently, we propose Approximate Entropy (ApEn) to assess data quality, a more nuanced approach than traditional data-quantity metrics. Our method enables accurate predictions across various dataset scales and model sizes, demonstrating a strong correlation in large SR models and offering insights into achieving optimal performance for any given model configuration.
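The central data-quality measure named in the abstract, Approximate Entropy (ApEn), is a standard regularity statistic from time-series analysis. Below is a minimal sketch of the textbook ApEn computation applied to an interaction sequence; it is not the authors' released code, and the window length m and tolerance r are illustrative defaults rather than the settings used in the paper.

```python
import numpy as np

def approximate_entropy(seq, m=2, r=0.2):
    """Textbook Approximate Entropy of a 1-D sequence.

    seq : 1-D array of numeric values (e.g., item IDs of a user's history).
    m   : embedding (window) length.
    r   : tolerance, as a fraction of the sequence's standard deviation.
    """
    x = np.asarray(seq, dtype=float)
    n = len(x)
    tol = r * np.std(x)

    def phi(m):
        # All overlapping windows of length m.
        windows = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of windows.
        dist = np.max(np.abs(windows[:, None, :] - windows[None, :, :]), axis=2)
        # Fraction of windows within tolerance of each window (self-matches included).
        counts = np.mean(dist <= tol, axis=1)
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

# Toy usage: a periodic sequence yields a lower ApEn than a random one.
rng = np.random.default_rng(0)
regular = [1, 2, 3] * 40
irregular = rng.integers(1, 10, size=120)
print(approximate_entropy(regular), approximate_entropy(irregular))
```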
Related papers
- Exploring Patterns Behind Sports [3.2838877620203935]
This paper presents a comprehensive framework for time series prediction using a hybrid model that combines ARIMA and LSTM.
The model incorporates feature engineering techniques, including embedding and PCA, to transform raw data into a lower-dimensional representation.
arXiv Detail & Related papers (2025-02-11T11:51:07Z)
- Scaling New Frontiers: Insights into Large Recommendation Models [74.77410470984168]
Meta's generative recommendation model HSTU illustrates the scaling laws of recommendation systems by expanding parameters into the trillions.
We conduct comprehensive ablation studies to explore the origins of these scaling laws.
We offer insights into future directions for large recommendation models.
arXiv Detail & Related papers (2024-12-01T07:27:20Z)
- Enhancing Few-Shot Learning with Integrated Data and GAN Model Approaches [35.431340001608476]
This paper presents an innovative approach to enhancing few-shot learning by integrating data augmentation with model fine-tuning.
It aims to tackle the challenges posed by small-sample data in fields such as drug discovery, target recognition, and malicious traffic detection.
Results confirm that the MhERGAN algorithm developed in this research is highly effective for few-shot learning.
arXiv Detail & Related papers (2024-11-25T16:51:11Z)
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
- More Compute Is What You Need [3.184416958830696]
We propose a new scaling law suggesting that, for transformer-based models, performance depends mostly on the total amount of compute spent.
We predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.
arXiv Detail & Related papers (2024-04-30T12:05:48Z)
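To illustrate the kind of compute-centric relationship the entry above describes, here is a minimal sketch that fits a saturating power law of loss against training compute. The data points are synthetic, the compute estimate C ≈ 6·N·D is the common rule of thumb for transformer training FLOPs, and the functional form is a generic power law rather than the exact law proposed in that paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic (model_params, tokens, eval_loss) points -- illustrative only,
# not measurements from any of the papers above.
runs = [
    (125e6, 2e9, 3.95), (350e6, 5e9, 3.52), (760e6, 10e9, 3.21),
    (1.3e9, 20e9, 2.98), (2.7e9, 40e9, 2.77), (6.7e9, 80e9, 2.58),
]
# Common approximation: training compute C ~ 6 * N * D FLOPs for transformers.
compute = np.array([6 * n * d for n, d, _ in runs])
loss = np.array([l for _, _, l in runs])

def power_law(c, a, b, l_inf):
    # loss(C) = a * (C / 1e18)^(-b) + irreducible loss (compute normalized for stability)
    return a * np.power(c / 1e18, -b) + l_inf

params, _ = curve_fit(power_law, compute, loss, p0=(2.0, 0.3, 2.0), maxfev=10000)
a, b, l_inf = params
print(f"fit: loss(C) ~ {a:.3g} * (C/1e18)^(-{b:.3g}) + {l_inf:.3g}")
print("predicted loss at 1e23 FLOPs:", power_law(1e23, *params))
```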
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
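For context on the entry above, the sketch below is the classical RANSAC loop on a toy 2-D line-fitting problem. It is the vanilla sample-score-select baseline, not the consensus-adaptive, attention-based variant the paper proposes; tolerances and iteration counts are illustrative.

```python
import numpy as np

def ransac_line(points, n_iters=200, inlier_tol=0.05, rng=None):
    """Classical RANSAC for fitting a 2-D line y = m*x + b."""
    rng = rng or np.random.default_rng(0)
    best_model, best_inliers = None, 0
    for _ in range(n_iters):
        # 1. Sample a minimal set (two points define a line).
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if np.isclose(x1, x2):
            continue
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # 2. Score the hypothesis by counting inliers (residual below tolerance).
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + b))
        inliers = int(np.sum(residuals < inlier_tol))
        # 3. Keep the hypothesis with the largest consensus set.
        if inliers > best_inliers:
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers

# Toy data: points on y = 2x + 1 with a few gross outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 50)
pts = np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.01, 50)])
pts[:5, 1] += 5.0  # outliers
print(ransac_line(pts))
```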
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
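As a rough illustration of the cross-layer parameter sharing described above, the sketch below rebuilds each layer's weight from one shared central factor plus small per-layer factors. It is a simplified low-rank stand-in rather than an actual MPO decomposition, and every module and parameter name is invented for the example.

```python
import torch
import torch.nn as nn

class SharedCoreFFN(nn.Module):
    """Simplified illustration of cross-layer parameter sharing.

    Each layer's weight is reconstructed as U_l @ C @ V_l, where the central
    factor C is shared by all layers and only the small per-layer factors
    U_l, V_l are layer-specific. This is a low-rank stand-in for the MPO
    decomposition used in the paper, not a reimplementation of it.
    """
    def __init__(self, d_model=512, rank=64, n_layers=12):
        super().__init__()
        self.core = nn.Parameter(torch.randn(rank, rank) / rank ** 0.5)  # shared across layers
        self.u = nn.ParameterList(
            nn.Parameter(torch.randn(d_model, rank) / d_model ** 0.5) for _ in range(n_layers)
        )
        self.v = nn.ParameterList(
            nn.Parameter(torch.randn(rank, d_model) / rank ** 0.5) for _ in range(n_layers)
        )

    def forward(self, x):
        for u_l, v_l in zip(self.u, self.v):
            weight = u_l @ self.core @ v_l          # (d_model, d_model), mostly shared parameters
            x = torch.relu(x @ weight.T)
        return x

model = SharedCoreFFN()
out = model(torch.randn(4, 512))
# Far fewer parameters than 12 independent d_model x d_model matrices.
print(out.shape, sum(p.numel() for p in model.parameters()))
```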
- Learning Distributionally Robust Models at Scale via Composite Optimization [45.47760229170775]
We show how different variants of DRO are simply instances of a finite-sum composite optimization for which we provide scalable methods.
We also provide empirical results demonstrating the effectiveness of our proposed algorithm relative to prior art for learning robust models from very large datasets.
arXiv Detail & Related papers (2022-03-17T20:47:42Z)
- Learning to Refit for Convex Learning Problems [11.464758257681197]
We propose a framework to learn to estimate optimized model parameters for different training sets using neural networks.
We rigorously characterize the power of neural networks to approximate convex problems.
arXiv Detail & Related papers (2021-11-24T15:28:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.