Understanding Scaling Laws for Recommendation Models
- URL: http://arxiv.org/abs/2208.08489v1
- Date: Wed, 17 Aug 2022 19:13:17 GMT
- Title: Understanding Scaling Laws for Recommendation Models
- Authors: Newsha Ardalani, Carole-Jean Wu, Zeliang Chen, Bhargav Bhushanam,
Adnan Aziz
- Abstract summary: We study empirical scaling laws for DLRM-style recommendation models, in particular for Click-Through Rate (CTR) prediction.
We characterize scaling efficiency along three resource dimensions, namely data, parameters, and compute.
We show that parameter scaling has run out of steam for the model architecture under study; until a higher-performing model architecture emerges, data scaling is the path forward.
- Score: 1.6283945233720964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scale has been a major driving force in improving machine learning
performance, and understanding scaling laws is essential for strategic planning
toward sustainable growth in model quality, for long-term resource planning,
and for developing efficient system infrastructures to support large-scale models.
In this paper, we study empirical scaling laws for DLRM-style recommendation
models, in particular Click-Through Rate (CTR) prediction. We observe that model
quality scales as a power law plus a constant in model size, data size, and the
amount of compute used for training. We characterize scaling efficiency along
three resource dimensions, namely data, parameters, and compute, by comparing
the different scaling schemes along these axes. We show that parameter scaling
has run out of steam for the model architecture under study and that, until a
higher-performing model architecture emerges, data scaling is the path forward.
The key research questions addressed by this study include: Does a
recommendation model scale sustainably as predicted by the scaling laws? Or are
we far off from the scaling-law predictions? What are the limits of scaling?
What are the implications of the scaling laws for long-term hardware/system
development?
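The power-law-plus-constant form described in the abstract, roughly L(x) = alpha * x^(-beta) + gamma for a resource x (data size, parameter count, or training compute), can be fit directly to empirical measurements. Below is a minimal sketch using scipy's curve_fit; the data points, initial guess, and names are illustrative, not values from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law_plus_constant(x, alpha, beta, gamma):
    # Quality metric (e.g. eval loss) as a function of a resource x:
    #   L(x) = alpha * x**(-beta) + gamma
    # gamma is the irreducible floor the curve saturates toward.
    return alpha * np.power(x, -beta) + gamma

# Illustrative (training-set size, eval loss) measurements.
x = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8])
y = np.array([0.536, 0.531, 0.527, 0.525, 0.523, 0.522])

params, _ = curve_fit(power_law_plus_constant, x, y,
                      p0=[1.0, 0.3, 0.5], maxfev=10000)
alpha, beta, gamma = params
print(f"alpha={alpha:.3g}, beta={beta:.3g}, floor gamma={gamma:.3g}")
```

The fitted constant gamma estimates the quality level that scaling the resource alone cannot push past, which is what makes the plus-constant term the practically important part of such a fit.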
Related papers
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws [67.46133952358785]
We release the Gemstones: the most comprehensive open-source scaling law dataset to date.
These models have been trained with different learning rates, schedules, and architectural shapes.
Our checkpoints enable more complex studies of scaling, such as a law that predicts language performance as a function of model width and depth.
arXiv Detail & Related papers (2025-02-07T18:09:38Z)
- Scaling Inference-Efficient Language Models [3.271571137474847]
We show that model architecture affects inference latency: models of the same size can differ in latency by up to 3.5x.
We modify the Chinchilla scaling laws to co-optimize the model parameter count, the number of training tokens, and the model architecture.
We release the Morph-1B model, which improves inference latency by 1.8x while maintaining accuracy on downstream tasks compared to open-source models.
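The entry above builds on the Chinchilla parametric loss L(N, D) = E + A/N^alpha + B/D^beta with the common approximation C ≈ 6ND for training compute. For orientation only, the sketch below numerically finds the compute-optimal split between parameters N and tokens D for a fixed budget; the constants are the commonly cited fitted values from Hoffmann et al., and the architecture-aware modification proposed in the paper above is not reproduced here.

```python
import numpy as np

# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta.
# Constants are the commonly cited Hoffmann et al. fits; treat as illustrative.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

def compute_optimal_split(C):
    """Grid-search the parameter count N that minimizes the loss for a fixed
    training-compute budget C, using the approximation C ~= 6 * N * D."""
    N = np.logspace(6, 13, 2000)   # candidate model sizes
    D = C / (6.0 * N)              # tokens implied by the budget
    i = np.argmin(loss(N, D))
    return N[i], D[i]

for C in [1e21, 1e23, 1e25]:
    N_opt, D_opt = compute_optimal_split(C)
    print(f"C={C:.0e}: N*~{N_opt:.2e} params, D*~{D_opt:.2e} tokens")
```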
arXiv Detail & Related papers (2025-01-30T03:16:44Z)
- Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
The Performance Law for sequential recommendation (SR) models aims to theoretically model the relationship between model performance and data quality.
We propose Approximate Entropy (ApEn) to assess data quality, a more nuanced approach than traditional data-quantity metrics.
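Approximate Entropy is a standard regularity statistic that the paper applies to interaction sequences as a data-quality signal. Below is a minimal, generic sketch of ApEn as commonly defined (template length m, tolerance r, Chebyshev distance); it is not the paper's exact formulation or code.

```python
import numpy as np

def approximate_entropy(u, m=2, r=None):
    """Approximate Entropy (ApEn) of a 1-D sequence u.

    m is the template length and r the similarity tolerance, conventionally
    0.2 * std(u). Lower ApEn means a more regular (predictable) sequence;
    higher ApEn means a more irregular one.
    """
    u = np.asarray(u, dtype=float)
    if r is None:
        r = 0.2 * np.std(u)

    def phi(m):
        # All overlapping templates of length m.
        x = np.array([u[i:i + m] for i in range(len(u) - m + 1)])
        # Chebyshev (max-coordinate) distance between every pair of templates.
        d = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
        # Fraction of templates within tolerance r of each template.
        c = np.mean(d <= r, axis=1)
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)

# Example: a repeating sequence scores lower than a noisy one.
rng = np.random.default_rng(0)
print(approximate_entropy(np.tile([1.0, 2.0, 3.0], 50)),
      approximate_entropy(rng.normal(size=150)))
```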
arXiv Detail & Related papers (2024-11-30T10:56:30Z)
- A Hitchhiker's Guide to Scaling Law Estimation [56.06982415792523]
Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets.
We estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families.
arXiv Detail & Related papers (2024-10-15T17:59:10Z)
- More Compute Is What You Need [3.184416958830696]
We propose a new scaling law suggesting that, for transformer-based models, performance depends mostly on the total amount of compute spent.
We predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.
arXiv Detail & Related papers (2024-04-30T12:05:48Z)
- Mixtures of Experts Unlock Parameter Scaling for Deep RL [54.26191237981469]
In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules into value-based networks results in more parameter-scalable models.
This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
arXiv Detail & Related papers (2024-02-13T17:18:56Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- A Solvable Model of Neural Scaling Laws [72.8349503901712]
Large language models with a huge number of parameters, when trained on a near internet-sized number of tokens, have been empirically shown to obey neural scaling laws.
We propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology.
A key finding is the manner in which the power laws that occur in the statistics of natural datasets are extended by nonlinear random feature maps.
arXiv Detail & Related papers (2022-10-30T15:13:18Z)
- Scaling Laws for a Multi-Agent Reinforcement Learning Model [0.0]
We study performance scaling for a cornerstone reinforcement learning algorithm, AlphaZero.
We find that player strength scales as a power law in neural network parameter count when not bottlenecked by available compute.
We find that the predicted scaling of optimal neural network size fits our data for both games.
arXiv Detail & Related papers (2022-09-29T19:08:51Z)
- Scaling Laws for Acoustic Models [7.906034575114518]
Recent work has shown that autoregressive generative models with cross-entropy objective functions exhibit smooth power-law relationships.
We show that acoustic models trained with an auto-predictive coding loss behave as if they are subject to similar scaling laws.
arXiv Detail & Related papers (2021-06-11T18:59:24Z)