Scaling Laws Do Not Scale
- URL: http://arxiv.org/abs/2307.03201v1
- Date: Wed, 5 Jul 2023 15:32:21 GMT
- Title: Scaling Laws Do Not Scale
- Authors: Fernando Diaz and Michael Madaio
- Abstract summary: We argue that as the size of datasets used to train large AI models grows, the number of distinct communities whose data is included, each with potentially different values, is likely to grow as well.
As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by metrics used to evaluate model performance.
- Score: 87.76714490248779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has proposed a power law relationship, referred to as
"scaling laws," between the performance of artificial intelligence (AI) models and
aspects of those models' design (e.g., dataset size). In other words, as the
size of a dataset (or model parameters, etc.) increases, the performance of a
given model trained on that dataset will correspondingly increase. However,
while compelling in the aggregate, this scaling law relationship overlooks the
ways that metrics used to measure performance may be precarious and contested,
or may not correspond with how different groups of people may perceive the
quality of models' output. In this paper, we argue that as the size of datasets
used to train large AI models grows, the number of distinct communities
(including demographic groups) whose data is included in a given dataset is
likely to grow, each of whom may have different values. As a result, there is
an increased risk that communities represented in a dataset may have values or
preferences not captured by (or in the worst case, at odds with) the metrics
used to evaluate model performance for scaling laws. We end the paper with
implications for AI scaling laws -- that models may not, in fact, continue to
improve as the datasets get larger -- at least not for all people or
communities impacted by those models.
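To make the aggregate power-law relationship concrete, the sketch below fits the canonical form loss = a * D^(-alpha) + c to hypothetical (dataset size, held-out loss) measurements; the numbers and the scipy-based fit are illustrative assumptions, not the paper's experiments.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (dataset size, held-out loss) measurements; in practice these
# would come from training runs at several data scales.
dataset_sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
losses = np.array([3.08, 2.64, 2.29, 2.07, 1.90])

def power_law(D, a, alpha, c):
    # Canonical scaling-law form: loss falls as a power of dataset size D,
    # approaching an irreducible floor c.
    return a * D ** (-alpha) + c

(a, alpha, c), _ = curve_fit(power_law, dataset_sizes, losses, p0=(100.0, 0.3, 1.5))
print(f"fit: loss ~ {a:.1f} * D^(-{alpha:.3f}) + {c:.2f}")
```

The fitted curve predicts monotone improvement in the aggregate metric; the paper's argument is that this single number can hide flat or worsening outcomes for particular communities represented in the data.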
Related papers
- Meek Models Shall Inherit the Earth [1.9647223141071104]
The past decade has seen incredible scaling of AI systems by a few companies, leading to inequality in AI model performance. This paper argues that, contrary to prevailing intuition, the diminishing returns to compute scaling will lead to a convergence of AI model capabilities.
arXiv Detail & Related papers (2025-07-10T17:10:07Z) - Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models. We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients. We present Prismatic Synthesis, a framework for generating diverse synthetic data.
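G-Vendi builds on the Vendi-score construction, which can be sketched as the exponential entropy of the eigenvalues of a normalized similarity kernel; the random features below are hypothetical stand-ins for the per-example gradient embeddings the paper uses.

```python
import numpy as np

def vendi_score(features: np.ndarray) -> float:
    """Exponential of the entropy of the eigenvalues of a normalized
    similarity kernel: an 'effective number of distinct samples'. For
    G-Vendi the rows would be model-induced gradient embeddings."""
    X = features / np.linalg.norm(features, axis=1, keepdims=True)
    K = X @ X.T / len(X)               # cosine-similarity kernel, trace = 1
    lam = np.linalg.eigvalsh(K)
    lam = lam[lam > 1e-12]             # drop numerical zeros
    return float(np.exp(-np.sum(lam * np.log(lam))))

rng = np.random.default_rng(0)
redundant = np.tile(rng.normal(size=(1, 64)), (50, 1)) + 0.01 * rng.normal(size=(50, 64))
diverse = rng.normal(size=(50, 64))
print(vendi_score(redundant), vendi_score(diverse))  # low vs. high effective diversity
```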
arXiv Detail & Related papers (2025-05-26T16:05:10Z) - Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
We investigate how model size, training data scale, and inference-time compute jointly influence generative retrieval performance.
Our experiments show that n-gram-based methods demonstrate strong alignment with both training and inference scaling laws.
We find that LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval.
arXiv Detail & Related papers (2025-03-24T17:59:03Z) - Scaling Laws for Pre-training Agents and World Models [22.701210075508147]
Performance of embodied agents has been shown to improve by increasing model parameters, dataset size, and compute.
This paper characterizes the role of scale in these tasks more precisely.
arXiv Detail & Related papers (2024-11-07T04:57:40Z) - A Hitchhiker's Guide to Scaling Law Estimation [56.06982415792523]
Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets.
We estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families.
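The estimation workflow can be sketched as fitting a Chinchilla-style joint law on small runs and extrapolating to a larger target; the functional form, data points, and initial guesses below are our assumptions, not the paper's benchmarked protocol.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical small-scale runs: (parameter count N, training tokens D, loss).
N = np.array([1e7, 1e7, 1e8, 1e8, 1e9, 1e9])
D = np.array([1e9, 1e10, 1e9, 1e10, 1e9, 1e10])
loss = np.array([3.69, 3.19, 3.10, 2.60, 2.80, 2.30])

def joint_law(ND, A, alpha, B, beta, E):
    # Chinchilla-style decomposition: separate power-law penalties for limited
    # parameters and limited data, plus an irreducible term E.
    n, d = ND
    return A / n**alpha + B / d**beta + E

popt, _ = curve_fit(joint_law, (N, D), loss, p0=(150.0, 0.3, 500.0, 0.3, 1.5), maxfev=20000)
print("extrapolated loss at N=1e10, D=1e11:",
      joint_law((np.array([1e10]), np.array([1e11])), *popt)[0])
```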
arXiv Detail & Related papers (2024-10-15T17:59:10Z) - Scaling Laws For Dense Retrieval [22.76001461620846]
We investigate whether the performance of dense retrieval models follows the scaling law as other neural models.
Results indicate that, under our settings, the performance of dense retrieval models follows a precise power-law scaling related to the model size and the number of annotations.
arXiv Detail & Related papers (2024-03-27T15:27:36Z) - A Tale of Tails: Model Collapse as a Change of Scaling Laws [11.6055501181235]
We ask: How will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus?
We develop a theoretical framework of model collapse through the lens of scaling laws.
We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data.
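A toy caricature of this regime (our construction, not the authors' framework): repeatedly fit a Gaussian to the current corpus and sample the next corpus from the fit, and watch the heavy tail disappear.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.standard_t(df=3, size=5_000)   # heavy-tailed stand-in for human data

for generation in range(6):
    mu, sigma = data.mean(), data.std()
    print(f"gen {generation}: std={sigma:.3f}  99.9th pct={np.quantile(data, 0.999):.2f}")
    # Refit a Gaussian and sample the next "synthetic" corpus from it. The
    # Gaussian cannot represent the heavy tail, so the tail is truncated after
    # one refit, and the spread drifts under repeated fit-and-sample.
    data = rng.normal(mu, sigma, size=5_000)
```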
arXiv Detail & Related papers (2024-02-10T21:06:34Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
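The linear-programming component can be sketched as selecting a small budget of candidate insights to maximize coverage of failure examples; the coverage matrix and budget below are hypothetical, and this simple relaxation is a stand-in for QualEval's more flexible solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical coverage matrix: cover[i, j] = 1 if candidate insight j explains
# failure example i (QualEval derives such signals from an LLM reasoner).
rng = np.random.default_rng(1)
cover = (rng.random((40, 8)) < 0.25).astype(float)
k = 3  # insight budget to surface to developers

# Maximize total coverage of the selected insights: minimize -(coverage @ x)
# subject to sum(x) <= k and 0 <= x_j <= 1. With a single budget constraint
# the LP optimum is integral, so x effectively picks the top-k insights.
res = linprog(-cover.sum(axis=0),
              A_ub=np.ones((1, cover.shape[1])), b_ub=[k],
              bounds=[(0, 1)] * cover.shape[1])
print("selected insights:", np.flatnonzero(res.x > 0.5))
```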
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
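A toy example of the artificial bug injection the paper contrasts with real-world defects (our illustration, not the paper's tooling): flip a comparison operator in otherwise correct code.

```python
import ast

class FlipComparison(ast.NodeTransformer):
    """Inject a synthetic defect by flipping the first `==` to `!=`, the style
    of artificially injected bug such training datasets are built from."""
    def __init__(self):
        self.done = False

    def visit_Compare(self, node):
        if not self.done and isinstance(node.ops[0], ast.Eq):
            node.ops[0] = ast.NotEq()
            self.done = True
        return node

src = "def is_admin(user):\n    return user.role == 'admin'\n"
buggy = ast.unparse(FlipComparison().visit(ast.parse(src)))
print(buggy)  # the 'unrealistic' buggy variant
```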
arXiv Detail & Related papers (2023-11-02T01:51:43Z) - Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition [99.7047087527422]
In this work, we demonstrate that competition can fundamentally alter the behavior of machine learning scaling trends.
We find many settings where improving data representation quality decreases the overall predictive accuracy across users.
At a conceptual level, our work suggests that favorable scaling trends for individual model-providers need not translate to downstream improvements in social welfare.
arXiv Detail & Related papers (2023-06-26T13:06:34Z) - Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
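One concrete instance of the reweighing strategy is the classic Kamiran-Calders scheme sketched below; the paper's exact weighting may differ, so treat this as an assumption-laden example.

```python
import numpy as np

def reweigh(groups: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Kamiran-Calders reweighing: weight each (group, label) cell by
    P(group) * P(label) / P(group, label), so group membership and label
    look statistically independent to the learner."""
    weights = np.empty(len(labels), dtype=float)
    for g in np.unique(groups):
        for y in np.unique(labels):
            cell = (groups == g) & (labels == y)
            if cell.any():
                weights[cell] = (groups == g).mean() * (labels == y).mean() / cell.mean()
    return weights

# The result can be passed as `sample_weight` to most scikit-learn estimators' .fit().
```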
arXiv Detail & Related papers (2023-03-30T17:30:42Z) - Variation of Gender Biases in Visual Recognition Models Before and After Finetuning [29.55318393877906]
We introduce a framework to measure how biases change before and after fine-tuning a large scale visual recognition model for a downstream task.
We find that supervised models trained on datasets such as ImageNet-21k are more likely to retain their pretraining biases.
We also find that models finetuned on larger scale datasets are more likely to introduce new biased associations.
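A simple association score in the spirit of such a framework (our sketch; the paper's actual metric may differ): compare concept embeddings' mean cosine similarity to two attribute centroids before and after finetuning.

```python
import numpy as np

def association_shift(concepts: np.ndarray, attr_a: np.ndarray, attr_b: np.ndarray) -> float:
    """Mean cosine-similarity gap between concept embeddings and two attribute
    centroids (e.g., gendered attribute sets). Computed before and after
    finetuning, the change in this gap tracks how the association moved."""
    def cos(u, V):
        return (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u))
    a, b = attr_a.mean(axis=0), attr_b.mean(axis=0)
    return float(np.mean(cos(a, concepts) - cos(b, concepts)))
```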
arXiv Detail & Related papers (2023-03-14T03:42:47Z) - A Study on the Evaluation of Generative Models [19.18642459565609]
Implicit generative models, which do not return likelihood values, have become prevalent in recent years.
In this work, we study the evaluation metrics of generative models by generating a high-quality synthetic dataset.
Our study shows that while FID and IS do correlate to several f-divergences, their ranking of close models can vary considerably.
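For reference, FID reduces to a Fréchet distance between Gaussian fits of two feature sets; a minimal sketch, assuming features have already been extracted from some encoder:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Frechet Inception Distance between two (n_samples, dim) feature sets:
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)).
    Features would normally come from an Inception-style encoder."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):       # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2 * covmean))
```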
arXiv Detail & Related papers (2022-06-22T09:27:31Z)