Scaling Laws Do Not Scale
- URL: http://arxiv.org/abs/2307.03201v1
- Date: Wed, 5 Jul 2023 15:32:21 GMT
- Title: Scaling Laws Do Not Scale
- Authors: Fernando Diaz and Michael Madaio
- Abstract summary: We argue that as the size of datasets used to train large AI models grows, the number of distinct communities whose data is included in a given dataset is likely to grow, each of which may have different values.
As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by metrics used to evaluate model performance.
- Score: 87.76714490248779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has proposed a power law relationship, referred to as "scaling
laws," between the performance of artificial intelligence (AI) models and
aspects of those models' design (e.g., dataset size). In other words, as the
size of a dataset (or model parameters, etc.) increases, the performance of a
given model trained on that dataset will correspondingly increase. However,
while compelling in the aggregate, this scaling law relationship overlooks the
ways that metrics used to measure performance may be precarious and contested,
or may not correspond with how different groups of people may perceive the
quality of models' output. In this paper, we argue that as the size of datasets
used to train large AI models grows, the number of distinct communities
(including demographic groups) whose data is included in a given dataset is
likely to grow, each of whom may have different values. As a result, there is
an increased risk that communities represented in a dataset may have values or
preferences not captured by (or in the worst case, at odds with) the metrics
used to evaluate model performance for scaling laws. We end the paper with
implications for AI scaling laws -- that models may not, in fact, continue to
improve as the datasets get larger -- at least not for all people or
communities impacted by those models.
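A minimal sketch of the paper's core tension, with invented numbers: the aggregate metric below follows a clean power law in dataset size N even though a minority community's loss has stopped improving, so fitting the aggregate alone would miss the divergence. The per-group losses, mixture weights, and exponents are assumptions for illustration, not values from the paper.

```python
import numpy as np

# Invented per-community losses as a function of dataset size N.
N = np.logspace(4, 9, 6)                       # dataset sizes, 1e4 .. 1e9
majority = 6.0 * N ** -0.25 + 1.0              # majority loss keeps improving
minority = np.full_like(N, 2.5)                # minority loss has plateaued
aggregate = 0.95 * majority + 0.05 * minority  # aggregate metric, 5% minority

# The aggregate still fits a tidy power law L(N) = a * N**-alpha + c.
irreducible = 0.95 * 1.0 + 0.05 * 2.5          # constant part of the mixture
slope, _ = np.polyfit(np.log(N), np.log(aggregate - irreducible), deg=1)
print(f"fitted aggregate exponent alpha ~ {-slope:.2f}")  # close to 0.25

for n, agg, mino in zip(N, aggregate, minority):
    print(f"N={n:.0e}  aggregate loss={agg:.3f}  minority loss={mino:.3f}")
```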
Related papers
- Scaling Laws for the Value of Individual Data Points in Machine Learning [55.596413470429475]
We introduce a new perspective by investigating scaling behavior for the value of individual data points.
We provide learning theory to support our scaling law, and we observe empirically that it holds across diverse model classes.
Our work represents a first step towards understanding and utilizing scaling properties for the value of individual data points.
arXiv Detail & Related papers (2024-05-30T20:10:24Z)
- DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data [48.31817189858086]
We argue that generative data can expand the data distribution that the model can learn, thus mitigating overfitting.
We find that DiverGen significantly outperforms the strong model X-Paste, achieving +1.1 box AP and +1.1 mask AP across all categories, and +1.9 box AP and +2.5 mask AP for rare categories.
arXiv Detail & Related papers (2024-05-16T15:30:18Z)
- More Compute Is What You Need [3.184416958830696]
We propose a new scaling law suggesting that, for transformer-based models, performance depends mostly on the total amount of compute spent.
We predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.
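As a rough illustration of reasoning under such compute-centric laws, the sketch below uses the common back-of-the-envelope approximation of roughly 6 * N * D training FLOPs for a dense transformer with N parameters on D tokens. The approximation comes from the scaling-laws literature broadly, and the budget figure is invented; neither should be read as this paper's exact formulation.

```python
# Assumed approximation: training a dense transformer with n_params parameters
# on n_tokens tokens costs about 6 * n_params * n_tokens FLOPs.
def tokens_for_budget(budget_flops: float, n_params: float) -> float:
    return budget_flops / (6.0 * n_params)

budget = 1e23  # an illustrative, assumed compute budget in FLOPs
for n_params in (1e9, 7e9, 70e9):
    print(f"{n_params:.0e} params -> {tokens_for_budget(budget, n_params):.2e} tokens")
```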
arXiv Detail & Related papers (2024-04-30T12:05:48Z)
- Scaling Laws For Dense Retrieval [22.76001461620846]
We investigate whether the performance of dense retrieval models follows the scaling law as other neural models.
Results indicate that, under our settings, the performance of dense retrieval models follows a precise power-law scaling related to the model size and the number of annotations.
arXiv Detail & Related papers (2024-03-27T15:27:36Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
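As a generic illustration of the reweighing idea named above (not the paper's exact procedure), the sketch below upweights examples from underrepresented groups so that every group contributes equally to a training loss. The group labels and proportions are invented.

```python
from collections import Counter

def group_balanced_weights(groups: list[str]) -> list[float]:
    """Per-example weights, mean 1 overall: weight ~ 1 / group frequency."""
    counts = Counter(groups)
    n, g = len(groups), len(counts)
    return [n / (g * counts[grp]) for grp in groups]

# 80% majority "A", 20% minority "B": each A-example gets 0.625, the B-example 2.5.
print(group_balanced_weights(["A", "A", "A", "A", "B"]))
```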
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- A Solvable Model of Neural Scaling Laws [72.8349503901712]
Large language models with a huge number of parameters, when trained on a near internet-scale number of tokens, have been empirically shown to obey neural scaling laws.
We propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology.
A key finding is the manner in which the power laws occurring in the statistics of natural datasets are extended by nonlinear random feature maps.
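For readers unfamiliar with the second ingredient, the sketch below shows a random feature model in its standard form: a fixed nonlinear random map followed by a learned linear (here ridge) readout. The dimensions and data distribution are invented and are not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 200, 500                          # input dim, feature count, samples
W = rng.normal(size=(k, d)) / np.sqrt(d)        # fixed (untrained) random weights

def phi(X):
    """Nonlinear random feature map phi(x) = relu(W x)."""
    return np.maximum(X @ W.T, 0.0)

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)  # invented toy target

# Only the linear readout on top of the random features is learned (ridge).
lam = 1e-3
F = phi(X)
theta = np.linalg.solve(F.T @ F + lam * np.eye(k), F.T @ y)
print("train MSE:", np.mean((F @ theta - y) ** 2))
```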
arXiv Detail & Related papers (2022-10-30T15:13:18Z)
- Scaling Laws for Acoustic Models [7.906034575114518]
Recent work has shown that autoregressive generative models trained with cross-entropy objectives exhibit smooth power-law relationships between performance and model size, dataset size, and compute.
We show that acoustic models trained with an auto-predictive coding loss behave as if they are subject to similar scaling laws.
arXiv Detail & Related papers (2021-06-11T18:59:24Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
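As background, a minimal sketch of the two-parameter-logistic model standard in Item Response Theory: the probability that a subject (here, a model) with ability theta answers item j correctly is sigmoid(a_j * (theta - b_j)), where b_j is item difficulty and a_j its discrimination. The parameter values below are invented for illustration.

```python
import math

def p_correct(theta: float, difficulty: float, discrimination: float) -> float:
    """2PL item response: P(correct) = sigmoid(discrimination * (theta - difficulty))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# A hard, highly discriminating item separates strong from weak models sharply.
for theta in (-1.0, 0.0, 1.0):   # model "ability"
    easy = p_correct(theta, difficulty=-1.0, discrimination=0.5)
    hard = p_correct(theta, difficulty=1.0, discrimination=2.0)
    print(f"theta={theta:+.1f}  easy item P={easy:.2f}  hard item P={hard:.2f}")
```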
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Interprétabilité des modèles : état des lieux des méthodes et application à l'assurance (Model interpretability: a review of methods and an application to insurance) [1.6058099298620423]
Data is the raw material of many models that today make it possible to increase the quality and performance of digital services.
Users of models must ensure that a model does not discriminate and that its results can be explained.
The widening panel of predictive algorithms leads scientists to be vigilant about the use of models.
arXiv Detail & Related papers (2020-07-25T12:18:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.