Scaling Laws for a Multi-Agent Reinforcement Learning Model
- URL: http://arxiv.org/abs/2210.00849v1
- Date: Thu, 29 Sep 2022 19:08:51 GMT
- Title: Scaling Laws for a Multi-Agent Reinforcement Learning Model
- Authors: Oren Neumann and Claudius Gros
- Abstract summary: We study performance scaling for a cornerstone reinforcement learning algorithm, AlphaZero.
We find that player strength scales as a power law in neural network parameter count when not bottlenecked by available compute.
We find that the predicted scaling of optimal neural network size fits our data for both games.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent observation of neural power-law scaling relations has made a
significant impact in the field of deep learning. As a consequence, substantial
attention has been devoted to describing scaling laws, although mostly for
supervised learning and only to a limited extent for reinforcement learning
frameworks. In this paper we present an extensive study
of performance scaling for a cornerstone reinforcement learning algorithm,
AlphaZero. On the basis of a relationship between Elo rating, playing strength
and power-law scaling, we train AlphaZero agents on the games Connect Four and
Pentago and analyze their performance. We find that player strength scales as a
power law in neural network parameter count when not bottlenecked by available
compute, and as a power of compute when training optimally sized agents. We
observe nearly identical scaling exponents for both games. Combining the two
observed scaling laws we obtain a power law relating optimal size to compute
similar to the ones observed for language models. We find that the predicted
scaling of optimal neural network size fits our data for both games. This
scaling law implies that previously published state-of-the-art game-playing
models are significantly smaller than their optimal size, given the respective
compute budgets. We also show that large AlphaZero models are more sample
efficient, performing better than smaller models with the same amount of
training data.
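As a rough illustration of the Elo-based analysis described in the abstract, the sketch below estimates a strength exponent from win rates against a fixed reference opponent, assuming the standard Elo/Bradley-Terry win-probability model; the parameter counts, win rates, and resulting exponent are made-up placeholders, not code or data from the paper.

```python
# Minimal sketch (not the paper's code): estimating a power-law exponent alpha
# linking playing strength to parameter count via the Elo/Bradley-Terry relation.
import numpy as np

# Hypothetical agent sizes (parameter counts) and win rates of each agent
# against a fixed reference opponent; replace with real tournament results.
param_counts = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
win_rate_vs_ref = np.array([0.55, 0.62, 0.70, 0.78, 0.85])

# Elo difference to the reference opponent from the standard logistic model:
# p(win) = 1 / (1 + 10**(-delta_elo / 400))  =>  delta_elo = 400*log10(p/(1-p))
delta_elo = 400.0 * np.log10(win_rate_vs_ref / (1.0 - win_rate_vs_ref))

# If playing strength gamma (the Bradley-Terry weight) scales as gamma ~ N**alpha,
# then delta_elo = 400*alpha*log10(N) + const, i.e. Elo is linear in log10(N).
slope, intercept = np.polyfit(np.log10(param_counts), delta_elo, deg=1)
alpha = slope / 400.0
print(f"estimated strength exponent alpha ~ {alpha:.3f}")
```

Under the same Bradley-Terry assumption, a power law in playing strength appears as an Elo rating that is linear in the logarithm of the parameter count, which is why a simple linear fit in log-parameter space suffices here; combining such a size exponent with an analogous compute exponent yields the optimal-size-versus-compute power law reported in the abstract.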
Related papers
- A Hitchhiker's Guide to Scaling Law Estimation [56.06982415792523]
Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets.
We estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families.
arXiv Detail & Related papers (2024-10-15T17:59:10Z)
- Information-Theoretic Foundations for Neural Scaling Laws [20.617552198581024]
We develop information-theoretic foundations for neural scaling laws.
We observe that the optimal relation between data and model size is linear, up to logarithmic factors.
arXiv Detail & Related papers (2024-06-28T02:20:54Z)
- Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations [62.132347451049455]
Scale has become a main ingredient in obtaining strong machine learning models.
In this work, we argue that scaling and training research has been needlessly complex due to reliance on the cosine schedule.
We show that weight averaging yields improved performance along the training trajectory, without additional training costs, across different scales.
arXiv Detail & Related papers (2024-05-28T17:33:54Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Scaling Laws for Sparsely-Connected Foundation Models [70.41266138010657]
We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets.
We identify the first scaling law describing the relationship between weight sparsity, number of non-zero parameters, and amount of training data.
arXiv Detail & Related papers (2023-09-15T16:29:27Z)
- Reproducible scaling laws for contrastive language-image learning [42.354402731615444]
We investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository.
Our large-scale experiments involve models trained on up to two billion image-text pairs and identify power law scaling for multiple downstream tasks.
We find that the training distribution plays a key role in scaling laws as the OpenAI and OpenCLIP models exhibit different scaling behavior despite identical model architectures.
arXiv Detail & Related papers (2022-12-14T10:24:50Z)
- A Solvable Model of Neural Scaling Laws [72.8349503901712]
Large language models with a huge number of parameters, when trained on a near internet-sized number of tokens, have been empirically shown to obey neural scaling laws.
We propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology.
Key findings are the manner in which the power laws that occur in the statistics of natural datasets are extended by nonlinear random feature maps.
arXiv Detail & Related papers (2022-10-30T15:13:18Z)
- Understanding Scaling Laws for Recommendation Models [1.6283945233720964]
We study empirical scaling laws for DLRM-style recommendation models, in particular Click-Through Rate (CTR) prediction.
We characterize scaling efficiency along three different resource dimensions, namely data, parameters and compute.
We show that parameter scaling is out of steam for the model architecture under study, and until a higher-performing model architecture emerges, data scaling is the path forward.
arXiv Detail & Related papers (2022-08-17T19:13:17Z)
- Scaling Laws for Neural Language Models [14.472857826717613]
We study scaling laws for language model performance on the cross-entropy loss.
The loss scales as a power-law with model size, dataset size, and the amount of compute used for training.
arXiv Detail & Related papers (2020-01-23T03:59:20Z)
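For reference, the single-variable loss power laws summarized in the final entry above (Scaling Laws for Neural Language Models) are usually written in the generic form below; the notation follows the common convention (N parameters, D dataset size, C training compute) and is not specific to the AlphaZero results in this paper.

```latex
% Generic single-variable scaling forms, with L the test loss,
% N the parameter count, D the dataset size, and C the training compute:
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```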