Scaling laws for single-agent reinforcement learning
- URL: http://arxiv.org/abs/2301.13442v1
- Date: Tue, 31 Jan 2023 06:38:53 GMT
- Title: Scaling laws for single-agent reinforcement learning
- Authors: Jacob Hilton, Jie Tang, John Schulman
- Abstract summary: We introduce *intrinsic performance*, a monotonic function of the return defined as the minimum compute required to achieve the given return.
We find that, across a range of environments, intrinsic performance scales as a power law in model size and environment interactions.
In particular, using a toy MNIST-based environment, we show that varying the "horizon length" of the task mostly changes the coefficient but not the exponent of this relationship.
- Score: 27.86599085479941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown that, in generative modeling, cross-entropy loss
improves smoothly with model size and training compute, following a power law
plus constant scaling law. One challenge in extending these results to
reinforcement learning is that the main performance objective of interest, mean
episode return, need not vary smoothly. To overcome this, we introduce
*intrinsic performance*, a monotonic function of the return defined as the
minimum compute required to achieve the given return across a family of models
of different sizes. We find that, across a range of environments, intrinsic
performance scales as a power law in model size and environment interactions.
Consequently, as in generative modeling, the optimal model size scales as a
power law in the training compute budget. Furthermore, we study how this
relationship varies with the environment and with other properties of the
training setup. In particular, using a toy MNIST-based environment, we show
that varying the "horizon length" of the task mostly changes the coefficient
but not the exponent of this relationship.
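As a minimal illustration of this definition, the sketch below computes intrinsic performance from a family of learning curves: for each target return, it takes the minimum compute any model size needs to reach it. The curve values, model sizes, and helper names are hypothetical placeholders, not the paper's data or code.

```python
import numpy as np

# Hypothetical learning curves: for each model size, an array of
# (cumulative training compute, mean episode return) pairs, with
# return assumed to increase monotonically along each curve.
curves = {
    1e6: np.array([[1e12, 0.1], [1e13, 0.3], [1e14, 0.5]]),  # 1M params
    1e7: np.array([[1e13, 0.2], [1e14, 0.6], [1e15, 0.8]]),  # 10M params
}

def intrinsic_performance(target_return):
    """Minimum compute, across the model family, needed to reach
    `target_return` (linear interpolation within each curve)."""
    best = np.inf
    for curve in curves.values():
        compute, ret = curve[:, 0], curve[:, 1]
        if ret[-1] >= target_return:
            # Interpolate compute as a function of return.
            best = min(best, np.interp(target_return, ret, compute))
    return best

print(intrinsic_performance(0.5))  # min compute to reach return 0.5
```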
Related papers
- Scaling Laws for Pre-training Agents and World Models [22.701210075508147]
Performance of embodied agents has been shown to improve by increasing model parameters, dataset size, and compute.
This paper characterizes the role of scale in these tasks more precisely.
arXiv Detail & Related papers (2024-11-07T04:57:40Z)
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
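A minimal sketch of the inference-time combination described above, assuming the value network predicts per-token logit offsets that are simply added to a frozen base model's logits; the architecture and shapes here are illustrative guesses, not the paper's actual design.

```python
import torch
import torch.nn as nn

vocab_size = 32000

# Hypothetical value network: maps token ids to per-token logit offsets.
value_net = nn.Sequential(
    nn.Embedding(vocab_size, 256),
    nn.Linear(256, vocab_size),
)

def post_trained_logits(base_logits, input_ids):
    """Combine a frozen base model's logits with learned offsets.
    `base_logits`: (batch, seq, vocab) from any pre-trained model;
    because the offsets are added at the logits level, the base
    model's parameter count is irrelevant to the value network."""
    return base_logits + value_net(input_ids)

base_logits = torch.randn(2, 8, vocab_size)  # stand-in for model output
input_ids = torch.randint(0, vocab_size, (2, 8))
print(post_trained_logits(base_logits, input_ids).shape)
```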
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models.
We show that several emergent phenomena exhibit smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
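A hedged sketch of this kind of observational fit: given benchmark scores from already-trained models, fit a sigmoid in log-compute and extrapolate. The data points are fabricated, and the paper's capability measures are richer than raw training compute.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (training FLOPs, benchmark accuracy) pairs from
# publicly released models -- placeholder values only.
flops = np.array([1e20, 3e20, 1e21, 3e21, 1e22])
acc = np.array([0.12, 0.20, 0.38, 0.61, 0.78])

def sigmoid(log_c, a, b):
    # Smooth, sigmoidal dependence of accuracy on log-compute.
    return 1.0 / (1.0 + np.exp(-(a * log_c + b)))

params, _ = curve_fit(sigmoid, np.log10(flops), acc, p0=[1.0, -21.0])
print(sigmoid(np.log10(3e22), *params))  # extrapolated accuracy
```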
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
- Mixtures of Experts Unlock Parameter Scaling for Deep RL [54.26191237981469]
In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules into value-based networks results in more parameter-scalable models.
This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
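A minimal sketch of the idea, assuming a dense softmax-gated mixture of small expert MLPs replaces one hidden layer of a Q-network; the gating scheme and sizes are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Dense softmax-gated mixture of expert MLPs (illustrative only)."""
    def __init__(self, dim, n_experts=4, hidden=64):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(dim, n_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)         # (batch, E)
        outs = torch.stack([e(x) for e in self.experts], -1)  # (batch, dim, E)
        return (outs * weights.unsqueeze(1)).sum(-1)

# Hypothetical Q-network with the penultimate layer replaced by MoE.
q_net = nn.Sequential(nn.Linear(8, 128), nn.ReLU(),
                      MoELayer(128), nn.Linear(128, 4))
print(q_net(torch.randn(32, 8)).shape)  # (32, 4) Q-values
```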
arXiv Detail & Related papers (2024-02-13T17:18:56Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
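A toy version of this setup, under arbitrary assumptions about sizes and the target function: a frozen random-feature model fit by full-batch gradient descent, with train and test loss printed as the same training set is reused.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_feat, n_train, n_test = 20, 200, 100, 1000

# Teacher: noisy linear target; student: frozen random ReLU features.
w_true = rng.normal(size=d)
def make_data(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + 0.1 * rng.normal(size=n)

W = rng.normal(size=(d, n_feat)) / np.sqrt(d)  # fixed random features
phi = lambda X: np.maximum(X @ W, 0.0)

(Xtr, ytr), (Xte, yte) = make_data(n_train), make_data(n_test)
Ftr, Fte = phi(Xtr), phi(Xte)

theta = np.zeros(n_feat)
lr = 1e-3
for step in range(5001):
    # Full-batch gradient descent: the same data is reused every step.
    grad = Ftr.T @ (Ftr @ theta - ytr) / n_train
    theta -= lr * grad
    if step % 1000 == 0:
        tr = np.mean((Ftr @ theta - ytr) ** 2)
        te = np.mean((Fte @ theta - yte) ** 2)
        print(f"step {step:5d}  train {tr:.3f}  test {te:.3f}")  # gap grows
```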
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
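A small sketch of the compute-matched protocol, with invented throughput numbers: every model receives the same accelerator-hour budget, so its token budget is its measured throughput times that budget.

```python
# Equal-compute comparison in accelerator hours (illustrative numbers).
budget_hours = 24.0

# Hypothetical measured throughput (tokens per accelerator-hour).
throughput = {
    "gpt2-style-ff": 9.0e7,
    "fast-lstm": 9.0e8,  # ~10x throughput, per the paper's claim
}

for name, tok_per_hour in throughput.items():
    tokens = tok_per_hour * budget_hours
    print(f"{name}: train on {tokens:.2e} tokens in {budget_hours} h")
```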
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- Scaling Laws for Acoustic Models [7.906034575114518]
Recent work has shown that autoregressive generative models with cross-entropy objective functions exhibit smooth power-law relationships between model size, compute, and loss.
We show that acoustic models trained with an auto-predictive coding loss behave as if they are subject to similar scaling laws.
arXiv Detail & Related papers (2021-06-11T18:59:24Z)
- Scaling Laws for Neural Language Models [14.472857826717613]
We study scaling laws for language model performance on the cross-entropy loss.
The loss scales as a power-law with model size, dataset size, and the amount of compute used for training.
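A brief sketch of fitting such a power law for loss versus model size, L(N) = (Nc/N)^alpha, on synthetic points; the numbers and fitted constants are placeholders, not the paper's coefficients.

```python
import numpy as np

# Synthetic (non-embedding parameters, cross-entropy loss) pairs.
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
L = np.array([5.2, 4.4, 3.8, 3.2, 2.8])

# Fit L(N) = (Nc / N)^alpha by linear regression in log-log space:
# log L = alpha * log Nc - alpha * log N.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha = -slope
Nc = np.exp(intercept / alpha)
print(f"alpha = {alpha:.3f}, Nc = {Nc:.3e}")
print((Nc / 1e11) ** alpha)  # extrapolated loss at 1e11 params
```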
arXiv Detail & Related papers (2020-01-23T03:59:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.