Neural Neural Scaling Laws
- URL: http://arxiv.org/abs/2601.19831v1
- Date: Tue, 27 Jan 2026 17:38:11 GMT
- Title: Neural Neural Scaling Laws
- Authors: Michael Y. Hu, Jane Pan, Ayush Rajesh Jhaveri, Nicholas Lourie, Kyunghyun Cho,
- Abstract summary: We propose Neural Neural Scaling Laws (NeuNeu), a neural network that frames scaling law prediction as time-series extrapolation.<n>NeuNeu achieves 2.04% mean absolute error in predicting model accuracy on 66 downstream tasks.<n>Our work suggests that predicting downstream scaling laws directly from data outperforms parametric alternatives.
- Score: 40.38002195911611
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural scaling laws predict how language model performance improves with increased compute. While aggregate metrics like validation loss can follow smooth power-law curves, individual downstream tasks exhibit diverse scaling behaviors: some improve monotonically, others plateau, and some even degrade with scale. We argue that predicting downstream performance from validation perplexity suffers from two limitations: averaging token-level losses obscures signal, and no simple parametric family can capture the full spectrum of scaling behaviors. To address this, we propose Neural Neural Scaling Laws (NeuNeu), a neural network that frames scaling law prediction as time-series extrapolation. NeuNeu combines temporal context from observed accuracy trajectories with token-level validation losses, learning to predict future performance without assuming any bottleneck or functional form. Trained entirely on open-source model checkpoints from HuggingFace, NeuNeu achieves 2.04% mean absolute error in predicting model accuracy on 66 downstream tasks -- a 38% reduction compared to logistic scaling laws (3.29% MAE). Furthermore, NeuNeu generalizes zero-shot to unseen model families, parameter counts, and downstream tasks. Our work suggests that predicting downstream scaling laws directly from data outperforms parametric alternatives.
Related papers
- Why Machine Learning Models Systematically Underestimate Extreme Values II: How to Fix It with LatentNN [0.2700171473617699]
Attenuation bias affects astronomical data-driven models.<n>We show that neural networks suffer from the same attenuation bias.<n>We introduce LatentNN, a method that jointly optimize network parameters and latent input values.
arXiv Detail & Related papers (2025-12-29T01:59:10Z) - Scaling Laws Are Unreliable for Downstream Tasks: A Reality Check [44.088564825871345]
Downstream scaling laws aim to predict task performance at larger scales from the model's performance at smaller scales.<n>We find that predictable scaling only occurs in a minority of cases.<n>Seemingly benign changes to the experimental setting can completely change the scaling behavior.
arXiv Detail & Related papers (2025-07-01T15:52:55Z) - Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks [100.13335639780415]
Scaling laws often follow the power-law and proposed several variants of power-law functions to predict the scaling behavior at larger scales.<n>Existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications.<n>In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation.
arXiv Detail & Related papers (2025-05-29T03:19:17Z) - Information-Theoretic Foundations for Neural Scaling Laws [20.617552198581024]
We develop information-theoretic foundations for neural scaling laws.
We observe that the optimal relation between data and model size is linear, up to logarithmic factors.
arXiv Detail & Related papers (2024-06-28T02:20:54Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression.<n>We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning.<n>Our results extend and provide a unifying perspective on earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z) - Predicting Emergent Abilities with Infinite Resolution Evaluation [85.89911520190711]
We introduce PassUntil, an evaluation strategy with theoretically infinite resolution, through massive sampling in the decoding phase.
We predict the performance of the 2.4B model on code generation with merely 0.05% deviation before training starts.
We identify a kind of accelerated emergence whose scaling curve cannot be fitted by standard scaling law function.
arXiv Detail & Related papers (2023-10-05T02:35:00Z) - A Solvable Model of Neural Scaling Laws [72.8349503901712]
Large language models with a huge number of parameters, when trained on near internet-sized number of tokens, have been empirically shown to obey neural scaling laws.
We propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology.
Key findings are the manner in which the power laws that occur in the statistics of natural datasets are extended by nonlinear random feature maps.
arXiv Detail & Related papers (2022-10-30T15:13:18Z) - Broken Neural Scaling Laws [9.020652910657931]
Broken Neural Scaling Law (BNSL) accurately models and extrapolates the scaling behaviors of deep neural networks.
This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, robotics, out-of-distribution (OOD) generalization.
arXiv Detail & Related papers (2022-10-26T17:45:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.