Related papers: Scaling Laws for Uncertainty in Deep Learning

Scaling Laws for Uncertainty in Deep Learning

URL: http://arxiv.org/abs/2506.09648v1
Date: Wed, 11 Jun 2025 12:09:05 GMT
Title: Scaling Laws for Uncertainty in Deep Learning
Authors: Mattia Rosso, Simone Rossi, Giulio Franzese, Markus Heinonen, Maurizio Filippone,
Abstract summary: We show the existence of scaling laws associated with various measures of predictive uncertainty with respect to dataset and model sizes.<n>This work provides strong evidence to dispel recurring skepticism against Bayesian approaches.
Score: 18.87399857008617
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep learning has recently revealed the existence of scaling laws, demonstrating that model performance follows predictable trends based on dataset and model sizes. Inspired by these findings and fascinating phenomena emerging in the over-parameterized regime, we examine a parallel direction: do similar scaling laws govern predictive uncertainties in deep learning? In identifiable parametric models, such scaling laws can be derived in a straightforward manner by treating model parameters in a Bayesian way. In this case, for example, we obtain $O(1/N)$ contraction rates for epistemic uncertainty with respect to the number of data $N$. However, in over-parameterized models, these guarantees do not hold, leading to largely unexplored behaviors. In this work, we empirically show the existence of scaling laws associated with various measures of predictive uncertainty with respect to dataset and model sizes. Through experiments on vision and language tasks, we observe such scaling laws for in- and out-of-distribution predictive uncertainty estimated through popular approximate Bayesian inference and ensemble methods. Besides the elegance of scaling laws and the practical utility of extrapolating uncertainties to larger data or models, this work provides strong evidence to dispel recurring skepticism against Bayesian approaches: "In many applications of deep learning we have so much data available: what do we need Bayes for?". Our findings show that "so much data" is typically not enough to make epistemic uncertainty negligible.

Related papers

Bayesian Neural Scaling Law Extrapolation with Prior-Fitted Networks [100.13335639780415]
Scaling laws often follow the power-law and proposed several variants of power-law functions to predict the scaling behavior at larger scales.<n>Existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications.<n>In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation.
arXiv Detail & Related papers (2025-05-29T03:19:17Z)
Bayesian scaling laws for in-context learning [72.17734205418502]
In-context learning (ICL) is a powerful technique for getting language models to perform complex tasks with no training updates. We show that ICL approximates a Bayesian learner and develop a family of novel Bayesian scaling laws for ICL.
arXiv Detail & Related papers (2024-10-21T21:45:22Z)
A Simple Model of Inference Scaling Laws [1.3597551064547502]
We study scaling laws in the context of inference, specifically how performance improves with multiple inference attempts.<n>Our simple framework sets the ground for incorporating inference scaling with other known scaling laws.
arXiv Detail & Related papers (2024-10-21T18:00:06Z)
A Hitchhiker's Guide to Scaling Law Estimation [56.06982415792523]
Scaling laws predict the loss of a target machine learning model by extrapolating from easier-to-train models with fewer parameters or smaller training sets.<n>We estimate more than 1000 scaling laws, then derive a set of best practices for estimating scaling laws in new model families.
arXiv Detail & Related papers (2024-10-15T17:59:10Z)
Scaling Laws For Dense Retrieval [22.76001461620846]
We investigate whether the performance of dense retrieval models follows the scaling law as other neural models. Results indicate that, under our settings, the performance of dense retrieval models follows a precise power-law scaling related to the model size and the number of annotations.
arXiv Detail & Related papers (2024-03-27T15:27:36Z)
Predicting Emergent Abilities with Infinite Resolution Evaluation [85.89911520190711]
We introduce PassUntil, an evaluation strategy with theoretically infinite resolution, through massive sampling in the decoding phase. We predict the performance of the 2.4B model on code generation with merely 0.05% deviation before training starts. We identify a kind of accelerated emergence whose scaling curve cannot be fitted by standard scaling law function.
arXiv Detail & Related papers (2023-10-05T02:35:00Z)
A Solvable Model of Neural Scaling Laws [72.8349503901712]
Large language models with a huge number of parameters, when trained on near internet-sized number of tokens, have been empirically shown to obey neural scaling laws. We propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology. Key findings are the manner in which the power laws that occur in the statistics of natural datasets are extended by nonlinear random feature maps.
arXiv Detail & Related papers (2022-10-30T15:13:18Z)
BayesCap: Bayesian Identity Cap for Calibrated Uncertainty in Frozen Neural Networks [50.15201777970128]
We propose BayesCap that learns a Bayesian identity mapping for the frozen model, allowing uncertainty estimation. BayesCap is a memory-efficient method that can be trained on a small fraction of the original dataset. We show the efficacy of our method on a wide variety of tasks with a diverse set of architectures.
arXiv Detail & Related papers (2022-07-14T12:50:09Z)
Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments [42.793379799720434]
We investigate whether scaling laws can be used to accelerate model development. We find that scaling laws emerge at finetuning time in some NLP tasks. For tasks where scaling laws exist, they can be used to predict the performance of larger models.
arXiv Detail & Related papers (2022-02-13T19:13:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.