The Shape of Learning Curves: a Review
- URL: http://arxiv.org/abs/2103.10948v1
- Date: Fri, 19 Mar 2021 17:56:33 GMT
- Title: The Shape of Learning Curves: a Review
- Authors: Tom Viering, Marco Loog
- Abstract summary: This review recounts the origins of the term, provides a formal definition of the learning curve, and briefly covers basics such as its estimation.
We discuss empirical and theoretical evidence that supports well-behaved curves that often have the shape of a power law or an exponential.
We draw specific attention to examples of learning curves that are ill-behaved, showing worse learning performance with more training data.
- Score: 14.764764847928259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning curves provide insight into the dependence of a learner's
generalization performance on the training set size. This important tool can be
used for model selection, to predict the effect of more training data, and to
reduce the computational complexity of model training and hyperparameter
tuning. This review recounts the origins of the term, provides a formal
definition of the learning curve, and briefly covers basics such as its
estimation. Our main contribution is a comprehensive overview of the literature
regarding the shape of learning curves. We discuss empirical and theoretical
evidence that supports well-behaved curves that often have the shape of a power
law or an exponential. We consider the learning curves of Gaussian processes,
the complex shapes they can display, and the factors influencing them. We draw
specific attention to examples of learning curves that are ill-behaved, showing
worse learning performance with more training data. To wrap up, we point out
various open problems that warrant deeper empirical and theoretical
investigation. All in all, our review underscores that learning curves are
surprisingly diverse and no universal model can be identified.
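As a concrete illustration of the estimation and curve-fitting basics the review covers, the sketch below (our own illustration, not code from the paper; the dataset, model, and grid of training-set sizes are arbitrary choices) estimates a learning curve by training on nested subsets of increasing size and then fits the two shapes most often reported, a power law and an exponential, assuming `scikit-learn` and `scipy` are available:

```python
"""Minimal sketch: estimate a learning curve and fit power-law / exponential shapes."""
import numpy as np
from scipy.optimize import curve_fit
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

sizes = [50, 100, 200, 400, 800, 1200]          # arbitrary grid of training-set sizes
errors = []
for n in sizes:
    clf = LogisticRegression(max_iter=2000).fit(X_train[:n], y_train[:n])
    errors.append(1.0 - clf.score(X_test, y_test))   # test error at training size n
ns, errors = np.array(sizes, dtype=float), np.array(errors)

# Two candidate parametric shapes discussed in the review.
def power_law(n, a, b, c):
    return a * n ** (-b) + c            # error decays polynomially towards asymptote c

def exponential(n, a, b, c):
    return a * np.exp(-b * n) + c       # error decays exponentially towards asymptote c

for name, f, p0 in [("power law", power_law, (1.0, 0.5, 0.01)),
                    ("exponential", exponential, (1.0, 0.01, 0.01))]:
    params, _ = curve_fit(f, ns, errors, p0=p0, maxfev=10000)
    mse = np.mean((f(ns, *params) - errors) ** 2)
    print(f"{name}: params={np.round(params, 4)}, fit MSE={mse:.2e}")
```

In practice one would average the error at each size over several random subsets before comparing fits or extrapolating to larger training sets.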
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - On the Foundations of Shortcut Learning [20.53986437152018]
We study how predictivity and availability interact to shape models' feature use.
We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias.
arXiv Detail & Related papers (2023-10-24T22:54:05Z) - On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z) - When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more they benefit from learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
arXiv Detail & Related papers (2022-12-24T11:02:35Z) - A Survey of Learning Curves with Bad Behavior: or How More Data Need Not Lead to Better Performance [15.236871820889345]
Plotting a learner's generalization performance against the training set size results in a so-called learning curve.
We make the (ideal) learning curve concept precise and briefly discuss the aforementioned usages of such curves.
The larger part of this survey's focus is on learning curves that show that more data does not necessarily lead to better generalization performance.
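A well-documented mechanism behind such ill-behaved curves is the peaking phenomenon of minimum-norm least squares, where test error can rise as the training-set size approaches the input dimension before falling again. The synthetic sketch below (our own illustration of that mechanism, not an example taken from the survey) reproduces the effect with `numpy`:

```python
"""Minimal synthetic sketch of a non-monotone learning curve: peaking of
minimum-norm least squares when the training size n crosses the dimension d."""
import numpy as np

rng = np.random.default_rng(0)
d = 50                                   # feature dimension
w_true = rng.normal(size=d) / np.sqrt(d)
noise = 0.5

def test_mse(n, n_test=2000, reps=50):
    """Average test MSE of the minimum-norm least-squares fit trained on n samples."""
    errs = []
    for _ in range(reps):
        X = rng.normal(size=(n, d))
        y = X @ w_true + noise * rng.normal(size=n)
        w_hat = np.linalg.pinv(X) @ y    # minimum-norm solution, defined for any n
        X_te = rng.normal(size=(n_test, d))
        y_te = X_te @ w_true + noise * rng.normal(size=n_test)
        errs.append(np.mean((X_te @ w_hat - y_te) ** 2))
    return np.mean(errs)

for n in [10, 25, 40, 48, 50, 52, 60, 100, 200]:
    print(f"n={n:4d}  test MSE={test_mse(n):.3f}")
# The error typically peaks near n = d: more data can temporarily hurt.
```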
arXiv Detail & Related papers (2022-11-25T12:36:52Z) - Beyond spectral gap: The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
arXiv Detail & Related papers (2022-06-07T08:19:06Z) - Learning Curves for Decision Making in Supervised Machine Learning -- A Survey [9.994200032442413]
Learning curves are a concept from the social sciences that has been adopted in the context of machine learning.
We contribute a framework that categorizes learning curve approaches using three criteria: the decision situation that they address, the intrinsic learning curve question that they answer and the type of resources that they use.
arXiv Detail & Related papers (2022-01-28T14:34:32Z) - Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
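As a rough sketch of how a feature-data graph can be used to extrapolate embeddings for unseen features, the toy example below (a heavy simplification of our own; the paper's upper model is a learned graph neural network, not this fixed mean aggregation) performs one round of message passing over a bipartite incidence matrix:

```python
"""Minimal sketch (our simplification, not the paper's implementation): one round of
mean-aggregation message passing over a feature-data bipartite graph, used to
produce an embedding for a feature unseen at training time."""
import numpy as np

rng = np.random.default_rng(0)
n_data, n_feat, dim = 100, 8, 16

# Binary incidence matrix: entry (i, j) = 1 if data point i has feature j observed.
A = (rng.random((n_data, n_feat)) > 0.5).astype(float)

feat_emb = rng.normal(size=(n_feat, dim))            # embeddings of known features

# Data-node embeddings: mean of the embeddings of their observed features.
data_emb = (A @ feat_emb) / np.maximum(A.sum(1, keepdims=True), 1)

# A new feature arrives, observed on some subset of data points.
new_col = (rng.random(n_data) > 0.7).astype(float)

# Its embedding is extrapolated by aggregating the data nodes it is connected to.
new_feat_emb = (new_col @ data_emb) / max(new_col.sum(), 1)
print(new_feat_emb.shape)   # (dim,): can be fed to the backbone alongside known features
```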
arXiv Detail & Related papers (2021-10-09T09:02:45Z) - Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
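To make the idea concrete, here is a minimal sketch (not the paper's algorithm; the breakpoints are fixed rather than chosen adaptively) that fits a piecewise-linear curve by least squares on a hinge basis and then emits it as short, human-readable code:

```python
"""Minimal sketch: fit a piecewise-linear curve with fixed breakpoints by least
squares on a hinge basis, then render it as a readable Python function."""
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + 0.1 * rng.normal(size=x.size)        # arbitrary smooth target

knots = np.linspace(0, 10, 8)[1:-1]                  # interior breakpoints (a design choice)
# Design matrix: intercept, slope, and one hinge max(0, x - k) per knot.
B = np.column_stack([np.ones_like(x), x] + [np.maximum(0, x - k) for k in knots])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)

def emit_code(coef, knots):
    """Render the fitted curve as a short, human-readable Python expression."""
    terms = [f"{coef[0]:+.3f}", f"{coef[1]:+.3f}*x"]
    terms += [f"{c:+.3f}*max(0, x - {k:.2f})" for c, k in zip(coef[2:], knots)]
    return "def f(x):\n    return " + " ".join(terms)

print(emit_code(coef, knots))
```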
arXiv Detail & Related papers (2021-01-21T01:46:36Z) - A Theory of Universal Learning [26.51949485387526]
We show that there are only three possible rates of universal learning.
We show that the learning curves of any given concept class decay at either an exponential, a linear, or an arbitrarily slow rate.
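For reference, the trichotomy this paper establishes for optimal universal learning curves can be summarized informally as follows (our own notation, with $e(n)$ the expected error at sample size $n$):

```latex
% Informal statement of the universal-rates trichotomy: every concept class
% admits exactly one of three optimal universal learning-curve behaviors.
\[
  e(n) = \Theta\!\bigl(e^{-cn}\bigr)
  \quad\text{or}\quad
  e(n) = \Theta\!\bigl(1/n\bigr)
  \quad\text{or}\quad
  \text{$e(n)$ decays more slowly than any prescribed rate } R(n)\to 0 .
\]
```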
arXiv Detail & Related papers (2020-11-09T15:10:32Z) - Learning Curves for Analysis of Deep Networks [23.968036672913392]
Learning curves can be used to select model parameters and extrapolate performance.
We propose a method to robustly estimate learning curves, abstract their parameters into error and data-reliance, and evaluate the effectiveness of different parameterizations.
arXiv Detail & Related papers (2020-10-21T14:20:05Z)