The Shape of Learning Curves: a Review
- URL: http://arxiv.org/abs/2103.10948v1
- Date: Fri, 19 Mar 2021 17:56:33 GMT
- Title: The Shape of Learning Curves: a Review
- Authors: Tom Viering, Marco Loog
- Abstract summary: This review recounts the origins of the term, provides a formal definition of the learning curve, and briefly covers basics such as its estimation.
We discuss empirical and theoretical evidence that supports well-behaved curves that often have the shape of a power law or an exponential.
We draw specific attention to examples of learning curves that are ill-behaved, showing worse learning performance with more training data.
- Score: 14.764764847928259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning curves provide insight into the dependence of a learner's
generalization performance on the training set size. This important tool can be
used for model selection, to predict the effect of more training data, and to
reduce the computational complexity of model training and hyperparameter
tuning. This review recounts the origins of the term, provides a formal
definition of the learning curve, and briefly covers basics such as its
estimation. Our main contribution is a comprehensive overview of the literature
regarding the shape of learning curves. We discuss empirical and theoretical
evidence that supports well-behaved curves that often have the shape of a power
law or an exponential. We consider the learning curves of Gaussian processes,
the complex shapes they can display, and the factors influencing them. We draw
specific attention to examples of learning curves that are ill-behaved, showing
worse learning performance with more training data. To wrap up, we point out
various open problems that warrant deeper empirical and theoretical
investigation. All in all, our review underscores that learning curves are
surprisingly diverse and no universal model can be identified.
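As a concrete illustration of the estimation and curve-fitting basics the review covers, the sketch below (our own illustration, not code from the paper; the dataset, model, and grid of training-set sizes are arbitrary choices) estimates a learning curve by training on nested subsets of increasing size and then fits the two shapes most often reported, a power law and an exponential, assuming `scikit-learn` and `scipy` are available:

```python
"""Minimal sketch: estimate a learning curve and fit power-law / exponential shapes."""
import numpy as np
from scipy.optimize import curve_fit
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

sizes = [50, 100, 200, 400, 800, 1200]          # arbitrary grid of training-set sizes
errors = []
for n in sizes:
    clf = LogisticRegression(max_iter=2000).fit(X_train[:n], y_train[:n])
    errors.append(1.0 - clf.score(X_test, y_test))   # test error at training size n
ns, errors = np.array(sizes, dtype=float), np.array(errors)

# Two candidate parametric shapes discussed in the review.
def power_law(n, a, b, c):
    return a * n ** (-b) + c            # error decays polynomially towards asymptote c

def exponential(n, a, b, c):
    return a * np.exp(-b * n) + c       # error decays exponentially towards asymptote c

for name, f, p0 in [("power law", power_law, (1.0, 0.5, 0.01)),
                    ("exponential", exponential, (1.0, 0.01, 0.01))]:
    params, _ = curve_fit(f, ns, errors, p0=p0, maxfev=10000)
    mse = np.mean((f(ns, *params) - errors) ** 2)
    print(f"{name}: params={np.round(params, 4)}, fit MSE={mse:.2e}")
```

In practice one would average the error at each size over several random subsets before comparing fits or extrapolating to larger training sets.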
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - On the Foundations of Shortcut Learning [20.53986437152018]
We study how predictivity and availability interact to shape models' feature use.
We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias.
arXiv Detail & Related papers (2023-10-24T22:54:05Z) - On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z) - When Do Curricula Work in Federated Learning? [56.88941905240137]
We find that curriculum learning largely alleviates non-IIDness.
The more disparate the data distributions across clients, the more they benefit from learning.
We propose a novel client selection technique that benefits from the real-world disparity in the clients.
arXiv Detail & Related papers (2022-12-24T11:02:35Z) - A Survey of Learning Curves with Bad Behavior: or How More Data Need Not Lead to Better Performance [15.236871820889345]
Plotting a learner's generalization performance against the training set size results in a so-called learning curve.
We make the (ideal) learning curve concept precise and briefly discuss the aforementioned usages of such curves.
The larger part of this survey's focus is on learning curves that show that more data does not necessarily lead to better generalization performance.
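A well-documented mechanism behind such ill-behaved curves is the peaking phenomenon of minimum-norm least squares, where test error can rise as the training-set size approaches the input dimension before falling again. The synthetic sketch below (our own illustration of that mechanism, not an example taken from the survey) reproduces the effect with `numpy`:

```python
"""Minimal synthetic sketch of a non-monotone learning curve: peaking of
minimum-norm least squares when the training size n crosses the dimension d."""
import numpy as np

rng = np.random.default_rng(0)
d = 50                                   # feature dimension
w_true = rng.normal(size=d) / np.sqrt(d)
noise = 0.5

def test_mse(n, n_test=2000, reps=50):
    """Average test MSE of the minimum-norm least-squares fit trained on n samples."""
    errs = []
    for _ in range(reps):
        X = rng.normal(size=(n, d))
        y = X @ w_true + noise * rng.normal(size=n)
        w_hat = np.linalg.pinv(X) @ y    # minimum-norm solution, defined for any n
        X_te = rng.normal(size=(n_test, d))
        y_te = X_te @ w_true + noise * rng.normal(size=n_test)
        errs.append(np.mean((X_te @ w_hat - y_te) ** 2))
    return np.mean(errs)

for n in [10, 25, 40, 48, 50, 52, 60, 100, 200]:
    print(f"n={n:4d}  test MSE={test_mse(n):.3f}")
# The error typically peaks near n = d: more data can temporarily hurt.
```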
arXiv Detail & Related papers (2022-11-25T12:36:52Z) - Beyond spectral gap: The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
arXiv Detail & Related papers (2022-06-07T08:19:06Z) - Learning Curves for Decision Making in Supervised Machine Learning -- A Survey [9.994200032442413]
Learning curves are a concept from the social sciences that has been adopted in the context of machine learning.
We contribute a framework that categorizes learning curve approaches using three criteria: the decision situation that they address, the intrinsic learning curve question that they answer and the type of resources that they use.
arXiv Detail & Related papers (2022-01-28T14:34:32Z) - Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
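As a rough sketch of how a feature-data graph can be used to extrapolate embeddings for unseen features, the toy example below (a heavy simplification of our own; the paper's upper model is a learned graph neural network, not this fixed mean aggregation) performs one round of message passing over a bipartite incidence matrix:

```python
"""Minimal sketch (our simplification, not the paper's implementation): one round of
mean-aggregation message passing over a feature-data bipartite graph, used to
produce an embedding for a feature unseen at training time."""
import numpy as np

rng = np.random.default_rng(0)
n_data, n_feat, dim = 100, 8, 16

# Binary incidence matrix: entry (i, j) = 1 if data point i has feature j observed.
A = (rng.random((n_data, n_feat)) > 0.5).astype(float)

feat_emb = rng.normal(size=(n_feat, dim))            # embeddings of known features

# Data-node embeddings: mean of the embeddings of their observed features.
data_emb = (A @ feat_emb) / np.maximum(A.sum(1, keepdims=True), 1)

# A new feature arrives, observed on some subset of data points.
new_col = (rng.random(n_data) > 0.7).astype(float)

# Its embedding is extrapolated by aggregating the data nodes it is connected to.
new_feat_emb = (new_col @ data_emb) / max(new_col.sum(), 1)
print(new_feat_emb.shape)   # (dim,): can be fed to the backbone alongside known features
```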
arXiv Detail & Related papers (2021-10-09T09:02:45Z) - Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
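To make the idea concrete, here is a minimal sketch (not the paper's algorithm; the breakpoints are fixed rather than chosen adaptively) that fits a piecewise-linear curve by least squares on a hinge basis and then emits it as short, human-readable code:

```python
"""Minimal sketch: fit a piecewise-linear curve with fixed breakpoints by least
squares on a hinge basis, then render it as a readable Python function."""
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + 0.1 * rng.normal(size=x.size)        # arbitrary smooth target

knots = np.linspace(0, 10, 8)[1:-1]                  # interior breakpoints (a design choice)
# Design matrix: intercept, slope, and one hinge max(0, x - k) per knot.
B = np.column_stack([np.ones_like(x), x] + [np.maximum(0, x - k) for k in knots])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)

def emit_code(coef, knots):
    """Render the fitted curve as a short, human-readable Python expression."""
    terms = [f"{coef[0]:+.3f}", f"{coef[1]:+.3f}*x"]
    terms += [f"{c:+.3f}*max(0, x - {k:.2f})" for c, k in zip(coef[2:], knots)]
    return "def f(x):\n    return " + " ".join(terms)

print(emit_code(coef, knots))
```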
arXiv Detail & Related papers (2021-01-21T01:46:36Z) - A Theory of Universal Learning [26.51949485387526]
We show that there are only three possible rates of universal learning.
We show that the learning curves of any given concept class decay at either an exponential, a linear, or an arbitrarily slow rate.
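For reference, the trichotomy this paper establishes for optimal universal learning curves can be summarized informally as follows (our own notation, with $e(n)$ the expected error at sample size $n$):

```latex
% Informal statement of the universal-rates trichotomy: every concept class
% admits exactly one of three optimal universal learning-curve behaviors.
\[
  e(n) = \Theta\!\bigl(e^{-cn}\bigr)
  \quad\text{or}\quad
  e(n) = \Theta\!\bigl(1/n\bigr)
  \quad\text{or}\quad
  \text{$e(n)$ decays more slowly than any prescribed rate } R(n)\to 0 .
\]
```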
arXiv Detail & Related papers (2020-11-09T15:10:32Z) - Learning Curves for Analysis of Deep Networks [23.968036672913392]
Learning curves can be used to select model parameters and extrapolate performance.
We propose a method to robustly estimate learning curves, abstract their parameters into error and data-reliance, and evaluate the effectiveness of different parameterizations.
arXiv Detail & Related papers (2020-10-21T14:20:05Z)