A Survey of Learning Curves with Bad Behavior: or How More Data Need Not
Lead to Better Performance
- URL: http://arxiv.org/abs/2211.14061v1
- Date: Fri, 25 Nov 2022 12:36:52 GMT
- Title: A Survey of Learning Curves with Bad Behavior: or How More Data Need Not
Lead to Better Performance
- Authors: Marco Loog and Tom Viering
- Abstract summary: Plotting a learner's generalization performance against a training set size results in a so-called learning curve.
We make the (ideal) learning curve concept precise and briefly discuss the aforementioned usages of such curves.
The larger part of this survey's focus is on learning curves that show that more data does not necessarily lead to better generalization performance.
- Score: 15.236871820889345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Plotting a learner's generalization performance against the training set size
results in a so-called learning curve. This tool, which provides insight into the
behavior of the learner, is also practically valuable for model selection,
predicting the effect of more training data, and reducing the computational
complexity of training. We set out to make the (ideal) learning curve concept
precise and briefly discuss the aforementioned usages of such curves. The
larger part of this survey's focus, however, is on learning curves that show
that more data does not necessarily lead to better generalization performance,
a result that seems surprising to many researchers in the field of artificial
intelligence. We point out the significance of these findings and conclude our
survey with an overview and discussion of open problems in this area that
warrant further theoretical and empirical investigation.
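The learning curve described above can be estimated empirically: train the learner on increasingly large samples and record its error on a held-out test set. Below is a minimal, self-contained sketch using a toy one-dimensional two-class problem and a simple class-mean threshold learner (both illustrative choices, not from the survey):

```python
import random

random.seed(0)

def sample(n):
    # Toy data: class 0 ~ N(0, 1), class 1 ~ N(1.5, 1)
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        x = random.gauss(1.5 * y, 1.0)
        data.append((x, y))
    return data

def fit_threshold(train):
    # Simple learner: threshold at the midpoint of the class means
    c0 = [x for x, y in train if y == 0]
    c1 = [x for x, y in train if y == 1]
    mu0 = sum(c0) / len(c0) if c0 else 0.0
    mu1 = sum(c1) / len(c1) if c1 else 1.5
    return (mu0 + mu1) / 2.0

def error(thresh, data):
    # Fraction of points whose predicted class (x > thresh) disagrees
    # with the true label
    wrong = sum(1 for x, y in data if (x > thresh) != (y == 1))
    return wrong / len(data)

test = sample(2000)  # fixed held-out test set
curve = []
for n in (4, 16, 64, 256, 1024):
    curve.append((n, error(fit_threshold(sample(n)), test)))
```

Plotting `curve` (error against training set size) gives an empirical learning curve; note that, as the survey stresses, there is no guarantee the recorded errors decrease monotonically with n.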
Related papers
- Continual Learning on a Data Diet [3.73232466691291]
Continual Learning (CL) methods usually learn from all available data.
Not all data points in a dataset have equal potential; some can be more informative than others.
This disparity may significantly impact the performance, as both the quality and quantity of samples directly influence the model's generalizability and efficiency.
arXiv Detail & Related papers (2024-10-23T09:42:17Z)
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
- What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights [67.72413262980272]
Severe data imbalance naturally exists among web-scale vision-language datasets.
We find that CLIP pre-trained on such data exhibits notable robustness to data imbalance compared with supervised learning.
The robustness and discriminability of CLIP improve with more descriptive language supervision, larger data scale, and broader open-world concepts.
arXiv Detail & Related papers (2024-05-31T17:57:24Z)
- An Expert's Guide to Training Physics-informed Neural Networks [5.198985210238479]
Physics-informed neural networks (PINNs) have been popularized as a deep learning framework.
PINNs can seamlessly synthesize observational data and partial differential equation (PDE) constraints.
We present a series of best practices that can significantly improve the training efficiency and overall accuracy of PINNs.
arXiv Detail & Related papers (2023-08-16T16:19:25Z)
- Estimation of Predictive Performance in High-Dimensional Data Settings using Learning Curves [0.0]
Learn2Evaluate is based on learning curves: it fits a smooth monotone curve depicting test performance as a function of the sample size.
The benefits of Learn2Evaluate are illustrated by a simulation study and applications to omics data.
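The idea of fitting a smooth monotone curve to observed performance can be sketched with a simple two-parameter power law, err(n) ≈ a·n^(−b), fitted by least squares in log-log space. The sample sizes and error values below are purely illustrative, and Learn2Evaluate's actual fitting procedure may differ:

```python
import math

# Hypothetical observed test errors at increasing sample sizes
points = [(10, 0.40), (40, 0.25), (160, 0.15), (640, 0.09)]

# Fit err(n) ~ a * n^(-b) via linear regression on:
#   log err = log a - b * log n
xs = [math.log(n) for n, _ in points]
ys = [math.log(e) for _, e in points]
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
b = -slope                        # decay rate of the learning curve
a = math.exp(ybar - slope * xbar)  # scale factor

def predict(n):
    # Extrapolate test error to a (possibly unseen) sample size
    return a * n ** (-b)
```

By construction the fitted curve is monotone decreasing (for b > 0), which is exactly the well-behaved regime; the survey's point is that real learning curves can violate this assumption.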
arXiv Detail & Related papers (2022-06-08T11:48:01Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- The Shape of Learning Curves: a Review [14.764764847928259]
This review recounts the origins of the term, provides a formal definition of the learning curve, and briefly covers basics such as its estimation.
We discuss empirical and theoretical evidence that supports well-behaved curves that often have the shape of a power law or an exponential.
We draw specific attention to examples of learning curves that are ill-behaved, showing worse learning performance with more training data.
arXiv Detail & Related papers (2021-03-19T17:56:33Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.