Small Data, Big Decisions: Model Selection in the Small-Data Regime
- URL: http://arxiv.org/abs/2009.12583v1
- Date: Sat, 26 Sep 2020 12:52:56 GMT
- Title: Small Data, Big Decisions: Model Selection in the Small-Data Regime
- Authors: Jorg Bornschein, Francesco Visin, Simon Osindero
- Abstract summary: We study the generalization performance as the size of the training set varies over multiple orders of magnitude.
Our experiments furthermore allow us to estimate Minimum Description Lengths for common datasets given modern neural network architectures.
- Score: 11.817454285986225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Highly overparametrized neural networks can display curiously strong
generalization performance - a phenomenon that has recently garnered a wealth
of theoretical and empirical research aimed at better understanding it. In
contrast to most previous work, which typically considers the performance as a
function of the model size, in this paper we empirically study the
generalization performance as the size of the training set varies over multiple
orders of magnitude. These systematic experiments lead to some interesting and
potentially very useful observations; perhaps most notably that training on
smaller subsets of the data can lead to more reliable model selection decisions
whilst simultaneously enjoying smaller computational costs. Our experiments
furthermore allow us to estimate Minimum Description Lengths for common
datasets given modern neural network architectures, thereby paving the way for
principled model selection that takes Occam's razor into account.
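The abstract's central observation - that rankings between candidate models obtained on small training subsets can already agree with full-data rankings, at a fraction of the compute - can be illustrated with a minimal sketch. This is not the paper's exact protocol; the synthetic task, the two ridge-regression "model classes" (distinguished only by a hypothetical regularization strength `lam`), and the subset sizes are all illustrative assumptions.

```python
import numpy as np

# Hedged sketch (not the paper's protocol): rank two candidate models on
# nested training subsets of growing size, and inspect whether the
# small-subset ranking already matches the full-data ranking.

rng = np.random.default_rng(0)

# Synthetic linear-regression task: y = X w + noise.
n, d = 2000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

X_tr, y_tr = X[:1500], y[:1500]      # training pool (subsets drawn from here)
X_val, y_val = X[1500:], y[1500:]    # held-out validation split

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_mse(w):
    """Mean squared error of weights w on the held-out validation split."""
    return float(np.mean((X_val @ w - y_val) ** 2))

# Two candidate "model classes": lightly vs heavily regularized ridge.
candidates = {"lam=0.1": 0.1, "lam=100": 100.0}

# Evaluate each candidate on nested subsets of increasing size.
for m in [50, 200, 1500]:
    scores = {name: val_mse(ridge_fit(X_tr[:m], y_tr[:m], lam))
              for name, lam in candidates.items()}
    best = min(scores, key=scores.get)
    print(f"n={m:5d}  best={best}  scores={scores}")
```

On this toy problem the lightly regularized model typically wins at every subset size, so a selection decision made at n=50 already agrees with the full-data decision; the paper's contribution is showing empirically when and why such agreement holds for deep networks on real datasets.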
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - Zero-shot meta-learning for small-scale data from human subjects [10.320654885121346]
We develop a framework that rapidly adapts to a new prediction task with limited training data, targeting out-of-sample test data.
Our model learns the latent treatment effects of each intervention and, by design, can naturally handle multi-task predictions.
Our model has implications for improved generalization of small-size human studies to the wider population.
arXiv Detail & Related papers (2022-03-29T17:42:04Z) - On Optimal Early Stopping: Over-informative versus Under-informative Parametrization [13.159777131162961]
We develop theoretical results to reveal the relationship between the optimal early stopping time and model dimension.
We demonstrate experimentally that our theoretical results on the optimal early stopping time correspond to the training process of deep neural networks.
arXiv Detail & Related papers (2022-02-20T18:20:06Z) - With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z) - Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Experimental Design for Overparameterized Learning with Application to Single Shot Deep Active Learning [5.141687309207561]
Modern machine learning models are trained on large amounts of labeled data.
Access to large volumes of labeled data is often limited or expensive.
We propose a new design strategy for curating the training set.
arXiv Detail & Related papers (2020-09-27T11:27:49Z) - Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.