Learning Capacity: A Measure of the Effective Dimensionality of a Model
- URL: http://arxiv.org/abs/2305.17332v2
- Date: Sun, 20 Oct 2024 17:11:49 GMT
- Title: Learning Capacity: A Measure of the Effective Dimensionality of a Model
- Authors: Daiwei Chen, Wei-Kai Chang, Pratik Chaudhari
- Abstract summary: We study a quantity called "learning capacity", which is a measure of the effective dimensionality of a model.
We show that the learning capacity is a useful notion of complexity because (a) it correlates well with the test loss and is a tiny fraction of the number of parameters for many deep networks trained on typical datasets.
- Score: 18.48866194756127
- Abstract: We use a formal correspondence between thermodynamics and inference, where the number of samples can be thought of as the inverse temperature, to study a quantity called "learning capacity" which is a measure of the effective dimensionality of a model. We show that the learning capacity is a useful notion of complexity because (a) it correlates well with the test loss and it is a tiny fraction of the number of parameters for many deep networks trained on typical datasets, (b) it depends upon the number of samples used for training, (c) it is numerically consistent with notions of capacity obtained from PAC-Bayes generalization bounds, and (d) the test loss as a function of the learning capacity does not exhibit double descent. We show that the learning capacity saturates at very small and very large sample sizes; the threshold that characterizes the transition between these two regimes provides guidelines as to when one should procure more data and when one should search for a different architecture to improve performance. We show how the learning capacity can be used to provide a quantitative notion of capacity even for non-parametric models such as random forests and nearest neighbor classifiers.
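As a rough illustration of how such a quantity could be computed in practice, the sketch below estimates a heat-capacity-style learning capacity from the average test loss measured at several training-set sizes, using the finite-difference formula C(N) = -N^2 dU/dN suggested by the temperature analogy; the formula, function names, and numbers are illustrative assumptions, not the authors' estimator.

```python
# Illustrative sketch: estimate a heat-capacity-style "learning capacity"
# from the test loss measured at several training-set sizes N.
# The formula C(N) = -N^2 * dU/dN (with U(N) the average test loss and
# 1/N playing the role of temperature) is an assumption based on the
# thermodynamic analogy in the abstract, not the authors' exact estimator.
import numpy as np

def learning_capacity(sample_sizes, test_losses):
    """Finite-difference estimate of C(N) = dU/d(1/N) = -N^2 dU/dN."""
    N = np.asarray(sample_sizes, dtype=float)
    U = np.asarray(test_losses, dtype=float)
    order = np.argsort(N)
    N, U = N[order], U[order]
    dU_dN = np.gradient(U, N)   # numerical derivative of loss w.r.t. N
    return -N**2 * dU_dN        # expected to saturate at small and large N

# Hypothetical measurements: average test loss of one architecture
# retrained on nested subsets of the data.
Ns     = [500, 1000, 2000, 4000, 8000, 16000]
losses = [1.90, 1.55, 1.30, 1.15, 1.08, 1.05]
print(learning_capacity(Ns, losses))
```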
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, it can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
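A minimal sketch of the logit-level composition described above, assuming the value network produces an additive correction in vocabulary space; the class names, the additive rule, and the interfaces are illustrative assumptions rather than the paper's implementation.

```python
# Sketch of logit-level composition at inference time, assuming the value
# network predicts an additive correction to the base model's logits.
import torch
import torch.nn as nn

class ValueNetwork(nn.Module):
    """Small network trained (against a small base model) to predict logit changes."""
    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.GELU(), nn.Linear(hidden, vocab_size)
        )

    def forward(self, base_logits: torch.Tensor) -> torch.Tensor:
        return self.net(base_logits)   # predicted post-training change in logits

@torch.no_grad()
def composed_logits(pretrained_lm, value_net, input_ids):
    """Plug the frozen value network into a (possibly larger) pre-trained LM."""
    logits = pretrained_lm(input_ids)   # assumed to return [batch, seq, vocab]
    return logits + value_net(logits)   # apply the learned logit-level shift

# Toy usage with a stand-in "pre-trained LM" that maps token ids to random logits.
vocab = 100
dummy_lm = lambda ids: torch.randn(ids.shape[0], ids.shape[1], vocab)
out = composed_logits(dummy_lm, ValueNetwork(vocab), torch.zeros(1, 8, dtype=torch.long))
```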
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
- Stabilizing Subject Transfer in EEG Classification with Divergence Estimation [17.924276728038304]
We propose several graphical models to describe an EEG classification task.
We identify statistical relationships that should hold true in an idealized training scenario.
We design regularization penalties to enforce these relationships in two stages.
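A generic sketch of adding such a divergence-style penalty to the classification objective follows; the mean-discrepancy term and single-stage combination used here are simplifying stand-ins for the paper's estimators and its two-stage schedule.

```python
# Generic sketch: augment the classification loss with a penalty that
# discourages subject-specific feature statistics. The mean-discrepancy
# penalty is a stand-in for the paper's divergence estimators.
import torch
import torch.nn.functional as F

def regularized_loss(features, logits, labels, subject_ids, lam=0.1):
    task_loss = F.cross_entropy(logits, labels)
    penalty = 0.0
    subjects = subject_ids.unique()
    global_mean = features.mean(dim=0)
    for s in subjects:
        subj_mean = features[subject_ids == s].mean(dim=0)
        penalty = penalty + (subj_mean - global_mean).pow(2).sum()
    return task_loss + lam * penalty / max(len(subjects), 1)

# Toy call with random features, logits, labels, and subject ids.
feats   = torch.randn(32, 16)
logits  = torch.randn(32, 4)
labels  = torch.randint(0, 4, (32,))
subj_id = torch.randint(0, 5, (32,))
print(regularized_loss(feats, logits, labels, subj_id))
```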
arXiv Detail & Related papers (2023-10-12T23:06:52Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Optimal Learners for Realizable Regression: PAC Learning and Online Learning [52.37726841759983]
We aim to characterize the statistical complexity of realizable regression both in the PAC learning setting and the online learning setting.
We first introduce a minimax instance optimal learner for realizable regression and propose a novel dimension that both qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable.
In the context of online learning, we provide a dimension that characterizes the minimax instance optimal cumulative loss up to a constant factor and design an optimal online learner for realizable regression.
arXiv Detail & Related papers (2023-07-07T21:39:25Z)
- Learning Likelihood Ratios with Neural Network Classifiers [0.12277343096128711]
Approximations of the likelihood ratio may be computed using clever parametrizations of neural network-based classifiers.
We present a series of empirical studies detailing the performance of several common loss functionals and parametrizations of the classifier output.
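One standard parametrization is the classifier-based likelihood-ratio trick sketched below: a classifier trained with binary cross-entropy to separate the two sample sets approximates the ratio via f(x)/(1 - f(x)). The toy Gaussian data and model choice are illustrative; the paper compares several losses and output parametrizations.

```python
# The classic likelihood-ratio trick: a classifier trained to separate
# samples from p (label 1) and q (label 0) yields p(x)/q(x) ~ f(x)/(1 - f(x)).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
x_p = rng.normal(loc=0.5, scale=1.0, size=(5000, 1))   # samples from p
x_q = rng.normal(loc=0.0, scale=1.0, size=(5000, 1))   # samples from q

X = np.vstack([x_p, x_q])
y = np.concatenate([np.ones(len(x_p)), np.zeros(len(x_q))])

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300).fit(X, y)

def likelihood_ratio(x):
    f = clf.predict_proba(x.reshape(-1, 1))[:, 1]
    return f / np.clip(1.0 - f, 1e-6, None)    # estimated p(x)/q(x)

print(likelihood_ratio(np.array([0.0, 0.5, 1.0])))
```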
arXiv Detail & Related papers (2023-05-17T18:11:38Z)
- Semiparametric Language Models Are Scalable Continual Learners [83.74414880208334]
Semiparametric language models (LMs) have shown promise in continuously learning from new text data.
We present a simple and intuitive approach called Selective Memorization (SeMem).
SeMem only memorizes difficult samples that the model is likely to struggle with.
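A minimal sketch of the selective-memorization idea, assuming difficulty is judged by a per-token loss threshold; the threshold criterion and the memory interface are assumptions, not the paper's design.

```python
# Sketch: only samples the model finds difficult (high per-token loss /
# perplexity) are written to the external memory.
import math

class SelectiveMemory:
    def __init__(self, loss_threshold: float = 3.0):
        self.loss_threshold = loss_threshold
        self.store = []                       # stand-in for a retrieval index

    def maybe_memorize(self, text: str, token_loss: float) -> bool:
        """Memorize only if the model struggled (loss above threshold)."""
        if token_loss > self.loss_threshold:
            self.store.append(text)
            return True
        return False

memory = SelectiveMemory(loss_threshold=math.log(20))  # ~perplexity-20 cutoff
memory.maybe_memorize("rare domain-specific sentence", token_loss=3.4)     # stored
memory.maybe_memorize("common sentence the model already fits", 1.2)       # skipped
```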
arXiv Detail & Related papers (2023-03-02T17:15:02Z)
- Pairwise Learning via Stagewise Training in Proximal Setting [0.0]
We combine adaptive sample size and importance sampling techniques for pairwise learning, with convergence guarantees for nonsmooth convex pairwise loss functions.
We demonstrate that sampling opposite instances at each iteration reduces the variance of the gradient, hence accelerating convergence.
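A sketch of one stochastic step of pairwise learning with opposite-instance sampling, the scheme the summary credits with reducing gradient variance; the hinge-style pairwise loss and plain SGD update are illustrative choices rather than the paper's proximal, stagewise algorithm.

```python
# One stochastic step on a pairwise (ranking) objective where each pair is
# formed from opposite-label instances: one positive and one negative example.
import numpy as np

rng = np.random.default_rng(0)

def pairwise_sgd_step(w, X, y, lr=0.1):
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == -1)
    i, j = rng.choice(pos), rng.choice(neg)   # opposite-label pair
    margin = w @ (X[i] - X[j])                # score difference
    if margin < 1.0:                          # hinge: rank positive above negative
        w = w + lr * (X[i] - X[j])
    return w

# Toy data: label depends (noisily) on the first feature.
X = rng.normal(size=(100, 5))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=100) > 0, 1, -1)
w = np.zeros(5)
for _ in range(500):
    w = pairwise_sgd_step(w, X, y)
print(w)
```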
arXiv Detail & Related papers (2022-08-08T11:51:01Z)
- Scalable approach to many-body localization via quantum data [69.3939291118954]
Many-body localization is a notoriously difficult phenomenon from quantum many-body physics.
We propose a flexible neural network based learning approach that circumvents any computationally expensive step.
Our approach can be applied to large-scale quantum experiments to provide new insights into quantum many-body physics.
arXiv Detail & Related papers (2022-02-17T19:00:09Z)
- A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs [11.152761263415046]
This paper focuses on understanding how the generalization error scales with the amount of training data for deep neural networks (DNNs).
We derive estimates of the generalization error that hold for deep networks and do not rely on unattainable capacity measures.
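For illustration, a generic power-law fit of test error against training-set size is sketched below; this common extrapolation is an assumption made for exposition and is not claimed to be the estimator derived in the paper.

```python
# Generic power-law extrapolation of generalization error versus training-set
# size, err(n) = a * n**(-b) + c, fit to hypothetical measurements.
import numpy as np
from scipy.optimize import curve_fit

def err_model(n, a, b, c):
    return a * n ** (-b) + c

# Hypothetical measurements: test error after training on nested subsets.
n_obs   = np.array([1e3, 2e3, 4e3, 8e3, 16e3])
err_obs = np.array([0.32, 0.26, 0.22, 0.19, 0.175])

params, _ = curve_fit(err_model, n_obs, err_obs, p0=(1.0, 0.5, 0.1), maxfev=10000)
a, b, c = params
print(f"fit: err(n) = {a:.2f} * n^(-{b:.2f}) + {c:.3f}")
print("predicted error at n = 64k:", err_model(64e3, *params))
```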
arXiv Detail & Related papers (2021-05-05T05:14:08Z)
- Estimating informativeness of samples with Smooth Unique Information [108.25192785062367]
We measure how much a sample informs the final weights and how much it informs the function computed by the weights.
We give efficient approximations of these quantities using a linearized network.
We apply these measures to several problems, such as dataset summarization.
arXiv Detail & Related papers (2021-01-17T10:29:29Z)