Learning Capacity: A Measure of the Effective Dimensionality of a Model
- URL: http://arxiv.org/abs/2305.17332v1
- Date: Sat, 27 May 2023 02:27:27 GMT
- Title: Learning Capacity: A Measure of the Effective Dimensionality of a Model
- Authors: Daiwei Chen, Weikai Chang, Pratik Chaudhari
- Abstract summary: We define a "learning capacity" which is a measure of the effective dimensionality of a model.
We show that the learning capacity is a tiny fraction of the number of parameters for many deep networks trained on typical datasets.
- Score: 16.225020457496434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We exploit a formal correspondence between thermodynamics and inference,
where the number of samples can be thought of as the inverse temperature, to
define a "learning capacity" which is a measure of the effective
dimensionality of a model. We show that the learning capacity is a tiny
fraction of the number of parameters for many deep networks trained on typical
datasets, depends upon the number of samples used for training, and is
numerically consistent with notions of capacity obtained from the PAC-Bayesian
framework. The test error as a function of the learning capacity does not
exhibit double descent. We show that the learning capacity of a model saturates
at very small and very large sample sizes; this provides guidelines as to
whether one should procure more data or search for new architectures
to improve performance. We show how the learning capacity can be
used to understand the effective dimensionality, even for non-parametric models
such as random forests and $k$-nearest neighbor classifiers.
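The abstract treats the number of samples N as an inverse temperature. One way to read that analogy, offered here as an assumption rather than the paper's exact definition or estimator, is to mirror the heat capacity C = β² ∂²log Z/∂β² and write C(N) = N² ∂²log Z(N)/∂N² = -N² dU/dN, where U(N) = -∂log Z(N)/∂N plays the role of internal energy and can be approximated by the average held-out negative log-likelihood of a model trained on N samples. For a regular model with d parameters, log Z(N) ≈ -N e0 - (d/2) log N, so U(N) ≈ e0 + d/(2N) and C = d/2 under this convention; a capacity far below the parameter count therefore signals a small effective dimensionality. The sketch below estimates C(N) by finite differences; `learning_capacity` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def learning_capacity(ns, test_nll):
    """Finite-difference estimate of C(N) = -N^2 * dU/dN.

    Hypothetical helper (not the authors' code). Assumes test_nll[i] is the
    average held-out negative log-likelihood U(N) of a model trained on
    ns[i] samples, which plays the role of internal energy when N is
    treated as the inverse temperature.
    """
    ns = np.asarray(ns, dtype=float)
    u = np.asarray(test_nll, dtype=float)
    if ns.ndim != 1 or ns.shape != u.shape:
        raise ValueError("ns and test_nll must be 1-D arrays of equal length")
    if ns.size < 3 or np.any(np.diff(ns) <= 0):
        raise ValueError("need at least 3 strictly increasing sample sizes")
    # Second-order accurate finite differences; ns may be non-uniformly spaced.
    du_dn = np.gradient(u, ns, edge_order=2)
    return -(ns ** 2) * du_dn


# Sanity check on a synthetic regular model with d = 10 parameters:
# U(N) = e0 + d / (2N) should give C(N) close to d / 2 = 5 at every N.
ns = np.linspace(100, 1000, 91)
u = 0.3 + 10 / (2 * ns)
print(np.round(learning_capacity(ns, u)[:5], 2))
```

In practice U(N) would have to be estimated by training on subsets of increasing size and averaging held-out loss over several draws. Finite differences amplify noise, so smoothing or fitting a parametric curve to U(N) before differentiating is likely needed; the constant e0 drops out of C entirely.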
Related papers
- An exactly solvable model for emergence and scaling laws [2.598133279943607]
We present a framework where each new ability (a skill) is represented as a basis function.
We find analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute.
Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.
Date: 2024-04-26T17:45:32Z
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
Date: 2023-12-07T07:17:24Z
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
Date: 2023-09-20T10:31:17Z
- Siamese Networks for Weakly Supervised Human Activity Recognition [2.398608007786179]
We present a model with multiple siamese networks that are trained by using only the information about the similarity between pairs of data samples without knowing the explicit labels.
The trained model maps the activity data samples into fixed size representation vectors such that the distance between the vectors in the representation space approximates the similarity of the data samples in the input space.
We evaluate the model on three datasets to verify its effectiveness in segmentation and recognition of continuous human activity sequences.
Date: 2023-07-18T03:23:34Z
- Evaluating Representations with Readout Model Switching [18.475866691786695]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric.
We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions.
The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
Date: 2023-02-19T14:08:01Z
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
Date: 2022-10-07T17:56:53Z
- Learning new physics efficiently with nonparametric methods [11.970219534238444]
We present a machine learning approach for model-independent new physics searches.
The corresponding algorithm is powered by recent large-scale implementations of kernel methods.
We show that our approach has dramatic advantages compared to neural network implementations in terms of training times and computational resources.
Date: 2022-04-05T16:17:59Z
- Feeding What You Need by Understanding What You Learned [54.400455868448695]
Machine Reading Comprehension (MRC) tests the ability to understand a given text passage and answer questions based on it.
Existing research in MRC relies heavily on large models and corpora to improve performance as measured by metrics such as Exact Match.
We argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data.
Date: 2022-03-05T14:15:59Z
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
Date: 2021-04-11T12:14:04Z
- Estimating informativeness of samples with Smooth Unique Information [108.25192785062367]
We measure how much a sample informs the final weights and how much it informs the function computed by the weights.
We give efficient approximations of these quantities using a linearized network.
We apply these measures to several problems, such as dataset summarization.
Date: 2021-01-17T10:29:29Z