Do deep neural networks have an inbuilt Occam's razor?
- URL: http://arxiv.org/abs/2304.06670v1
- Date: Thu, 13 Apr 2023 16:58:21 GMT
- Title: Do deep neural networks have an inbuilt Occam's razor?
- Authors: Chris Mingard and Henry Rees and Guillermo Valle-Pérez and Ard A. Louis
- Abstract summary: We show that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.
- Score: 1.1470070927586016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The remarkable performance of overparameterized deep neural networks (DNNs)
must arise from an interplay between network architecture, training algorithms,
and structure in the data. To disentangle these three components, we apply a
Bayesian picture, based on the functions expressed by a DNN, to supervised
learning. The prior over functions is determined by the network, and is varied
by exploiting a transition between ordered and chaotic regimes. For Boolean
function classification, we approximate the likelihood using the error spectrum
of functions on data. When combined with the prior, this accurately predicts
the posterior, measured for DNNs trained with stochastic gradient descent. This
analysis reveals that structured data, combined with an intrinsic Occam's
razor-like inductive bias towards (Kolmogorov) simple functions that is strong
enough to counteract the exponential growth of the number of functions with
complexity, is a key to the success of DNNs.
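The Bayesian picture described in the abstract can be illustrated with a small sketch. This is not the authors' code: the network shape, the Gaussian initialisation scales, and the helper names `sample_prior` and `posterior` are all assumptions chosen for illustration. The prior over Boolean functions is estimated by sampling random parameters of a small tanh network, and the posterior uses a simple 0-1 likelihood (functions inconsistent with the training labels get zero weight) in place of the error-spectrum approximation used in the paper.

```python
import itertools
from collections import Counter

import numpy as np


def sample_prior(n_inputs=3, width=16, n_samples=2000, sigma_w=1.0, seed=0):
    """Estimate P(f) over Boolean functions by sampling random network parameters.

    Each function is encoded as a bit string of its outputs on all 2**n_inputs inputs.
    The one-hidden-layer tanh architecture and scales are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Every Boolean input of length n_inputs, one per row.
    X = np.array(list(itertools.product([0.0, 1.0], repeat=n_inputs)))
    counts = Counter()
    for _ in range(n_samples):
        W1 = rng.normal(0.0, sigma_w / np.sqrt(n_inputs), size=(n_inputs, width))
        b1 = rng.normal(0.0, 1.0, size=width)
        W2 = rng.normal(0.0, sigma_w / np.sqrt(width), size=width)
        b2 = rng.normal(0.0, 1.0)
        out = np.tanh(X @ W1 + b1) @ W2 + b2
        # Threshold the real-valued output to obtain a Boolean function.
        f = "".join("1" if o > 0 else "0" for o in out)
        counts[f] += 1
    return {f: c / n_samples for f, c in counts.items()}


def posterior(prior, train_idx, train_labels):
    """Bayes' rule with a 0-1 likelihood: keep functions that fit the training data."""
    consistent = {
        f: p
        for f, p in prior.items()
        if all(f[i] == y for i, y in zip(train_idx, train_labels))
    }
    z = sum(consistent.values())
    return {f: p / z for f, p in consistent.items()} if z > 0 else {}
```

For example, `posterior(sample_prior(), [0, 7], ["0", "1"])` renormalises the sampled prior over the functions that map the first input to 0 and the last input to 1. Varying `sigma_w` is one rough way to change the sampled prior, loosely in the spirit of the ordered-to-chaotic transition mentioned in the abstract.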
Related papers
- Visualising Feature Learning in Deep Neural Networks by Diagonalizing the Forward Feature Map [4.776836972093627]
We present a method for analysing feature learning by decomposing deep neural networks (DNNs)
We find that DNNs converge to a minimal feature (MF) regime dominated by a number of eigenfunctions equal to the number of classes.
We recast the phenomenon of neural collapse into a kernel picture which can be extended to broader tasks such as regression.
arXiv Detail & Related papers (2024-10-05T18:53:48Z)
- Deep Learning as Ricci Flow [38.27936710747996]
Deep neural networks (DNNs) are powerful tools for approximating the distribution of complex data.
We show that the transformations performed by DNNs during classification tasks have parallels to those expected under Hamilton's Ricci flow.
Our findings motivate the use of tools from differential and discrete geometry to the problem of explainability in deep learning.
arXiv Detail & Related papers (2024-04-22T15:12:47Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z)
- Functional Neural Networks: Shift invariant models for functional data with applications to EEG classification [0.0]
We introduce a new class of neural networks that are shift invariant and preserve smoothness of the data: functional neural networks (FNNs).
For this, we use methods from functional data analysis (FDA) to extend multi-layer perceptrons and convolutional neural networks to functional data.
We show that the models outperform a benchmark model from FDA in terms of accuracy and successfully use FNNs to classify electroencephalography (EEG) data.
arXiv Detail & Related papers (2023-01-14T09:41:21Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
The networks exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Dynamical systems' based neural networks [0.7874708385247353]
We build neural networks using a suitable, structure-preserving, numerical time-discretisation.
The structure of the neural network is then inferred from the properties of the ODE vector field.
We present two universal approximation results and demonstrate how to impose some particular properties on the neural networks.
arXiv Detail & Related papers (2022-10-05T16:30:35Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Deep Neural Network Classifier for Multi-dimensional Functional Data [4.340040784481499]
We propose a new approach, called the functional deep neural network (FDNN), for classifying multi-dimensional functional data.
Specifically, a deep neural network is trained on the principal components of the training data and is then used to predict the class label of a future data function.
arXiv Detail & Related papers (2022-05-17T19:22:48Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.