Related papers: Do deep neural networks have an inbuilt Occam's razor?

Do deep neural networks have an inbuilt Occam's razor?

URL: http://arxiv.org/abs/2304.06670v1
Date: Thu, 13 Apr 2023 16:58:21 GMT
Title: Do deep neural networks have an inbuilt Occam's razor?
Authors: Chris Mingard and Henry Rees and Guillermo Valle-P\'erez and Ard A. Louis
Abstract summary: We show that structured data combined with an intrinsic Occam's razor-like inductive bias towards simple functions counteracts the exponential growth of functions with complexity. This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of functions with complexity, is a key to the success of DNNs.
Score: 1.1470070927586016
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent. This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.

Related papers

Visualising Feature Learning in Deep Neural Networks by Diagonalizing the Forward Feature Map [4.776836972093627]
We present a method for analysing feature learning by decomposing deep neural networks (DNNs) We find that DNNs converge to a minimal feature (MF) regime dominated by a number of eigenfunctions equal to the number of classes. We recast the phenomenon of neural collapse into a kernel picture which can be extended to broader tasks such as regression.
arXiv Detail & Related papers (2024-10-05T18:53:48Z)
Deep Learning as Ricci Flow [38.27936710747996]
Deep neural networks (DNNs) are powerful tools for approximating the distribution of complex data. We show that the transformations performed by DNNs during classification tasks have parallels to those expected under Hamilton's Ricci flow. Our findings motivate the use of tools from differential and discrete geometry to the problem of explainability in deep learning.
arXiv Detail & Related papers (2024-04-22T15:12:47Z)
How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series. We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification. Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for emphRBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network. We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z)
An Order-Invariant and Interpretable Hierarchical Dilated Convolution Neural Network for Chemical Fault Detection and Diagnosis [7.226239130399725]
Convolution neural network (CNN) is a popular deep learning algorithm with many successful applications in chemical fault detection and diagnosis tasks. In this paper, we propose an order-invariant and interpretable hierarchical dilated convolution neural network (HDLCNN) The proposed method provides interpretability by including the SHAP values to quantify feature contribution.
arXiv Detail & Related papers (2023-02-13T10:28:41Z)
Functional Neural Networks: Shift invariant models for functional data with applications to EEG classification [0.0]
We introduce a new class of neural networks that are shift invariant and preserve smoothness of the data: functional neural networks (FNNs) For this, we use methods from functional data analysis (FDA) to extend multi-layer perceptrons and convolutional neural networks to functional data. We show that the models outperform a benchmark model from FDA in terms of accuracy and successfully use FNNs to classify electroencephalography (EEG) data.
arXiv Detail & Related papers (2023-01-14T09:41:21Z)
Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics. We then exploit higher-order statistics only later during training. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
Dynamical systems' based neural networks [0.7874708385247353]
We build neural networks using a suitable, structure-preserving, numerical time-discretisation. The structure of the neural network is then inferred from the properties of the ODE vector field. We present two universal approximation results and demonstrate how to impose some particular properties on the neural networks.
arXiv Detail & Related papers (2022-10-05T16:30:35Z)
What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime. We prove that deep CNNs adapt to the spatial scale of the target function. We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
Deep Neural Network Classifier for Multi-dimensional Functional Data [4.340040784481499]
We propose a new approach, called as functional deep neural network (FDNN), for classifying multi-dimensional functional data. Specifically, a deep neural network is trained based on the principle components of the training data which shall be used to predict the class label of a future data function.
arXiv Detail & Related papers (2022-05-17T19:22:48Z)
Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training. We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
Deep neural networks for smooth approximation of physics with higher order and continuity B-spline base functions [0.4588028371034407]
Traditionally, the neural network employs non-linear activation functions to approximate a given physical phenomenon. We present an alternative approach, where the physcial quantity is approximated as a linear combination of smooth B-spline basis functions. We show that our approach is cheaper and more accurate when approximating physical fields.
arXiv Detail & Related papers (2022-01-03T23:02:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.