Bayes Complexity of Learners vs Overfitting
- URL: http://arxiv.org/abs/2303.07874v1
- Date: Mon, 13 Mar 2023 13:07:02 GMT
- Title: Bayes Complexity of Learners vs Overfitting
- Authors: Grzegorz Głuch, Rüdiger Urbanke
- Abstract summary: We show that a new notion of complexity of functions governs a PAC Bayes-like generalization bound.
In contrast to previous works, our notion naturally generalizes to neural networks with several layers.
An upper bound we derive allows us to show a separation in the number of samples needed for good generalization between 2- and 4-layer neural networks.
- Score: 4.873362301533825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a new notion of complexity of functions and we show that it has
the following properties: (i) it governs a PAC Bayes-like generalization bound,
(ii) for neural networks it relates to natural notions of complexity of
functions (such as the variation), and (iii) it explains the generalization gap
between neural networks and linear schemes. While there is a large body of
work describing bounds that have each such property in isolation, and even
some that have two, this is, as far as we know, the first notion that
satisfies all three. Moreover, in contrast to previous works, our notion
naturally generalizes to neural networks with several layers.
Even though the computation of our complexity is nontrivial in general, an
upper bound is often easy to derive, even for networks with more layers and
for functions with structure, such as periodic functions. An upper bound we
derive allows us to show a separation in the number of samples needed for good
generalization between 2- and 4-layer neural networks for periodic functions.
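For context, one standard form of the classical PAC-Bayes bound (the McAllester/Maurer template) is recalled below. This is not the bound derived in the paper; the abstract states only that the new complexity notion governs a PAC Bayes-like bound of this general flavor.
$$
\mathbb{E}_{h\sim Q}\big[L(h)\big] \;\le\; \mathbb{E}_{h\sim Q}\big[\hat{L}_S(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
$$
The bound holds with probability at least $1-\delta$ over an i.i.d. sample $S$ of size $n$, simultaneously for all posterior distributions $Q$ over hypotheses and for any data-independent prior $P$; here $L$ and $\hat{L}_S$ denote the population and empirical risks.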
Related papers
- Spectral complexity of deep neural networks [2.099922236065961]
We use the angular power spectrum of the limiting field to characterize the complexity of the network architecture.
On this basis, we classify neural networks as low-disorder, sparse, or high-disorder.
We show how this classification highlights a number of distinct features for standard activation functions, and in particular, sparsity properties of ReLU networks.
arXiv Detail & Related papers (2024-05-15T17:55:05Z) - Structure of universal formulas [13.794391803767617]
We introduce a hierarchy of classes connecting the global approximability property to the weaker property of infinite VC dimension.
We show that fixed-size neural networks with not more than one layer of neurons having activations cannot approximate functions on arbitrary finite sets.
We give examples of functional families, including two-hidden-layer neural networks, that approximate functions on arbitrary finite sets, but fail to do that on the whole domain of definition.
arXiv Detail & Related papers (2023-11-07T11:50:25Z) - Going Beyond Neural Network Feature Similarity: The Network Feature Complexity and Its Interpretation Using Category Theory [64.06519549649495]
We provide the definition of what we call functionally equivalent features.
These features produce equivalent output under certain transformations.
We propose an efficient algorithm named Iterative Feature Merging.
arXiv Detail & Related papers (2023-10-10T16:27:12Z) - Memorization with neural nets: going beyond the worst case [5.662924503089369]
In practice, deep neural networks are often able to easily interpolate their training data.
For real-world data, however, one intuitively expects the presence of a benign structure, so that interpolation already occurs at a smaller network size than suggested by memorization capacity.
We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time.
arXiv Detail & Related papers (2023-09-30T10:06:05Z) - Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z) - A Measure of the Complexity of Neural Representations based on Partial Information Decomposition [0.0]
In neural networks, task-relevant information is represented jointly by groups of neurons.
We show how Partial Information Decomposition (PID), a recent extension of information theory, can disentangle these different contributions.
We introduce the measure of "Representational Complexity", which quantifies the difficulty of accessing information spread across multiple neurons.
arXiv Detail & Related papers (2022-09-21T15:33:11Z) - Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - On the Expected Complexity of Maxout Networks [0.0]
Recent works have shown that the practical complexity of deep ReLU networks is often far from the theoretical maximum (a rough numerical illustration of this gap appears after this list).
In this work we show that this phenomenon also occurs in networks with maxout (multi-argument) activation functions.
We also show that the parameter space has a multitude of full-dimensional regions with widely different complexity, and obtain nontrivial lower bounds on the expected complexity.
arXiv Detail & Related papers (2021-07-01T11:36:32Z) - On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the function.
Our regret bounds are independent of the number of states.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L^1$ and $L^2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
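As a rough numerical illustration of the gap between practical and maximal complexity discussed in the entry "On the Expected Complexity of Maxout Networks" above, the following Python sketch (illustrative only, not taken from any of the listed papers; the architecture and sizes are arbitrary) counts the distinct ReLU activation patterns that a small randomly initialized network realizes on a dense one-dimensional input grid and compares this count with the combinatorial upper bound of $2^{\text{number of hidden units}}$ patterns.

import numpy as np

rng = np.random.default_rng(0)

# Small fully connected ReLU network with layer sizes 1 -> 16 -> 16 -> 1
# (sizes are arbitrary; only the comparison at the end matters).
widths = [1, 16, 16, 1]
weights = [rng.standard_normal((m, n)) / np.sqrt(n)
           for n, m in zip(widths[:-1], widths[1:])]
biases = [0.1 * rng.standard_normal(m) for m in widths[1:]]

def activation_pattern(x):
    """Return the on/off pattern of every hidden ReLU for scalar input x."""
    h = np.array([x], dtype=float)
    pattern = []
    for W, b in zip(weights[:-1], biases[:-1]):  # hidden layers only
        pre = W @ h + b
        pattern.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return tuple(np.concatenate(pattern).tolist())

# Each distinct pattern corresponds to one linear piece of the network on the grid.
grid = np.linspace(-10.0, 10.0, 20001)
patterns = {activation_pattern(float(x)) for x in grid}

n_hidden = sum(widths[1:-1])
print(f"distinct activation patterns found on the grid: {len(patterns)}")
print(f"combinatorial upper bound: 2^{n_hidden} = {2 ** n_hidden}")

On a typical run the network realizes only a few dozen patterns on the grid, vastly fewer than the bound of $2^{32}$, mirroring the observation above that practical complexity tends to sit far below the theoretical maximum.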
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.