On Rademacher Complexity-based Generalization Bounds for Deep Learning
- URL: http://arxiv.org/abs/2208.04284v3
- Date: Fri, 27 Sep 2024 17:29:24 GMT
- Title: On Rademacher Complexity-based Generalization Bounds for Deep Learning
- Authors: Lan V. Truong
- Abstract summary: We show that the Rademacher complexity-based approach can generate non-vacuous generalisation bounds on Convolutional Neural Networks (CNNs) for classifying a small number of classes of images.
Our results show that the Rademacher complexity does not depend on the network length for CNNs with some special types of activation functions such as ReLU, Leaky ReLU, Parametric Rectified Linear Unit, Sigmoid, and Tanh.
- Score: 18.601449856300984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that the Rademacher complexity-based approach can generate non-vacuous generalisation bounds on Convolutional Neural Networks (CNNs) for classifying a small number of classes of images. The development of new Talagrand-type contraction lemmas for high-dimensional mappings between function spaces and for CNNs with general Lipschitz activation functions is a key technical contribution. Our results show that the Rademacher complexity does not depend on the network length for CNNs with some special types of activation functions such as ReLU, Leaky ReLU, Parametric Rectified Linear Unit, Sigmoid, and Tanh.
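For readers unfamiliar with the central quantity, the empirical Rademacher complexity of a function class $\mathcal{F}$ on a sample $S = (x_1, \dots, x_n)$ is $\hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_\sigma[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_i \sigma_i f(x_i)]$, where the $\sigma_i$ are i.i.d. uniform signs in $\{-1,+1\}$. The sketch below (not from the paper) estimates it by Monte Carlo for a finite surrogate class of norm-bounded linear predictors; the paper's bounds concern CNN function classes and are derived analytically rather than by sampling.

```python
import numpy as np

def empirical_rademacher(preds, n_trials=1000, seed=None):
    """Monte Carlo estimate of E_sigma[ sup_f (1/n) sum_i sigma_i f(x_i) ]
    for a finite function class, given preds[f, i] = f(x_i)."""
    rng = np.random.default_rng(seed)
    _, n = preds.shape
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        total += preds.dot(sigma).max() / n      # sup over the finite class
    return total / n_trials

# Toy usage: 200 norm-bounded random linear predictors on 50 points in R^5.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
W = rng.normal(size=(200, 5))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # unit-norm class
print(empirical_rademacher(W @ X.T, seed=1))
```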
Related papers
- Polyhedral Complex Extraction from ReLU Networks using Edge Subdivision [0.0]
A neural network built from piecewise affine building blocks, such as fully-connected layers and ReLU activations, partitions its input space into a polyhedral complex.
This complex has been previously studied to characterize theoretical properties of neural networks.
We propose to subdivide the regions via intersections with hyperplanes induced by each neuron.
arXiv Detail & Related papers (2023-06-12T16:17:04Z)
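As a loose illustration of the object being extracted (not the paper's edge-subdivision algorithm), each linear region of a one-hidden-layer ReLU network corresponds to a distinct activation pattern over the neuron hyperplanes $w_i \cdot x + b_i = 0$; the sketch below probes regions by sampling rather than by exact subdivision.

```python
import numpy as np

# Crude probe of the polyhedral complex of a one-hidden-layer ReLU net:
# every linear region corresponds to a distinct sign pattern over the
# neuron hyperplanes, so counting distinct activation patterns of many
# sample points lower-bounds the number of regions.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 2))          # 16 hidden neurons, 2-D inputs
b = rng.normal(size=16)

points = rng.uniform(-3, 3, size=(100_000, 2))
patterns = (points @ W.T + b > 0)     # boolean activation pattern per point
n_regions = len(np.unique(patterns, axis=0))
print(f"distinct activation patterns found: {n_regions}")
```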
- On Size-Independent Sample Complexity of ReLU Networks [9.15749739027059]
We study the sample complexity of learning ReLU neural networks from the point of view of generalization.
We estimate the Rademacher complexity of the associated function class.
arXiv Detail & Related papers (2023-06-03T03:41:33Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose the Graph-adaptive Rectified Linear Unit (GReLU), a new parametric activation function that incorporates neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z)
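The exact GReLU parametrization is specified in the paper; purely as a hypothetical illustration of a neighborhood-conditioned rectifier, the sketch below gates each node's negative-side slope with a value computed from its mean neighbor features. The function name, gating form, and weights are invented for illustration.

```python
import numpy as np

def graph_adaptive_relu(X, A, w):
    """Hypothetical simplification of a graph-adaptive rectifier:
    each node's negative-side slope is gated by its mean neighbor features.
    X: (n, d) node features, A: (n, n) adjacency, w: (d,) gating weights."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    neigh = (A @ X) / deg                       # mean neighbor features
    alpha = 1 / (1 + np.exp(-(neigh @ w)))      # per-node slope in (0, 1)
    return np.where(X > 0, X, alpha[:, None] * X)

# Toy usage on a 3-node path graph.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.array([[-1.0, 2.0], [0.5, -0.5], [-2.0, 1.0]])
print(graph_adaptive_relu(X, A, w=np.array([0.3, -0.7])))
```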
- Dist2Cycle: A Simplicial Neural Network for Homology Localization [66.15805004725809]
Simplicial complexes can be viewed as high-dimensional generalizations of graphs that explicitly encode multi-way ordered relations.
We propose a graph convolutional model for learning functions parametrized by the $k$-homological features of simplicial complexes.
arXiv Detail & Related papers (2021-10-28T14:59:41Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two classes of data linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
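A minimal numerical sketch of that setting (assumed, not taken from the paper): draw a two-layer ReLU feature map with standard Gaussian weights and uniformly distributed biases, then check with a perceptron whether a dataset that is not linearly separable becomes separable after the random lift. With enough width this typically succeeds, matching the qualitative claim.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes on concentric circles: not linearly separable in R^2.
theta = rng.uniform(0, 2 * np.pi, size=(2, 100))
X = np.concatenate([np.stack([np.cos(t), np.sin(t)], axis=1) * r
                    for t, r in zip(theta, (1.0, 2.0))])
y = np.repeat([-1.0, 1.0], 100)

# Random two-layer ReLU features: Gaussian weights, uniform biases.
m = 500                                   # network width
W = rng.normal(size=(2, m))
b = rng.uniform(-2, 2, size=m)
H = np.maximum(X @ W + b, 0.0)

# Perceptron in feature space: reaches zero errors iff the lifted data
# is linearly separable (within the epoch budget).
w = np.zeros(m)
for _ in range(200):
    errs = 0
    for h, t in zip(H, y):
        if t * (h @ w) <= 0:
            w += t * h
            errs += 1
    if errs == 0:
        break
print("linearly separable after random ReLU lift:", errs == 0)
```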
- On the Expected Complexity of Maxout Networks [0.0]
Recent works have shown that the practical complexity of deep ReLU networks is often far from the theoretical maximum.
In this work we show that this phenomenon also occurs in networks with maxout (multi-argument) activation functions.
We also show that the parameter space has a multitude of full-dimensional regions with widely different complexity, and obtain nontrivial lower bounds on the expected complexity.
arXiv Detail & Related papers (2021-07-01T11:36:32Z)
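For reference, a maxout unit outputs the maximum of $k$ affine functions of its input, $\max_{1 \le j \le k}(w_j \cdot x + b_j)$. A minimal sketch follows (illustrative only, not the paper's complexity-counting machinery).

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: maximum over k affine functions of the input.
    x: (d,), W: (k, d), b: (k,). Returns a scalar."""
    return np.max(W @ x + b)

# ReLU is the special case k = 2 with the second affine piece fixed at 0:
# max(w.x + b, 0).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # k = 3 affine pieces, 4-D input
b = rng.normal(size=3)
print(maxout(rng.normal(size=4), W, b))
```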
- What Kinds of Functions do Deep Neural Networks Learn? Insights from Variational Spline Theory [19.216784367141972]
We develop a variational framework to understand the properties of functions learned by deep neural networks with ReLU activation functions fit to data.
We derive a representer theorem showing that deep ReLU networks are solutions to regularized data fitting problems in this function space.
arXiv Detail & Related papers (2021-05-07T16:18:22Z)
- Neural network approaches to point lattice decoding [6.025026882312586]
A Voronoi-reduced basis is introduced to restrict the space of solutions to a binary set.
We count the number of affine pieces in the continuous piecewise-linear (CPWL) decoding function to characterize the complexity of the decoding problem.
arXiv Detail & Related papers (2020-12-13T10:53:34Z)
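To make the piece-counting idea concrete in the simplest case (a 1-D toy, not the paper's lattice setting): for $f(x) = \sum_i v_i\,\mathrm{ReLU}(w_i x + b_i)$, each neuron contributes a breakpoint at $x = -b_i/w_i$, so $f$ has at most one more affine piece than the number of neurons. The sketch below counts the pieces that actually appear.

```python
import numpy as np

# f(x) = sum_i v_i * relu(w_i * x + b_i) is CPWL in x; its slope at x is
# sum_i v_i * w_i * 1[w_i * x + b_i > 0], which can only change at the
# breakpoints x = -b_i / w_i. Adjacent pieces merge if the slope agrees.
rng = np.random.default_rng(0)
w, b, v = (rng.normal(size=8) for _ in range(3))

breaks = np.sort(-b / w)                       # candidate breakpoints
probes = np.concatenate([[breaks[0] - 1],      # one probe per interval
                         (breaks[:-1] + breaks[1:]) / 2,
                         [breaks[-1] + 1]])
slopes = [np.sum(v * w * ((w * x + b) > 0)) for x in probes]
pieces = 1 + np.sum(~np.isclose(slopes[:-1], slopes[1:]))
print(f"affine pieces: {pieces} (at most {len(w) + 1})")
```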
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
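As a reminder of the mechanism behind that last finding (a generic sketch, not LANN's complexity measure): adding $L_1$ and $L_2$ penalty terms to a loss shrinks the weights, which is how the growth of model complexity gets suppressed.

```python
import numpy as np

def regularized_loss(w, X, y, l1=0.0, l2=0.0):
    """Squared loss with optional L1 and L2 penalties on the weights.
    Larger l1/l2 shrink w, suppressing growth in model complexity."""
    residual = X @ w - y
    return (residual @ residual / len(y)
            + l1 * np.abs(w).sum()     # L1 (lasso-style) penalty
            + l2 * w @ w)              # L2 (ridge-style) penalty

# Toy comparison: the penalized objective charges extra for large weights.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
w = rng.normal(size=3)
print(regularized_loss(w, X, y))
print(regularized_loss(w, X, y, l1=0.1, l2=0.1))
```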
This list is automatically generated from the titles and abstracts of the papers on this site.