An Optimization and Generalization Analysis for Max-Pooling Networks
- URL: http://arxiv.org/abs/2002.09781v4
- Date: Thu, 4 Mar 2021 11:45:12 GMT
- Title: An Optimization and Generalization Analysis for Max-Pooling Networks
- Authors: Alon Brutzkus, Amir Globerson
- Abstract summary: Max-Pooling operations are a core component of deep learning architectures.
We perform a theoretical analysis of a convolutional max-pooling architecture.
We empirically validate that CNNs significantly outperform fully connected networks in our setting.
- Score: 34.58092926599547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Max-Pooling operations are a core component of deep learning architectures.
In particular, they are part of most convolutional architectures used in
machine vision, since pooling is a natural approach to pattern detection
problems. However, these architectures are not well understood from a
theoretical perspective. For example, we do not understand when they can be
globally optimized, and what is the effect of over-parameterization on
generalization. Here we perform a theoretical analysis of a convolutional
max-pooling architecture, proving that it can be globally optimized, and can
generalize well even for highly over-parameterized models. Our analysis focuses
on a data generating distribution inspired by pattern detection problems, where
a "discriminative" pattern needs to be detected among "spurious" patterns. We
empirically validate that CNNs significantly outperform fully connected
networks in our setting, as predicted by our theoretical results.
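To make the setting concrete, below is a minimal, illustrative sketch (in PyTorch) of a pattern-detection task and a max-pooling architecture in the spirit of the abstract. The data distribution, network widths, optimizer, and training budget are assumptions chosen for this example; they are not the paper's exact construction or hyperparameters.

```python
# Illustrative sketch only: a synthetic pattern-detection task in the spirit of
# the abstract's data-generating distribution, plus a one-layer max-pooling CNN
# and a fully connected baseline of comparable width.  The dimensions, filter
# counts, optimizer, and pattern-planting scheme are assumptions for this example.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n_pos, n_train = 10, 8, 2000        # pattern dimension, positions per input, sample size
v_star = torch.randn(d)                 # the "discriminative" pattern (hypothetical choice)

def make_data(n):
    """Inputs are n_pos random ("spurious") patterns; positives contain v_star somewhere."""
    x = torch.randn(n, n_pos, d)
    y = torch.randint(0, 2, (n,)).float()
    for i in range(n):
        if y[i] == 1:
            x[i, torch.randint(0, n_pos, (1,)).item()] = v_star
    return x, y

class MaxPoolCNN(nn.Module):
    """k convolutional filters shared across positions, then max-pooling over positions."""
    def __init__(self, k=128):
        super().__init__()
        self.filters = nn.Linear(d, k)   # applied to every position (weight sharing)
        self.readout = nn.Linear(k, 1)
    def forward(self, x):                # x: (batch, n_pos, d)
        h = torch.relu(self.filters(x))  # (batch, n_pos, k)
        h = h.max(dim=1).values          # max-pool over positions
        return self.readout(h).squeeze(-1)

class FullyConnected(nn.Module):
    """Baseline with no weight sharing or pooling, matched in hidden width."""
    def __init__(self, k=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_pos * d, k), nn.ReLU(), nn.Linear(k, 1))
    def forward(self, x):
        return self.net(x.flatten(1)).squeeze(-1)

def train_and_test(model, epochs=300, lr=0.05):
    x, y = make_data(n_train)
    x_test, y_test = make_data(n_train)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    with torch.no_grad():
        return ((model(x_test) > 0).float() == y_test).float().mean().item()

print("max-pooling CNN test accuracy:", train_and_test(MaxPoolCNN()))
print("fully connected test accuracy:", train_and_test(FullyConnected()))
```

Intuitively, max-pooling over positions lets a single shared filter detect the planted pattern wherever it appears, which matches the abstract's motivation for why a convolutional architecture suits this kind of detection problem.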
Related papers
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can be easily changed simply by training the networks with better, architecture-aware hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- Deep Equilibrium Assisted Block Sparse Coding of Inter-dependent Signals: Application to Hyperspectral Imaging [71.57324258813675]
A dataset of inter-dependent signals is defined as a matrix whose columns demonstrate strong dependencies.
A neural network is employed to act as structure prior and reveal the underlying signal interdependencies.
Deep unrolling and deep equilibrium based algorithms are developed, forming highly interpretable and concise deep-learning-based architectures.
arXiv Detail & Related papers (2022-03-29T21:00:39Z)
- Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z)
- Generalization by design: Shortcuts to Generalization in Deep Learning [7.751691910877239]
We show that good generalization may be instigated by bounded spectral products over layers, leading to a novel geometric regularizer (an illustrative sketch of this spectral-product idea appears after this list).
Backed up by theory, we further demonstrate that "generalization by design" is practically possible and that good generalization may be encoded into the structure of the network.
arXiv Detail & Related papers (2021-07-05T20:01:23Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the "complexity" of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but poorly on sub-partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Reframing Neural Networks: Deep Structure in Overcomplete Representations [41.84502123663809]
We introduce deep frame approximation, a unifying framework for representation learning with structured overcomplete frames.
We quantify structural differences with the deep frame potential, a data-independent measure of coherence linked to representation uniqueness and stability.
This connection to the established theory of overcomplete representations suggests promising new directions for principled deep network architecture design.
arXiv Detail & Related papers (2021-03-10T01:15:14Z)
- Disentangling Neural Architectures and Weights: A Case Study in Supervised Classification [8.976788958300766]
This work investigates the problem of disentangling the role of the neural structure and its edge weights.
We show that well-trained architectures may not need any link-specific fine-tuning of the weights.
We use a novel and computationally efficient method that translates the hard architecture-search problem into a feasible optimization problem.
arXiv Detail & Related papers (2020-09-11T11:22:22Z)
- DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths [45.947140164621096]
We propose a new approach based on differential inclusions of inverse scale spaces.
We show that DessiLBI unveils "winning tickets" in early epochs.
arXiv Detail & Related papers (2020-07-04T04:40:16Z)
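For the "Generalization by design" entry above, the following is a hedged sketch of the general spectral-product idea it alludes to: the Lipschitz constant of a feed-forward ReLU network is upper-bounded by the product of the spectral norms of its weight matrices, so penalizing that product is one concrete way to keep "spectral products over layers" bounded. The penalty below is an assumption for illustration, not the geometric regularizer defined in that paper.

```python
# Hedged sketch: penalize the product of layer spectral norms, a standard upper
# bound on a ReLU network's Lipschitz constant.  Illustrates the general idea
# only, not the specific regularizer from the "Generalization by design" paper.
import torch
import torch.nn as nn

def spectral_product(model: nn.Module) -> torch.Tensor:
    """Product of the largest singular values of all Linear layers (differentiable)."""
    prod = torch.ones(())
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prod = prod * torch.linalg.matrix_norm(module.weight, ord=2)
    return prod

net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
x, y = torch.randn(128, 20), torch.randn(128, 1)
loss = nn.functional.mse_loss(net(x), y) + 1e-3 * spectral_product(net)  # 1e-3 is an arbitrary weight
loss.backward()  # the penalty is differentiable, so it can be minimized alongside the task loss
```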