Geometry of Linear Convolutional Networks
- URL: http://arxiv.org/abs/2108.01538v1
- Date: Tue, 3 Aug 2021 14:42:18 GMT
- Title: Geometry of Linear Convolutional Networks
- Authors: Kathlén Kohn, Thomas Merkh, Guido Montúfar, Matthew Trager
- Abstract summary: We study the family of functions represented by a linear convolutional neural network (LCN).
We also study the optimization of an objective function over an LCN, analyzing critical points in function space and in parameter space.
Overall, our theory predicts that the optimized parameters of an LCN will often correspond to repeated filters across layers.
- Score: 7.990816079551592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the family of functions that are represented by a linear
convolutional neural network (LCN). These functions form a semi-algebraic
subset of the set of linear maps from input space to output space. In contrast,
the families of functions represented by fully-connected linear networks form
algebraic sets. We observe that the functions represented by LCNs can be
identified with polynomials that admit certain factorizations, and we use this
perspective to describe the impact of the network's architecture on the
geometry of the resulting function space. We further study the optimization of
an objective function over an LCN, analyzing critical points in function space
and in parameter space, and describing dynamical invariants for gradient
descent. Overall, our theory predicts that the optimized parameters of an LCN
will often correspond to repeated filters across layers, or filters that can be
decomposed as repeated filters. We also conduct numerical and symbolic
experiments that illustrate our results and present an in-depth analysis of the
landscape for small architectures.
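The identification of LCN filters with polynomials is easy to see in the simplest setting. The sketch below is illustrative rather than code from the paper: it assumes one-dimensional filters, stride one, full convolution, and no activation functions, and uses NumPy to check that composing convolutional layers convolves their filters, which is exactly multiplication of the polynomials whose coefficients are the filter entries. In this picture, repeated filters across layers correspond to repeated polynomial factors of the end-to-end filter.
```python
import numpy as np

rng = np.random.default_rng(0)

# Two layer filters of a toy LCN (illustrative widths 3 and 2).
w1 = rng.standard_normal(3)
w2 = rng.standard_normal(2)

# Composing the two convolutional layers on an input signal x ...
x = rng.standard_normal(8)
y_layerwise = np.convolve(w2, np.convolve(w1, x))

# ... is the same linear map as a single convolution with the combined filter,
# because convolution is associative.
w_end_to_end = np.convolve(w1, w2)
y_direct = np.convolve(w_end_to_end, x)
print(np.allclose(y_layerwise, y_direct))  # True

# Viewing each filter as the coefficient vector of a polynomial, the combined
# filter is the product of the layer polynomials.
print(np.allclose(w_end_to_end, np.polymul(w1, w2)))  # True

# A "repeated filter" configuration: two identical layers give an end-to-end
# filter whose polynomial is a perfect square.
w_square = np.convolve(w1, w1)
```
Under these simplifying assumptions, the architecture's function space consists of the filters whose polynomials factor into terms of the degrees fixed by the layer filter widths, matching the factorization viewpoint described in the abstract.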
Related papers
- On the Geometry and Optimization of Polynomial Convolutional Networks [2.9816332334719773]
We study convolutional neural networks with monomial activation functions.
We compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model.
For a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.
arXiv Detail & Related papers (2024-10-01T14:13:05Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Function Space and Critical Points of Linear Convolutional Networks [4.483341215742946]
We study the geometry of linear networks with one-dimensional convolutional layers.
We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points.
arXiv Detail & Related papers (2023-04-12T10:15:17Z) - Optimal Approximation Complexity of High-Dimensional Functions with Neural Networks [3.222802562733787]
We investigate properties of neural networks that use both ReLU and $x^2$ as activation functions.
We show how to leverage low local dimensionality in some contexts to overcome the curse of dimensionality, obtaining approximation rates that are optimal for unknown lower-dimensional subspaces.
arXiv Detail & Related papers (2023-01-30T17:29:19Z) - Functional dimension of feedforward ReLU neural networks [0.0]
We show that functional dimension is inhomogeneous across the parameter space of ReLU neural network functions.
We also study the quotient space and fibers of the realization map from parameter space to function space.
arXiv Detail & Related papers (2022-09-08T21:30:16Z) - Learnable Filters for Geometric Scattering Modules [64.03877398967282]
We propose a new graph neural network (GNN) module based on relaxations of recently proposed geometric scattering transforms.
Our learnable geometric scattering (LEGS) module enables adaptive tuning of the wavelets to encourage band-pass features to emerge in learned representations.
arXiv Detail & Related papers (2022-08-15T22:30:07Z) - Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Due to the complex non-linear characteristics of samples, the objective of these activation functions is to project samples from their original feature space into a linearly separable feature space.
This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z) - Convolutional Filtering and Neural Networks with Non Commutative Algebras [153.20329791008095]
We study the generalization of non-commutative convolutional neural networks.
We show that non-commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite-dimensional convex problem with infinitely many constraints (see the sketch below this list).
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
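The last entry above casts two-layer ReLU training as a convex problem. As a rough illustration of that style of reformulation (not necessarily the exact program of that paper), the sketch below uses the commonly cited hyperplane-arrangement formulation with a sampled subset of ReLU activation patterns; the data, the pattern count, and the penalty weight `beta` are placeholder choices.
```python
import numpy as np
import cvxpy as cp

# Illustrative regression data; sizes and beta are arbitrary placeholders.
rng = np.random.default_rng(0)
n, d, beta = 20, 3, 0.1
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Sample activation patterns D = diag(1[X u >= 0]); enumerating all patterns
# would give the full finite-dimensional convex program.
patterns = {tuple((X @ rng.standard_normal(d) >= 0).astype(int)) for _ in range(50)}
D = [np.diag(p) for p in patterns]
m = len(D)

# One pair of weight vectors per pattern (positive / negative output weights).
V = [cp.Variable(d) for _ in range(m)]
W = [cp.Variable(d) for _ in range(m)]

# Squared loss of the piecewise-linear model plus a group-sparsity penalty.
residual = sum(D[i] @ X @ (V[i] - W[i]) for i in range(m)) - y
objective = cp.Minimize(cp.sum_squares(residual)
                        + beta * sum(cp.norm(V[i]) + cp.norm(W[i]) for i in range(m)))

# Cone constraints forcing each block to respect its activation pattern.
constraints = []
for i in range(m):
    S = (2 * D[i] - np.eye(n)) @ X
    constraints += [S @ V[i] >= 0, S @ W[i] >= 0]

problem = cp.Problem(objective, constraints)
problem.solve()
print("optimal value:", problem.value)
```
With all activation patterns enumerated rather than sampled, this is close to the kind of finite-dimensional convex program the entry refers to.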
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.