Geometry of Linear Convolutional Networks
- URL: http://arxiv.org/abs/2108.01538v1
- Date: Tue, 3 Aug 2021 14:42:18 GMT
- Title: Geometry of Linear Convolutional Networks
- Authors: Kathlén Kohn, Thomas Merkh, Guido Montúfar, Matthew Trager
- Abstract summary: We study the family of functions represented by a linear convolutional neural network (LCN).
We also study the optimization of an objective function over an LCN, analyzing critical points in function space and in parameter space.
Overall, our theory predicts that the optimized parameters of an LCN will often correspond to repeated filters across layers.
- Score: 7.990816079551592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the family of functions that are represented by a linear
convolutional neural network (LCN). These functions form a semi-algebraic
subset of the set of linear maps from input space to output space. In contrast,
the families of functions represented by fully-connected linear networks form
algebraic sets. We observe that the functions represented by LCNs can be
identified with polynomials that admit certain factorizations, and we use this
perspective to describe the impact of the network's architecture on the
geometry of the resulting function space. We further study the optimization of
an objective function over an LCN, analyzing critical points in function space
and in parameter space, and describing dynamical invariants for gradient
descent. Overall, our theory predicts that the optimized parameters of an LCN
will often correspond to repeated filters across layers, or filters that can be
decomposed as repeated filters. We also conduct numerical and symbolic
experiments that illustrate our results and present an in-depth analysis of the
landscape for small architectures.
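The identification of LCN filters with polynomials is easy to see in the simplest setting. The sketch below is illustrative rather than code from the paper: it assumes one-dimensional filters, stride one, full convolution, and no activation functions, and uses NumPy to check that composing convolutional layers convolves their filters, which is exactly multiplication of the polynomials whose coefficients are the filter entries. In this picture, repeated filters across layers correspond to repeated polynomial factors of the end-to-end filter.
```python
import numpy as np

rng = np.random.default_rng(0)

# Two layer filters of a toy LCN (illustrative widths 3 and 2).
w1 = rng.standard_normal(3)
w2 = rng.standard_normal(2)

# Composing the two convolutional layers on an input signal x ...
x = rng.standard_normal(8)
y_layerwise = np.convolve(w2, np.convolve(w1, x))

# ... is the same linear map as a single convolution with the combined filter,
# because convolution is associative.
w_end_to_end = np.convolve(w1, w2)
y_direct = np.convolve(w_end_to_end, x)
print(np.allclose(y_layerwise, y_direct))  # True

# Viewing each filter as the coefficient vector of a polynomial, the combined
# filter is the product of the layer polynomials.
print(np.allclose(w_end_to_end, np.polymul(w1, w2)))  # True

# A "repeated filter" configuration: two identical layers give an end-to-end
# filter whose polynomial is a perfect square.
w_square = np.convolve(w1, w1)
```
Under these simplifying assumptions, the architecture's function space consists of the filters whose polynomials factor into terms of the degrees fixed by the layer filter widths, matching the factorization viewpoint described in the abstract.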
Related papers
- On the Geometry and Optimization of Polynomial Convolutional Networks [2.9816332334719773]
We study convolutional neural networks with monomial activation functions.
We compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model.
For a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.
arXiv Detail & Related papers (2024-10-01T14:13:05Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Function Space and Critical Points of Linear Convolutional Networks [4.483341215742946]
We study the geometry of linear networks with one-dimensional convolutional layers.
We analyze the impact of the network's architecture on the function space's dimension, boundary, and singular points.
arXiv Detail & Related papers (2023-04-12T10:15:17Z) - Optimal Approximation Complexity of High-Dimensional Functions with Neural Networks [3.222802562733787]
We investigate properties of neural networks that use both ReLU and $x^2$ as activation functions.
We show how to leverage low local dimensionality in some contexts to overcome the curse of dimensionality, obtaining approximation rates that are optimal for unknown lower-dimensional subspaces.
arXiv Detail & Related papers (2023-01-30T17:29:19Z) - Functional dimension of feedforward ReLU neural networks [0.0]
We show that functional dimension is inhomogeneous across the parameter space of ReLU neural network functions.
We also study the quotient space and fibers of the realization map from parameter space to function space.
arXiv Detail & Related papers (2022-09-08T21:30:16Z) - Learnable Filters for Geometric Scattering Modules [64.03877398967282]
We propose a new graph neural network (GNN) module based on relaxations of recently proposed geometric scattering transforms.
Our learnable geometric scattering (LEGS) module enables adaptive tuning of the wavelets to encourage band-pass features to emerge in learned representations.
arXiv Detail & Related papers (2022-08-15T22:30:07Z) - Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Due to the complex non-linear characteristics of samples, the objective of these activation functions is to project samples from their original feature space into a linearly separable feature space.
This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z) - Convolutional Filtering and Neural Networks with Non Commutative Algebras [153.20329791008095]
We study the generalization of non-commutative convolutional neural networks.
We show that non-commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite-dimensional convex problem with infinitely many constraints (see the sketch below this list).
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
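The last entry above casts two-layer ReLU training as a convex problem. As a rough illustration of that style of reformulation (not necessarily the exact program of that paper), the sketch below uses the commonly cited hyperplane-arrangement formulation with a sampled subset of ReLU activation patterns; the data, the pattern count, and the penalty weight `beta` are placeholder choices.
```python
import numpy as np
import cvxpy as cp

# Illustrative regression data; sizes and beta are arbitrary placeholders.
rng = np.random.default_rng(0)
n, d, beta = 20, 3, 0.1
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Sample activation patterns D = diag(1[X u >= 0]); enumerating all patterns
# would give the full finite-dimensional convex program.
patterns = {tuple((X @ rng.standard_normal(d) >= 0).astype(int)) for _ in range(50)}
D = [np.diag(p) for p in patterns]
m = len(D)

# One pair of weight vectors per pattern (positive / negative output weights).
V = [cp.Variable(d) for _ in range(m)]
W = [cp.Variable(d) for _ in range(m)]

# Squared loss of the piecewise-linear model plus a group-sparsity penalty.
residual = sum(D[i] @ X @ (V[i] - W[i]) for i in range(m)) - y
objective = cp.Minimize(cp.sum_squares(residual)
                        + beta * sum(cp.norm(V[i]) + cp.norm(W[i]) for i in range(m)))

# Cone constraints forcing each block to respect its activation pattern.
constraints = []
for i in range(m):
    S = (2 * D[i] - np.eye(n)) @ X
    constraints += [S @ V[i] >= 0, S @ W[i] >= 0]

problem = cp.Problem(objective, constraints)
problem.solve()
print("optimal value:", problem.value)
```
With all activation patterns enumerated rather than sampled, this is close to the kind of finite-dimensional convex program the entry refers to.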
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.