Inductive Bias of Multi-Channel Linear Convolutional Networks with
Bounded Weight Norm
- URL: http://arxiv.org/abs/2102.12238v1
- Date: Wed, 24 Feb 2021 12:01:23 GMT
- Title: Inductive Bias of Multi-Channel Linear Convolutional Networks with
Bounded Weight Norm
- Authors: Meena Jagadeesan, Ilya Razenshteyn, Suriya Gunasekar
- Abstract summary: We study the function space characterization of the inductive bias resulting from controlling the $\ell_2$ norm of the weights in linear convolutional networks.
For sufficiently large $C$, the induced regularizers for $K=1$ and $K=D$ are the nuclear norm and the $\ell_{2,1}$ group-sparse norm, respectively.
- Score: 15.08164172607321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the function space characterization of the inductive bias resulting
from controlling the $\ell_2$ norm of the weights in linear convolutional
networks. We view this in terms of an induced regularizer in the function space
given by the minimum norm of weights required to realize a linear function. For
two layer linear convolutional networks with $C$ output channels and kernel
size $K$, we show the following: (a) If the inputs to the network have a single
channel, the induced regularizer for any $K$ is a norm given by a semidefinite
program (SDP) that is independent of the number of output channels $C$. We
further validate these results through a binary classification task on MNIST.
(b) In contrast, for networks with multi-channel inputs, multiple output
channels can be necessary to merely realize all matrix-valued linear functions
and thus the inductive bias does depend on $C$. Further, for sufficiently large
$C$, the induced regularizers for $K=1$ and $K=D$ are the nuclear norm and the
$\ell_{2,1}$ group-sparse norm, respectively, of the Fourier coefficients --
both of which promote sparse structures.
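As a concrete illustration of result (b), the following minimal Python/NumPy sketch (not taken from the paper) evaluates the two induced penalties on a matrix-valued linear predictor: the nuclear norm of its Fourier coefficients for $K=1$ and the $\ell_{2,1}$ group-sparse norm for $K=D$. The coefficient layout (a real matrix beta of shape input channels x spatial dimension) and the omission of the paper's normalization constants are assumptions made for illustration only.

    import numpy as np

    def induced_regularizers(beta):
        # beta: real (R, D) matrix representing a linear predictor on R-channel,
        # D-dimensional inputs (layout and scaling are assumptions here).
        beta_hat = np.fft.fft(beta, axis=1)  # Fourier coefficients along the spatial axis
        # K = 1: nuclear norm (sum of singular values) of the Fourier coefficients.
        nuclear = np.linalg.norm(beta_hat, ord="nuc")
        # K = D: ell_{2,1} group-sparse norm -- sum over frequencies of the
        # per-frequency ell_2 norm across input channels.
        group_sparse = np.sum(np.linalg.norm(beta_hat, axis=0))
        return nuclear, group_sparse

    # Example: a rank-one predictor supported on a single frequency.
    rng = np.random.default_rng(0)
    beta = np.outer(rng.standard_normal(3), np.cos(2 * np.pi * np.arange(8) / 8))
    print(induced_regularizers(beta))

Both penalties stay small for predictors whose Fourier coefficients are low rank or supported on few frequencies, which is the sparsity-promoting behavior the abstract describes.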
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Deterministic identification over channels with finite output: a
dimensional perspective on superlinear rates [53.66705737169404]
We consider the problem in its generality for memoryless channels with finite output, but arbitrary input alphabets.
Our main findings are that the maximum number of messages thus identifiable scales super-exponentially as $2^{R\,n\log n}$ with the block length $n$.
Results are shown to generalise directly to classical-quantum channels with finite-dimensional output quantum system.
arXiv Detail & Related papers (2024-02-14T11:59:30Z) - Intrinsic dimensionality and generalization properties of the
$\mathcal{R}$-norm inductive bias [4.37441734515066]
The $\mathcal{R}$-norm is the basis of an inductive bias for two-layer neural networks.
We find that these interpolants are intrinsically multivariate functions, even when there are ridge functions that fit the data.
arXiv Detail & Related papers (2022-06-10T18:33:15Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Deformed semicircle law and concentration of nonlinear random matrices
for ultra-wide neural networks [29.03095282348978]
We study the limiting spectral distributions of two empirical kernel matrices associated with $f(X)$.
We show that random feature regression induced by the empirical kernel achieves the same performance as its limiting kernel regression under the ultra-wide regime.
arXiv Detail & Related papers (2021-09-20T05:25:52Z) - Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - A Unifying View on Implicit Bias in Training Linear Neural Networks [31.65006970108761]
We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training.
We propose a tensor formulation of neural networks that includes fully-connected, diagonal, and convolutional networks as special cases.
arXiv Detail & Related papers (2020-10-06T06:08:35Z) - Neural Networks are Convex Regularizers: Exact Polynomial-time Convex
Optimization Formulations for Two-layer Networks [70.15611146583068]
We develop exact representations of training two-layer neural networks with rectified linear units (ReLUs).
Our theory utilizes semi-infinite duality and minimum norm regularization.
arXiv Detail & Related papers (2020-02-24T21:32:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.