Theoretical Analysis of Inductive Biases in Deep Convolutional Networks
- URL: http://arxiv.org/abs/2305.08404v2
- Date: Sat, 20 Jan 2024 15:50:57 GMT
- Title: Theoretical Analysis of Inductive Biases in Deep Convolutional Networks
- Authors: Zihao Wang, Lei Wu
- Abstract summary: We provide a theoretical analysis of the inductive biases in convolutional neural networks (CNNs)
We compare the performance of CNNs, locally-connected networks (LCNs), and fully-connected networks (FCNs) on a simple regression task.
We prove that LCNs require $\Omega(d)$ samples while CNNs need only $\widetilde{\mathcal{O}}(\log^2 d)$ samples, highlighting the critical role of weight sharing.
- Score: 16.41952363194339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we provide a theoretical analysis of the inductive biases in
convolutional neural networks (CNNs). We start by examining the universality of
CNNs, i.e., the ability to approximate any continuous functions. We prove that
a depth of $\mathcal{O}(\log d)$ suffices for deep CNNs to achieve this
universality, where $d$ is the input dimension. Additionally, we establish that
learning sparse functions with CNNs requires only
$\widetilde{\mathcal{O}}(\log^2d)$ samples, indicating that deep CNNs can
efficiently capture {\em long-range} sparse correlations. These results are
made possible through a novel combination of multichanneling and
downsampling as the network depth increases. We also delve into the distinct
roles of weight sharing and locality in CNNs. To this end, we compare the
performance of CNNs, locally-connected networks (LCNs), and fully-connected
networks (FCNs) on a simple regression task, where LCNs can be viewed as CNNs
without weight sharing. On the one hand, we prove that LCNs require
${\Omega}(d)$ samples while CNNs need only $\widetilde{\mathcal{O}}(\log^2d)$
samples, highlighting the critical role of weight sharing. On the other hand,
we prove that FCNs require $\Omega(d^2)$ samples, whereas LCNs need only
$\widetilde{\mathcal{O}}(d)$ samples, underscoring the importance of locality.
These provable separations quantify the difference between the two biases, and
the major observation behind our proof is that weight sharing and locality
break different symmetries in the learning process.
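
To make the parameter-counting intuition behind these separations concrete, here is a minimal illustrative sketch (not from the paper); the input dimension `d` and filter size `k` below are hypothetical, and the per-layer counts only heuristically mirror the sample-complexity gaps proved in the abstract.

```python
import math

# Illustrative sketch (not from the paper): per-layer parameter counts of a
# 1D CNN, LCN, and FCN acting on a length-d input. d and k are hypothetical.
d = 1024   # input dimension
k = 3      # size of the local receptive field

# CNN: one size-k filter shared across all d positions (locality + weight sharing).
cnn_params = k
# LCN: local connectivity like a CNN, but each position has its own size-k filter
# (locality without weight sharing).
lcn_params = d * k
# FCN: every output unit is connected to all d inputs (neither bias).
fcn_params = d * d

print(f"CNN parameters per layer: {cnn_params}")   # independent of d
print(f"LCN parameters per layer: {lcn_params}")   # grows like d
print(f"FCN parameters per layer: {fcn_params}")   # grows like d^2

# With stride-2 downsampling at every layer, roughly log2(d) layers suffice for
# the receptive field to cover the whole input, matching the O(log d) depth in
# the universality result.
print(f"Depth with stride-2 downsampling: {math.ceil(math.log2(d))}")
```

The per-layer counts (constant vs. $d$ vs. $d^2$) line up with the provable sample-complexity separations ($\widetilde{\mathcal{O}}(\log^2 d)$ vs. $\widetilde{\mathcal{O}}(d)$ vs. $\Omega(d^2)$) only as a heuristic; the actual proofs rest on the symmetry-breaking argument above.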
Related papers
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z) - CNN2GNN: How to Bridge CNN with GNN [59.42117676779735]
We propose a novel CNN2GNN framework to unify CNN and GNN together via distillation.
The performance of the distilled "boosted" two-layer GNN on Mini-ImageNet is much higher than that of CNNs with dozens of layers, such as ResNet152.
arXiv Detail & Related papers (2024-04-23T08:19:08Z) - On the rates of convergence for learning with convolutional neural networks [9.772773527230134]
We study approximation and learning capacities of convolutional neural networks (CNNs) with one-side zero-padding and multiple channels.
We derive convergence rates for estimators based on CNNs in many learning problems.
It is also shown that the obtained rates for classification are minimax optimal in some common settings.
arXiv Detail & Related papers (2024-03-25T06:42:02Z) - Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs [42.551773746803946]
Vision tasks are characterized by the properties of locality and translation invariance.
The superior performance of convolutional neural networks (CNNs) on these tasks is widely attributed to the inductive bias of locality and weight sharing baked into their architecture.
Existing attempts to quantify the statistical benefits of these biases in CNNs over locally connected neural networks (LCNs) and fully connected neural networks (FCNs) fall into one of the following categories.
arXiv Detail & Related papers (2024-03-23T03:57:28Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Distributed Sparse Feature Selection in Communication-Restricted
Networks [6.9257380648471765]
We propose and theoretically analyze a new distributed scheme for sparse linear regression and feature selection.
In order to infer the causal dimensions from the whole dataset, we propose a simple, yet effective method for information sharing in the network.
arXiv Detail & Related papers (2021-11-02T05:02:24Z) - BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by
Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z) - Approximating smooth functions by deep neural networks with sigmoid
activation function [0.0]
We study the power of deep neural networks (DNNs) with sigmoid activation function.
We show that DNNs with fixed depth and a width of order $M^d$ achieve an approximation rate of $M^{-2p}$.
arXiv Detail & Related papers (2020-10-08T07:29:31Z) - Approximation and Non-parametric Estimation of ResNet-type Convolutional
Neural Networks [52.972605601174955]
We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)