Benefits of Overparameterized Convolutional Residual Networks: Function
Approximation under Smoothness Constraint
- URL: http://arxiv.org/abs/2206.04569v1
- Date: Thu, 9 Jun 2022 15:35:22 GMT
- Title: Benefits of Overparameterized Convolutional Residual Networks: Function
Approximation under Smoothness Constraint
- Authors: Hao Liu, Minshuo Chen, Siawpeng Er, Wenjing Liao, Tong Zhang, Tuo Zhao
- Abstract summary: We prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness.
Our theory partially justifies the benefits of using deep and wide networks in practice.
- Score: 48.25573695787407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparameterized neural networks enjoy great representation power on complex
data, and more importantly yield sufficiently smooth output, which is crucial
to their generalization and robustness. Most existing function approximation
theories suggest that with sufficiently many parameters, neural networks can
well approximate certain classes of functions in terms of the function value.
The neural networks themselves, however, can be highly nonsmooth. To bridge this
gap, we take convolutional residual networks (ConvResNets) as an example, and
prove that large ConvResNets can not only approximate a target function in
terms of function value, but also exhibit sufficient first-order smoothness.
Moreover, we extend our theory to approximating functions supported on a
low-dimensional manifold. Our theory partially justifies the benefits of using
deep and wide networks in practice. Numerical experiments on adversarial robust
image classification are provided to support our theory.
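As a concrete illustration of the architecture class discussed above, here is a minimal sketch of a convolutional residual block and a small ConvResNet assembled from such blocks. The layer sizes, names, and classification head are illustrative assumptions, not the construction used in the paper's proofs.
```python
# Minimal illustrative ConvResNet sketch (PyTorch); sizes are arbitrary choices
# for illustration, not the construction analyzed in the paper.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with ReLU, plus an identity skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Residual connection: output = x + F(x), which keeps each block close
        # to the identity map.
        return x + self.conv2(self.relu(self.conv1(x)))


class ConvResNet(nn.Module):
    """A small convolutional residual network for image classification."""

    def __init__(self, in_channels=3, width=64, depth=8, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, width, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(depth)])
        self.head = nn.Linear(width, num_classes)

    def forward(self, x):
        h = self.blocks(self.stem(x))
        h = h.mean(dim=(2, 3))  # global average pooling
        return self.head(h)


# "Overparameterized" here simply means choosing a large width and depth.
model = ConvResNet(width=256, depth=32)
out = model(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 10])
```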
Related papers
- Nonparametric Classification on Low Dimensional Manifolds using
Overparameterized Convolutional Residual Networks [82.03459331544737]
We study the performance of ConvResNeXts trained with weight decay from the perspective of nonparametric classification.
Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks.
arXiv Detail & Related papers (2023-07-04T11:08:03Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - On the Approximation and Complexity of Deep Neural Networks to Invariant
Functions [0.0]
We study the approximation and complexity of deep neural networks to invariant functions.
We show that a broad range of invariant functions can be approximated by various types of neural network models.
We provide a feasible application that connects the parameter estimation and forecasting of high-resolution signals with our theoretical conclusions.
arXiv Detail & Related papers (2022-10-27T09:19:19Z) - A Theoretical View on Sparsely Activated Networks [21.156069843782017]
We present a formal model of data-dependent sparse networks that captures salient aspects of popular architectures.
We then introduce a routing function based on locality sensitive hashing (LSH) that enables us to reason about how well sparse networks approximate target functions (an illustrative routing sketch appears after this list).
We prove that sparse networks can match the approximation power of dense networks on Lipschitz functions.
arXiv Detail & Related papers (2022-08-08T23:14:48Z) - Optimal Learning Rates of Deep Convolutional Neural Networks: Additive
Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal minimax rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z) - Optimal Approximation with Sparse Neural Networks and Applications [0.0]
We use deep sparsely connected neural networks to measure the complexity of a function class in $L(\mathbb{R}^d)$.
We also introduce a representation system, a countable collection of functions used to guide neural networks.
We then analyse the complexity of a class called $\beta$ cartoon-like functions using rate-distortion theory and the wedgelet construction.
arXiv Detail & Related papers (2021-08-14T05:14:13Z) - The Connection Between Approximation, Depth Separation and Learnability
in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, which reflect the magnitude of connections, the learning process can be performed in a differentiable manner (a second illustrative sketch appears after this list).
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - Approximation smooth and sparse functions by deep neural networks
without saturation [0.6396288020763143]
In this paper, we aim at constructing deep neural networks with three hidden layers to approximate smooth and sparse functions.
We prove that the constructed deep nets can reach the optimal approximation rate in approximating both smooth and sparse functions with controllable magnitude of free parameters.
arXiv Detail & Related papers (2020-01-13T09:28:50Z)
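Referenced from the sparsely activated networks entry above: a minimal, hypothetical sketch of random-hyperplane LSH routing, in which the hash bucket of an input decides which expert sub-network is activated. The module name, sizes, and expert structure are illustrative assumptions, not the construction analyzed in that paper.
```python
# Hypothetical sketch of LSH-based routing for a data-dependent sparse network
# (random-hyperplane hashing chooses which expert processes each input).
import torch
import torch.nn as nn


class LSHRoutedNetwork(nn.Module):
    def __init__(self, dim=16, num_bits=3, hidden=64):
        super().__init__()
        num_experts = 2 ** num_bits
        # Fixed random hyperplanes define the hash; inputs on the same side of
        # every hyperplane land in the same bucket and reuse the same expert.
        self.register_buffer("hyperplanes", torch.randn(num_bits, dim))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(num_experts)
        )

    def bucket(self, x):
        # Sign pattern of <x, hyperplane_i>, encoded as an integer bucket id.
        bits = (x @ self.hyperplanes.T > 0).long()           # (batch, num_bits)
        powers = 2 ** torch.arange(bits.shape[1], device=x.device)
        return (bits * powers).sum(dim=1)                     # (batch,)

    def forward(self, x):
        ids = self.bucket(x)
        out = torch.empty(x.shape[0], 1, device=x.device)
        for b in ids.unique():
            mask = ids == b
            out[mask] = self.experts[int(b)](x[mask])  # one expert runs per input
        return out


net = LSHRoutedNetwork()
y = net(torch.randn(8, 16))
print(y.shape)  # torch.Size([8, 1])
```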
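Referenced from the connectivity-learning entry above: a toy, hypothetical sketch in which every node of a small complete DAG aggregates all earlier nodes through learnable, sigmoid-gated edge weights, so the connectivity pattern itself is optimized by gradient descent. The gating and the per-node operation are illustrative assumptions rather than that paper's exact design.
```python
# Toy, hypothetical sketch of learnable connectivity: each node aggregates the
# outputs of all previous nodes with trainable edge weights (a complete DAG).
import torch
import torch.nn as nn


class LearnableConnectivityNet(nn.Module):
    def __init__(self, dim=32, num_nodes=4):
        super().__init__()
        self.ops = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_nodes))
        # edge_weights[j, i] is the strength of the connection from node i to node j.
        self.edge_weights = nn.Parameter(torch.ones(num_nodes, num_nodes))

    def forward(self, x):
        outputs = [x]  # node 0 is the input
        for j, op in enumerate(self.ops):
            # Weighted sum over all earlier nodes; sigmoid keeps gates in (0, 1),
            # so a near-zero gate effectively prunes an edge.
            gates = torch.sigmoid(self.edge_weights[j, : len(outputs)])
            agg = sum(g * h for g, h in zip(gates, outputs))
            outputs.append(torch.relu(op(agg)))
        return outputs[-1]


net = LearnableConnectivityNet()
print(net(torch.randn(5, 32)).shape)  # torch.Size([5, 32])
```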