Is deeper better? It depends on locality of relevant features
- URL: http://arxiv.org/abs/2005.12488v2
- Date: Wed, 27 Jan 2021 12:22:50 GMT
- Title: Is deeper better? It depends on locality of relevant features
- Authors: Takashi Mori, Masahito Ueda
- Abstract summary: We investigate the effect of increasing the depth within an overparameterized regime.
Experiments show that deeper is better for local labels, whereas shallower is better for global labels.
It is shown that the neural tangent kernel does not correctly capture the depth dependence of the generalization performance.
- Score: 5.33024001730262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been recognized that a heavily overparameterized artificial neural
network exhibits surprisingly good generalization performance in various
machine-learning tasks. Recent theoretical studies have made attempts to unveil
the mystery of overparameterization. In most of those previous works, the
overparameterization is achieved by increasing the width of the network, while
the effect of increasing the depth has remained less well understood. In this
work, we investigate the effect of increasing the depth within an
overparameterized regime. To gain insight into the advantage of depth, we
introduce local and global labels as abstract but simple classification rules.
It turns out that the locality of the relevant feature for a given
classification rule plays a key role; our experimental results suggest that
deeper is better for local labels, whereas shallower is better for global
labels. We also compare the results of finite networks with those of the neural
tangent kernel (NTK), which is equivalent to an infinitely wide network with a
proper initialization and an infinitesimal learning rate. It is shown that the
NTK does not correctly capture the depth dependence of the generalization
performance, which indicates the importance of feature learning rather than
lazy learning.
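To make the setup concrete, here is a minimal Python sketch of the two ingredients the abstract mentions. The specific label rules (a parity over a small window of coordinates versus a parity over all coordinates) and the network assumptions below are illustrative choices, not the paper's exact definitions.

```python
# Illustrative sketch only: "local" labels depend on a small window of input
# coordinates, while "global" labels depend on every coordinate. These
# particular rules (windowed parity vs. full parity) are assumptions for
# demonstration, not the paper's exact construction.
import numpy as np

rng = np.random.default_rng(0)

def sample_inputs(n_samples: int, dim: int) -> np.ndarray:
    """Random +/-1 inputs, one row per sample."""
    return rng.choice([-1.0, 1.0], size=(n_samples, dim))

def local_label(x: np.ndarray, window: int = 3) -> np.ndarray:
    """Label determined only by the first `window` coordinates (local rule)."""
    return np.prod(x[:, :window], axis=1)

def global_label(x: np.ndarray) -> np.ndarray:
    """Label determined by every coordinate (global rule)."""
    return np.prod(x, axis=1)

X = sample_inputs(1000, dim=20)
y_local, y_global = local_label(X), global_label(X)
```

Under this kind of setup, one trains fully connected networks of varying depth on each label type and compares test accuracy. The NTK baseline the abstract refers to can be sketched with the standard depth recursion for ReLU networks (Jacot et al., 2018); He-scaled weights and the absence of bias terms are simplifying assumptions here.

```python
# Minimal sketch of the depth-dependent NTK of a fully connected ReLU network,
# i.e. the infinite-width limit the abstract refers to. He scaling (weight
# variance 2) and no bias terms are simplifying assumptions.
import numpy as np

def relu_ntk(X1: np.ndarray, X2: np.ndarray, depth: int) -> np.ndarray:
    """NTK matrix between rows of X1 and X2 for a ReLU MLP with `depth` hidden layers."""
    sig = X1 @ X2.T                                   # Sigma^(0): input Gram matrix
    sig11 = np.sum(X1 * X1, axis=1)[:, None]          # squared norms of X1 rows
    sig22 = np.sum(X2 * X2, axis=1)[None, :]          # squared norms of X2 rows
    ntk = sig.copy()                                  # Theta^(0) = Sigma^(0)
    for _ in range(depth):
        norms = np.sqrt(sig11 * sig22)
        cos = np.clip(sig / norms, -1.0, 1.0)
        theta = np.arccos(cos)
        # Arc-cosine expectations for ReLU (scaled by the weight variance 2):
        sig_dot = (np.pi - theta) / np.pi                               # 2 E[relu'(u) relu'(v)]
        sig = norms * (np.sin(theta) + (np.pi - theta) * cos) / np.pi   # 2 E[relu(u) relu(v)]
        # The diagonal entries are preserved under He scaling, so sig11/sig22 stay fixed.
        ntk = ntk * sig_dot + sig                     # Theta^(l+1) = Theta^(l) * Sigma_dot + Sigma
    return ntk

# Kernel regression with relu_ntk(...) at different depths can then be compared
# against finite-width networks trained by gradient descent.
```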
Related papers
- Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning [77.82908213345864]
We find empirical evidence that learning rate transfer can be attributed to the fact that under $\mu$P and its depth extension, the largest eigenvalue of the training loss Hessian is largely independent of the width and depth of the network.
We show that under the neural tangent kernel (NTK) regime, the sharpness exhibits very different dynamics at different scales, thus preventing learning rate transfer.
arXiv Detail & Related papers (2024-02-27T12:28:01Z)
- SAR Despeckling Using Overcomplete Convolutional Networks [53.99620005035804]
Despeckling is an important problem in remote sensing, as speckle degrades SAR images.
Recent studies show that convolutional neural networks (CNNs) outperform classical despeckling methods.
This study employs an overcomplete CNN architecture to focus on learning low-level features by restricting the receptive field.
We show that the proposed network improves despeckling performance compared to recent despeckling methods on synthetic and real SAR images.
arXiv Detail & Related papers (2022-05-31T15:55:37Z)
- Wide and Deep Neural Networks Achieve Optimality for Classification [23.738242876364865]
We identify and construct an explicit set of neural network classifiers that achieve optimality.
In particular, we provide explicit activation functions that can be used to construct networks that achieve optimality.
Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
arXiv Detail & Related papers (2022-04-29T14:27:42Z)
- Interplay between depth of neural networks and locality of target functions [5.33024001730262]
We report a remarkable interplay between depth and locality of a target function.
We find that depth is beneficial for learning local functions but detrimental to learning global functions.
arXiv Detail & Related papers (2022-01-28T12:41:24Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges, which reflect the magnitude of the connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Depth Selection for Deep ReLU Nets in Feature Extraction and Generalization [22.696129751033983]
We show that implementing the classical empirical risk minimization on deep nets can achieve the optimal generalization performance for numerous learning tasks.
Our results are verified by a series of numerical experiments including toy simulations and a real application of earthquake seismic intensity prediction.
arXiv Detail & Related papers (2020-04-01T06:03:01Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)