A Note on the Implicit Bias Towards Minimal Depth of Deep Neural
Networks
- URL: http://arxiv.org/abs/2202.09028v1
- Date: Fri, 18 Feb 2022 05:21:28 GMT
- Title: A Note on the Implicit Bias Towards Minimal Depth of Deep Neural
Networks
- Authors: Tomer Galanti
- Abstract summary: A central aspect that enables the success of these systems is the ability to train deep models instead of wide shallow ones.
While training deep neural networks consistently achieves superior performance over their shallow counterparts, an understanding of the role of depth in representation learning is still lacking.
- Score: 11.739219085726006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning systems have steadily advanced the state of the art in a wide
variety of benchmarks, demonstrating impressive performance in tasks ranging
from image classification \citep{taigman2014deepface,zhai2021scaling} and
language processing \citep{devlin-etal-2019-bert,NEURIPS2020_1457c0d6} to
open-ended environments \citep{SilverHuangEtAl16nature,arulkumaran2019alphastar}
and coding \citep{chen2021evaluating}.
A central aspect that enables the success of these systems is the ability to
train deep models instead of wide shallow ones \citep{7780459}. Intuitively, a
deep neural network decomposes its computation into hierarchical
representations, mapping raw data to high-level, more abstract features. While
training deep neural networks consistently achieves superior performance over
their shallow counterparts, an understanding of the role of depth in
representation learning is still lacking.
In this work, we suggest a new perspective on understanding the role of depth
in deep learning. We hypothesize that {\bf\em SGD training of overparameterized
neural networks exhibits an implicit bias that favors solutions of minimal
effective depth}. Namely, SGD trains neural networks for which the top several
layers are redundant. To evaluate the redundancy of layers, we revisit the
recently discovered phenomenon of neural collapse
\citep{Papyan24652,han2021neural}.
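As an illustrative aside (a minimal sketch under assumed names, not the paper's code): one way to probe this hypothesis empirically is to compute a neural-collapse-style statistic, the ratio of within-class to between-class feature variance, after each hidden layer; if the ratio is already near zero at some layer, the layers above it are candidates for redundancy. The iterable of layers passed below is a hypothetical API.

import torch

def within_class_collapse(features, labels):
    # Ratio of within-class to between-class feature variance (an NC1-style
    # statistic); values near zero indicate collapsed, class-separable features.
    global_mean = features.mean(dim=0)
    within, between = 0.0, 0.0
    for c in labels.unique():
        class_feats = features[labels == c]
        class_mean = class_feats.mean(dim=0)
        within += ((class_feats - class_mean) ** 2).sum().item()
        between += class_feats.shape[0] * ((class_mean - global_mean) ** 2).sum().item()
    return within / max(between, 1e-12)

@torch.no_grad()
def layerwise_collapse(layers, x, labels):
    # Feed a batch through the network layer by layer and record the collapse
    # statistic after each layer.
    ratios, h = [], x
    for layer in layers:  # `layers`: any iterable of torch modules (hypothetical)
        h = layer(h)
        ratios.append(within_class_collapse(h.flatten(1), labels))
    return ratios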
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Exact Solutions of a Deep Linear Network [2.2344764434954256]
This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons.
We show that weight decay strongly interacts with the model architecture and can create bad minima at zero in a network with more than one hidden layer.
arXiv Detail & Related papers (2022-02-10T00:13:34Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and are transferable to new tasks in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information and differ from each other only by statistically independent noise (see the correlation sketch after this list).
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - A Layer-Wise Information Reinforcement Approach to Improve Learning in
Deep Belief Networks [0.4893345190925178]
This paper proposes the Residual Deep Belief Network, which reinforces information layer by layer to improve feature extraction and knowledge retention.
Experiments conducted on three public datasets demonstrate its robustness on the task of binary image classification.
arXiv Detail & Related papers (2021-01-17T18:53:18Z) - Theoretical Analysis of the Advantage of Deepening Neural Networks [0.0]
It is important to know the expressivity of functions computable by deep neural networks.
Using two expressivity criteria, we show that increasing the number of layers is more effective than increasing the number of units per layer for improving the expressivity of deep neural networks.
arXiv Detail & Related papers (2020-09-24T04:10:50Z) - Locality Guided Neural Networks for Explainable Artificial Intelligence [12.435539489388708]
We propose a novel algorithm for backpropagation, called Locality Guided Neural Network (LGNN).
LGNN preserves locality between neighbouring neurons within each layer of a deep network.
In our experiments, we train various VGG and Wide ResNet (WRN) networks for image classification on CIFAR100.
arXiv Detail & Related papers (2020-07-12T23:45:51Z) - Towards Understanding Hierarchical Learning: Benefits of Neural
Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that learned neural representations can achieve improved sample complexity compared with using the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z) - An Overview of Neural Network Compression [2.550900579709111]
In recent years there has been a resurgence in model compression techniques, particularly for deep convolutional neural networks and self-attention based networks such as the Transformer.
This paper provides a timely overview of both old and current compression techniques for deep neural networks, including pruning, quantization, tensor decomposition, knowledge distillation and combinations thereof.
arXiv Detail & Related papers (2020-06-05T20:28:56Z)
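As a hedged illustration of the "Redundant representations" entry above (hypothetical code, not the authors'): the sketch below groups last-hidden-layer neurons whose activations are almost perfectly correlated across a batch of inputs; large groups would correspond to the redundant, noise-differing neuron groups the paper describes.

import numpy as np

def redundant_neuron_groups(acts, threshold=0.95):
    # acts: (n_samples, n_neurons) array of last-hidden-layer activations.
    # Greedily groups neurons whose pairwise correlation exceeds `threshold`.
    corr = np.corrcoef(acts, rowvar=False)
    unassigned = set(range(corr.shape[0]))
    groups = []
    while unassigned:
        seed = unassigned.pop()
        group = [seed] + [j for j in unassigned if corr[seed, j] > threshold]
        unassigned.difference_update(group)
        groups.append(group)
    return groups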
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.