Why Layer-Wise Learning is Hard to Scale-up and a Possible Solution via
Accelerated Downsampling
- URL: http://arxiv.org/abs/2010.08038v1
- Date: Thu, 15 Oct 2020 21:51:43 GMT
- Title: Why Layer-Wise Learning is Hard to Scale-up and a Possible Solution via
Accelerated Downsampling
- Authors: Wenchi Ma, Miao Yu, Kaidong Li, Guanghui Wang
- Abstract summary: Layer-wise learning can achieve state-of-the-art performance in image classification on various datasets.
Previous studies of layer-wise learning are limited to networks with simple hierarchical structures.
This paper reveals that the fundamental reason impeding the scale-up of layer-wise learning is the relatively poor separability of the feature space in shallow layers.
- Score: 19.025707054206457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Layer-wise learning, as an alternative to global back-propagation, is easy to
interpret and analyze, and it is memory efficient. Recent studies demonstrate that
layer-wise learning can achieve state-of-the-art performance in image
classification on various datasets. However, previous studies of layer-wise
learning are limited to networks with simple hierarchical structures, and the
performance decreases severely for deeper networks like ResNet. This paper, for
the first time, reveals that the fundamental reason impeding the scale-up of
layer-wise learning is the relatively poor separability of the feature space in
shallow layers. This argument is empirically verified by controlling
the intensity of the convolution operation in local layers. We discover that
the poorly-separable features from shallow layers are mismatched with the
strong supervision constraint throughout the entire network, making the
layer-wise learning sensitive to network depth. The paper further proposes a
downsampling acceleration approach that weakens the learning of the
poorly-separable shallow layers and shifts the learning emphasis to the deep
feature space, where the separability better matches the supervision constraint. Extensive
experiments have been conducted to verify the new finding and demonstrate the
advantages of the proposed downsampling acceleration in improving the
performance of layer-wise learning.
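To make the finding and the proposed remedy concrete, below is a minimal PyTorch sketch of layer-wise training with accelerated downsampling. It assumes each block is trained by its own auxiliary classifier with gradients blocked between blocks, and that pooling is applied aggressively in the shallow blocks so they carry less of the learning burden; the block structure, channel widths, and pooling placement are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of layer-wise (local) training with accelerated downsampling.
# Channel sizes, pooling placement, and the auxiliary-head design are
# illustrative assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalBlock(nn.Module):
    """A conv block trained by its own auxiliary classifier (no global backprop)."""
    def __init__(self, in_ch, out_ch, num_classes, downsample):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            # "Accelerated downsampling": pool aggressively in shallow blocks so
            # the learning emphasis shifts to deeper, more separable features.
            nn.MaxPool2d(2) if downsample else nn.Identity(),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(out_ch, num_classes))

    def forward(self, x):
        return self.body(x)

blocks = nn.ModuleList([
    LocalBlock(3, 64, 10, downsample=True),    # downsample early ...
    LocalBlock(64, 128, 10, downsample=True),  # ... and again
    LocalBlock(128, 256, 10, downsample=False),
])
opts = [torch.optim.SGD(b.parameters(), lr=0.1, momentum=0.9) for b in blocks]

def train_step(x, y):
    """One layer-wise update: each block learns from its local loss only."""
    for block, opt in zip(blocks, opts):
        feat = block(x)
        loss = F.cross_entropy(block.head(feat), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        x = feat.detach()   # stop gradients from flowing to earlier blocks
    return loss.item()

# Toy usage with random data (replace with a real loader, e.g. CIFAR-10).
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
print(train_step(x, y))
```

Detaching each block's output is what makes the training layer-wise: no gradient ever crosses a block boundary, so the full backward graph of the network is never needed.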
Related papers
- Understanding Deep Representation Learning via Layerwise Feature
Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
arXiv Detail & Related papers (2023-11-06T09:00:38Z) - Understanding Deep Neural Networks via Linear Separability of Hidden
- Understanding Deep Neural Networks via Linear Separability of Hidden Layers [68.23950220548417]
We first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two point sets.
We demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance.
arXiv Detail & Related papers (2023-07-26T05:29:29Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
- Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - Contrastive Deep Supervision [23.93993488930552]
This paper proposes Contrastive Deep Supervision, which supervises the intermediate layers with augmentation-based contrastive learning.
Experimental results on nine popular datasets with eleven models demonstrate its effectiveness on general image classification, fine-grained image classification and object detection.
arXiv Detail & Related papers (2022-07-12T04:33:42Z) - Minimizing Control for Credit Assignment with Strong Feedback [65.59995261310529]
- Minimizing Control for Credit Assignment with Strong Feedback [65.59995261310529]
Current methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals.
We combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization.
We show that the use of strong feedback in DFC allows learning forward and feedback connections simultaneously, using a learning rule fully local in space and time.
arXiv Detail & Related papers (2022-04-14T22:06:21Z) - Cascaded Compressed Sensing Networks: A Reversible Architecture for
Layerwise Learning [11.721183551822097]
We show that target propagation can be achieved by modeling each layer of the network with compressed sensing, without the need for auxiliary networks.
Experiments show that the proposed method achieves better performance than the auxiliary-network-based method.
arXiv Detail & Related papers (2021-10-20T05:21:13Z) - A Layer-Wise Information Reinforcement Approach to Improve Learning in
- A Layer-Wise Information Reinforcement Approach to Improve Learning in Deep Belief Networks [0.4893345190925178]
This paper proposes the Residual Deep Belief Network, which applies layer-by-layer information reinforcement to improve feature extraction and knowledge retention.
Experiments conducted over three public datasets demonstrate its robustness on the task of binary image classification.
arXiv Detail & Related papers (2021-01-17T18:53:18Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - LoCo: Local Contrastive Representation Learning [93.98029899866866]
We show that by overlapping local blocks stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
arXiv Detail & Related papers (2020-08-04T05:41:29Z) - Understanding and Diagnosing Vulnerability under Adversarial Attacks [62.661498155101654]
- Understanding and Diagnosing Vulnerability under Adversarial Attacks [62.661498155101654]
Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks.
We propose a novel interpretability method, InterpretGAN, to generate explanations for features used for classification in latent variables.
We also design the first diagnostic method to quantify the vulnerability contributed by each layer.
arXiv Detail & Related papers (2020-07-17T01:56:28Z) - Introducing Fuzzy Layers for Deep Learning [5.209583609264815]
We introduce a new layer to deep learning: the fuzzy layer.
Traditionally, a neural network architecture is composed of an input layer, some combination of hidden layers, and an output layer.
We propose the introduction of fuzzy layers into the deep learning architecture to exploit the powerful aggregation properties expressed through fuzzy methodologies.
arXiv Detail & Related papers (2020-02-21T19:33:30Z)