Make Deep Networks Shallow Again
- URL: http://arxiv.org/abs/2309.08414v1
- Date: Fri, 15 Sep 2023 14:18:21 GMT
- Title: Make Deep Networks Shallow Again
- Authors: Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh
- Abstract summary: A breakthrough has been achieved by the concept of residual connections.
A stack of residual connection layers can be expressed as an expansion of terms similar to the Taylor expansion.
In other words, a sequential deep architecture is substituted by a parallel shallow one.
- Score: 6.647569337929869
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep neural networks have a good success record and are thus viewed as the
best architecture choice for complex applications. Their main shortcoming has
been, for a long time, the vanishing gradient which prevented the numerical
optimization algorithms from acceptable convergence. A breakthrough has been
achieved by the concept of residual connections -- an identity mapping parallel
to a conventional layer. This concept is applicable to stacks of layers of the
same dimension and substantially alleviates the vanishing gradient problem. A
stack of residual connection layers can be expressed as an expansion of terms
similar to the Taylor expansion. This expansion suggests the possibility of
truncating the higher-order terms and receiving an architecture consisting of a
single broad layer composed of all initially stacked layers in parallel. In
other words, a sequential deep architecture is substituted by a parallel
shallow one. Prompted by this theory, we investigated the performance
capabilities of the parallel architecture in comparison to the sequential one.
The computer vision datasets MNIST and CIFAR10 were used to train both
architectures for a total of 6912 combinations of varying numbers of
convolutional layers, numbers of filters, kernel sizes, and other meta
parameters. Our findings demonstrate a surprising equivalence between the deep
(sequential) and shallow (parallel) architectures. Both layouts produced
similar results in terms of training and validation set loss. This discovery
implies that a wide, shallow architecture can potentially replace a deep
network without sacrificing performance. Such substitution has the potential to
simplify network architectures, improve optimization efficiency, and accelerate
the training process.
Related papers
- Automatic Gradient Descent: Deep Learning without Hyperparameters [35.350274248478804]
The architecture of a deep neural network is defined explicitly in terms of the number of layers, the width of each layer and the general network topology.
Paper builds a new framework for deriving objective functions: gradient idea is to transform a Bregman divergence to account for the non gradient structure of neural architecture.
arXiv Detail & Related papers (2023-04-11T12:45:52Z) - Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures''
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z) - Pruning-as-Search: Efficient Neural Architecture Search via Channel
Pruning and Structural Reparameterization [50.50023451369742]
Pruning-as-Search (PaS) is an end-to-end channel pruning method to search out desired sub-network automatically and efficiently.
Our proposed architecture outperforms prior arts by around $1.0%$ top-1 accuracy on ImageNet-1000 classification task.
arXiv Detail & Related papers (2022-06-02T17:58:54Z) - Unified Field Theory for Deep and Recurrent Neural Networks [56.735884560668985]
We present a unified and systematic derivation of the mean-field theory for both recurrent and deep networks.
We find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks.
Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$.
arXiv Detail & Related papers (2021-12-10T15:06:11Z) - SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and
Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, defined SIRe, to reduce the vanishing gradient in relation to the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing the input image structure preservation through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
arXiv Detail & Related papers (2021-10-06T13:54:49Z) - Differentiable Architecture Pruning for Transfer Learning [6.935731409563879]
We propose a gradient-based approach for extracting sub-architectures from a given large model.
Our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks.
We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data.
arXiv Detail & Related papers (2021-07-07T17:44:59Z) - RAN-GNNs: breaking the capacity limits of graph neural networks [43.66682619000099]
Graph neural networks have become a staple in problems addressing learning and analysis of data defined over graphs.
Recent works attribute this to the need to consider multiple neighborhood sizes at the same time and adaptively tune them.
We show that employing a randomly-wired architecture can be a more effective way to increase the capacity of the network and obtain richer representations.
arXiv Detail & Related papers (2021-03-29T12:34:36Z) - Reframing Neural Networks: Deep Structure in Overcomplete
Representations [41.84502123663809]
We introduce deep frame approximation, a unifying framework for representation learning with structured overcomplete frames.
We quantify structural differences with the deep frame potential, a data-independent measure of coherence linked to representation uniqueness and stability.
This connection to the established theory of overcomplete representations suggests promising new directions for principled deep network architecture design.
arXiv Detail & Related papers (2021-03-10T01:15:14Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To ex-tract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Understanding Deep Architectures with Reasoning Layer [60.90906477693774]
We show that properties of the algorithm layers, such as convergence, stability, and sensitivity, are intimately related to the approximation and generalization abilities of the end-to-end model.
Our theory can provide useful guidelines for designing deep architectures with reasoning layers.
arXiv Detail & Related papers (2020-06-24T00:26:35Z) - Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.