Efficient shallow learning as an alternative to deep learning
- URL: http://arxiv.org/abs/2211.11106v2
- Date: Wed, 23 Nov 2022 11:38:21 GMT
- Title: Efficient shallow learning as an alternative to deep learning
- Authors: Yuval Meir, Ofek Tevet, Yarden Tzach, Shiri Hodassman, Ronit D. Gross
and Ido Kanter
- Abstract summary: We show that the error rates of the generalized shallow LeNet architecture, consisting of only five layers, decay as a power law with the number of filters in the first convolutional layer.
A power law with a similar exponent also characterizes the generalized VGG-16 architecture.
A conservation law along the convolutional layers, the square root of their size times their depth, is found to asymptotically minimize error rates.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The realization of complex classification tasks requires training of deep
learning (DL) architectures consisting of tens or even hundreds of
convolutional and fully connected hidden layers, which is far from the reality
of the human brain. According to the DL rationale, the first convolutional
layer reveals localized patterns in the input, and the following layers reveal
increasingly large-scale patterns, until a class of inputs is reliably
characterized. Here, we
demonstrate that with a fixed ratio between the depths of the first and second
convolutional layers, the error rates of the generalized shallow LeNet
architecture, consisting of only five layers, decay as a power law with the
number of filters in the first convolutional layer. The extrapolation of this
power law indicates that the generalized LeNet can achieve small error rates
that were previously obtained for the CIFAR-10 database using DL architectures.
A power law with a similar exponent also characterizes the generalized VGG-16
architecture. However, this results in a significantly increased number of
operations required to achieve a given error rate relative to LeNet. This
power law phenomenon governs various generalized LeNet and VGG-16
architectures, hinting at its universal behavior and suggesting a quantitative
hierarchical time-space complexity among machine learning architectures.
Additionally, the conservation law along the convolutional layers, which is the
square-root of their size times their depth, is found to asymptotically
minimize error rates. The efficient shallow learning that is demonstrated in
this study calls for further quantitative examination using various databases
and architectures and its accelerated implementation using future dedicated
hardware developments.
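As a rough illustration of the scaling experiment described in the abstract, the sketch below builds a generalized LeNet-style network for CIFAR-10 in PyTorch, with d filters in the first convolutional layer and the second layer held at a fixed multiple of d, and then fits measured error rates to a power law error(d) ≈ A · d^(-ρ) with SciPy. The specific ratio, kernel sizes, fully connected widths, and the placeholder error values are illustrative assumptions, not the paper's configuration or results.

```python
# Illustrative sketch only: a generalized LeNet-style network whose first
# convolutional layer has d filters and whose second layer keeps a fixed
# ratio to the first, plus a power-law fit of error rate vs. d.
# The ratio, kernel sizes, and fully connected widths below are assumptions
# made for illustration and are not taken from the paper.

import numpy as np
import torch
import torch.nn as nn
from scipy.optimize import curve_fit


class GeneralizedLeNet(nn.Module):
    """Five-layer LeNet-style network for 32x32x3 inputs (e.g., CIFAR-10)."""

    def __init__(self, d: int, ratio: float = 2.5, num_classes: int = 10):
        super().__init__()
        d2 = int(ratio * d)  # second conv depth kept at a fixed ratio to the first
        self.features = nn.Sequential(
            nn.Conv2d(3, d, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 28 -> 14
            nn.Conv2d(d, d2, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 10 -> 5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(d2 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


def power_law(d, A, rho):
    """error(d) ~ A * d**(-rho), the scaling form reported in the abstract."""
    return A * d ** (-rho)


if __name__ == "__main__":
    # Quick shape check of the architecture for a small first-layer width.
    model = GeneralizedLeNet(d=6)
    logits = model(torch.randn(1, 3, 32, 32))
    assert logits.shape == (1, 10)

    # Hypothetical test error rates for increasing first-layer widths d;
    # in practice these come from training GeneralizedLeNet(d) to convergence.
    d_values = np.array([6.0, 12.0, 24.0, 48.0, 96.0])
    error_rates = np.array([0.35, 0.30, 0.26, 0.22, 0.19])

    (A, rho), _ = curve_fit(power_law, d_values, error_rates, p0=(0.5, 0.3))
    print(f"fitted power law: error(d) ~ {A:.3f} * d^(-{rho:.3f})")
```

In practice the error_rates array would be replaced by test errors obtained by training GeneralizedLeNet(d) to convergence for each d; the fitted exponent ρ is only meaningful for such measured values.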
Related papers
- EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition [54.99121380536659]
Eye movement biometrics have received increasing attention thanks to their highly secure identification.
Deep learning (DL) models have recently been successfully applied to eye movement recognition.
However, the DL architecture is still determined by human prior knowledge.
We propose EM-DARTS, a hierarchical differentiable architecture search algorithm to automatically design the DL architecture for eye movement recognition.
arXiv Detail & Related papers (2024-09-22T13:11:08Z) - A Law of Data Separation in Deep Learning [41.58856318262069]
We study the fundamental question of how deep neural networks process data in the intermediate layers.
Our finding is a simple and quantitative law that governs how deep neural networks separate data according to class membership.
arXiv Detail & Related papers (2022-10-31T02:25:38Z) - FlowNAS: Neural Architecture Search for Optical Flow Estimation [65.44079917247369]
We propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task.
Experimental results show that the discovered architecture with the weights inherited from the super-network achieves 4.67% F1-all error on KITTI.
arXiv Detail & Related papers (2022-07-04T09:05:25Z) - Stacked unsupervised learning with a network architecture found by
supervised meta-learning [4.209801809583906]
Stacked unsupervised learning (SUL) seems more biologically plausible than backpropagation.
However, SUL has fallen far short of backpropagation in practical applications.
We show an SUL algorithm that can perform completely unsupervised clustering of MNIST digits.
arXiv Detail & Related papers (2022-06-06T16:17:20Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z) - When Residual Learning Meets Dense Aggregation: Rethinking the
Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z) - Introducing Fuzzy Layers for Deep Learning [5.209583609264815]
We introduce a new layer to deep learning: the fuzzy layer.
Traditionally, the network architecture of neural networks is composed of an input layer, some combination of hidden layers, and an output layer.
We propose the introduction of fuzzy layers into the deep learning architecture to exploit the powerful aggregation properties expressed through fuzzy methodologies.
arXiv Detail & Related papers (2020-02-21T19:33:30Z) - Convolutional Networks with Dense Connectivity [59.30634544498946]
We introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.
For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers.
We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks.
arXiv Detail & Related papers (2020-01-08T06:54:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.