The mechanism underlying successful deep learning
- URL: http://arxiv.org/abs/2305.18078v1
- Date: Mon, 29 May 2023 13:28:43 GMT
- Title: The mechanism underlying successful deep learning
- Authors: Yarden Tzach, Yuval Meir, Ofek Tevet, Ronit D. Gross, Shiri Hodassman,
Roni Vardi and Ido Kanter
- Abstract summary: This article presents an efficient three-phase procedure for quantifying the mechanism underlying successful deep learning (DL).
First, a deep architecture is trained to maximize the success rate (SR).
Next, the weights of the first several CLs are fixed and only the new FC layer connected to the output is trained, resulting in SRs that progress with the layers.
Finally, the trained FC weights are silenced, except for those emerging from a single filter.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep architectures consist of tens or hundreds of convolutional layers (CLs)
that terminate with a few fully connected (FC) layers and an output layer
representing the possible labels of a complex classification task. According to
the existing deep learning (DL) rationale, the first CL reveals localized
features from the raw data, whereas the subsequent layers progressively extract
higher-level features required for refined classification. This article
presents an efficient three-phase procedure for quantifying the mechanism
underlying successful DL. First, a deep architecture is trained to maximize the
success rate (SR). Next, the weights of the first several CLs are fixed and
only the concatenated new FC layer connected to the output is trained,
resulting in SRs that progress with the layers. Finally, the trained FC weights
are silenced, except for those emerging from a single filter, enabling the
quantification of the functionality of this filter using a correlation matrix
between input labels and averaged output fields; hence, a well-defined set of
quantifiable features is obtained. Each filter essentially selects a single
output label independent of the input label, which seems to prevent high SRs;
however, it counterintuitively identifies a small subset of possible output
labels. This feature is an essential part of the underlying DL mechanism and is
progressively sharpened with layers, resulting in enhanced signal-to-noise
ratios and SRs. Quantitatively, this mechanism is exemplified by the VGG-16,
VGG-6, and AVGG-16. The proposed mechanism underlying DL provides an accurate
tool for identifying each filter's quality and is expected to direct additional
procedures to improve the SR, computational complexity, and latency of DL.
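To make the three-phase procedure concrete, here is a minimal PyTorch sketch of phases 2 and 3 (phase-1 training of the backbone is assumed already done); the helper names, the cut point, and the (filter, spatial) ordering of the flattened features are illustrative assumptions rather than the authors' code.

```python
# Minimal sketch of phases 2 and 3 described in the abstract (illustrative
# assumptions throughout; not the authors' released implementation).
import torch
import torch.nn as nn

def phase2_model(backbone, cut, num_labels, sample):
    """Phase 2: keep the first `cut` layers of a phase-1-trained backbone
    fixed and attach a fresh FC layer from their features to the output.
    `sample` is a batched example input used to infer the feature size."""
    trunk = nn.Sequential(*list(backbone.children())[:cut], nn.Flatten())
    for p in trunk.parameters():
        p.requires_grad = False               # CL weights stay frozen
    with torch.no_grad():
        feat_dim = trunk(sample).shape[1]
    fc = nn.Linear(feat_dim, num_labels)      # only this layer is trained
    return trunk, fc

@torch.no_grad()
def filter_label_matrix(trunk, fc, loader, filter_idx, spatial, num_labels):
    """Phase 3: silence all trained FC weights except those emerging from a
    single filter, then average the output fields per input label. Assumes
    the flattened features are channel-major, so one filter owns a
    contiguous slice of length `spatial` (H*W of the feature map)."""
    keep = slice(filter_idx * spatial, (filter_idx + 1) * spatial)
    w = torch.zeros_like(fc.weight)
    w[:, keep] = fc.weight[:, keep]           # the single surviving filter
    fields = torch.zeros(num_labels, num_labels)
    counts = torch.zeros(num_labels)
    for x, y in loader:
        out = trunk(x) @ w.T                  # FC bias silenced as well
        for lbl in y.unique():
            m = y == lbl
            fields[lbl] += out[m].sum(0)
            counts[lbl] += m.sum()
    return fields / counts.clamp(min=1).unsqueeze(1)  # row: input label
```

Each row of the returned matrix gives the average output field, per output label, for inputs of one label; the few dominant columns per filter correspond to the "small subset of possible output labels" the abstract refers to.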
Related papers
- Unified CNNs and transformers underlying learning mechanism reveals multi-head attention modus vivendi [0.0]
Convolutional neural networks (CNNs) evaluate short-range correlations in input images, which progress along the layers.
Vision transformer (ViT) architectures evaluate long-range correlations, using repeated transformer encoders composed of fully connected layers.
This study demonstrates that CNNs and ViT architectures stem from a unified underlying learning mechanism.
arXiv Detail & Related papers (2025-01-22T14:19:48Z)
- LayerMatch: Do Pseudo-labels Benefit All Layers? [77.59625180366115]
Semi-supervised learning offers a promising solution to mitigate the dependency on labeled data.
We develop two layer-specific pseudo-label strategies, termed Grad-ReLU and Avg-Clustering.
Our approach consistently demonstrates exceptional performance on standard semi-supervised learning benchmarks.
arXiv Detail & Related papers (2024-06-20T11:25:50Z)
- Towards a universal mechanism for successful deep learning [0.0]
This study shows that the accuracy and SNR progressively increase with the layers.
For a given deep architecture, the maximal error rate increases approximately linearly with the number of output labels.
Similar trends were obtained for dataset labels in the range [3, 1,000], thus supporting the universality of this mechanism.
arXiv Detail & Related papers (2023-09-14T09:03:57Z)
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features (a generic sketch follows this entry).
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
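To make the two-branch design concrete, here is a generic early-exit sketch in the same spirit; the pooled-feature injection, the stage widths, and the confidence threshold are assumptions, not the Dyn-Perceiver implementation.

```python
# Generic two-branch early-exit sketch (illustrative; not Dyn-Perceiver itself).
import torch
import torch.nn as nn

class TwoBranchEarlyExit(nn.Module):
    def __init__(self, num_classes: int, latent_dim: int = 128):
        super().__init__()
        chans = [3, 32, 64, 128]
        # Feature branch: plain convolutional stages extracting image features.
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ci, co, 3, stride=2, padding=1), nn.ReLU())
            for ci, co in zip(chans[:-1], chans[1:]))
        # Classification branch: a latent code updated from pooled features.
        self.latent0 = nn.Parameter(torch.zeros(latent_dim))
        self.updates = nn.ModuleList(
            nn.Linear(co + latent_dim, latent_dim) for co in chans[1:])
        # Early exits live only in the classification branch, so low-level
        # features themselves never need to be linearly separable.
        self.exits = nn.ModuleList(
            nn.Linear(latent_dim, num_classes) for _ in chans[1:])

    def forward(self, x: torch.Tensor, threshold: float = 0.9):
        z = self.latent0.expand(x.size(0), -1)
        logits = None
        for stage, update, exit_head in zip(self.stages, self.updates, self.exits):
            x = stage(x)
            pooled = x.mean(dim=(2, 3))            # inject pooled stage features
            z = torch.relu(update(torch.cat([pooled, z], dim=1)))
            logits = exit_head(z)
            conf = logits.softmax(-1).max(-1).values
            if conf.min() > threshold:             # every sample confident: stop
                break
        return logits
```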
- Focus Your Attention (with Adaptive IIR Filters) [62.80628327613344]
We present a new layer in which dynamic (i.e., input-dependent) Infinite Impulse Response (IIR) filters of order two are used to process the input sequence.
Despite their relatively low order, the causal adaptive filters are shown to focus attention on the relevant sequence elements (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-05-24T09:42:30Z)
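Reading the entry above literally, here is a minimal sketch of an input-dependent second-order IIR layer; the coefficient predictor `coef_net` and the tanh bound on the feedback terms are assumptions, not the paper's parameterization.

```python
# Minimal sketch of a dynamic (input-dependent) second-order IIR filter.
import torch
import torch.nn as nn

class DynamicIIR2(nn.Module):
    """Causal recursion y[t] = b0*x[t] + b1*x[t-1] + b2*x[t-2]
                               - a1*y[t-1] - a2*y[t-2],
    with all five coefficients predicted per step from the input."""
    def __init__(self, dim: int):
        super().__init__()
        self.coef_net = nn.Linear(dim, 5)     # (b0, b1, b2, a1, a2) per step

    def forward(self, x):                     # x: (batch, time, dim)
        coefs = self.coef_net(x)
        b = coefs[..., :3]
        a = 0.9 * torch.tanh(coefs[..., 3:])  # crude bound on feedback terms
        B, T, D = x.shape
        xm1 = xm2 = ym1 = ym2 = x.new_zeros(B, D)
        ys = []
        for t in range(T):
            yt = (b[:, t, 0:1] * x[:, t] + b[:, t, 1:2] * xm1
                  + b[:, t, 2:3] * xm2
                  - a[:, t, 0:1] * ym1 - a[:, t, 1:2] * ym2)
            ys.append(yt)
            xm2, xm1 = xm1, x[:, t]
            ym2, ym1 = ym1, yt
        return torch.stack(ys, dim=1)         # same shape as x
```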
- Basis Scaling and Double Pruning for Efficient Inference in Network-Based Transfer Learning [1.3467579878240454]
We decompose a convolutional layer into two layers: a convolutional layer with the orthonormal basis vectors as the filters, and a "BasisScalingConv" layer which is responsible for rescaling the features (a sketch of this decomposition follows this entry).
We can achieve pruning ratios up to 74.6% for CIFAR-10 and 98.9% for MNIST in model parameters.
arXiv Detail & Related papers (2021-08-06T00:04:02Z)
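As a sketch of the decomposition described in the entry above: an SVD of the flattened filters yields an orthonormal-basis convolution followed by a 1x1 rescale-and-recombine step that reproduces the original layer exactly; the module layout and the role of the second step are assumptions about the paper's "BasisScalingConv".

```python
# Sketch: split a conv layer into an orthonormal-basis conv plus a scaling/
# recombination step (assumes default dilation and groups).
import torch
import torch.nn as nn

def decompose_conv(conv: nn.Conv2d) -> nn.Sequential:
    out_c, in_c, kh, kw = conv.weight.shape
    w = conv.weight.detach().reshape(out_c, -1)          # (out, in*kh*kw)
    u, s, vh = torch.linalg.svd(w, full_matrices=False)  # w = u @ diag(s) @ vh
    r = s.numel()
    # First layer: filters are the orthonormal basis vectors (rows of vh).
    basis = nn.Conv2d(in_c, r, (kh, kw), stride=conv.stride,
                      padding=conv.padding, bias=False)
    basis.weight.data = vh.reshape(r, in_c, kh, kw)
    # Second layer: rescale each basis response and recombine (1x1 conv);
    # pruning then amounts to dropping low-scale basis filters.
    scale = nn.Conv2d(r, out_c, 1, bias=conv.bias is not None)
    scale.weight.data = (u * s).reshape(out_c, r, 1, 1)
    if conv.bias is not None:
        scale.bias.data = conv.bias.detach().clone()
    return nn.Sequential(basis, scale)
```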
- A Hierarchical Coding Scheme for Glasses-free 3D Displays Based on Scalable Hybrid Layered Representation of Real-World Light Fields [0.6091702876917279]
The scheme learns stacked multiplicative layers from subsets of light field views determined from different scanning orders.
The spatial correlation in layer patterns is exploited with varying low ranks in a factorization derived from singular value decomposition on a Krylov subspace.
Encoding with HEVC efficiently removes intra-view and inter-view correlation in the low-rank approximated layers.
arXiv Detail & Related papers (2021-04-19T15:09:21Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- Novel Adaptive Binary Search Strategy-First Hybrid Pyramid- and Clustering-Based CNN Filter Pruning Method without Parameters Setting [3.7468898363447654]
Pruning redundant filters in CNN models has received growing attention.
We propose an adaptive binary search-first hybrid pyramid- and clustering-based (ABS HPC) method for pruning filters automatically.
Thorough experiments on practical datasets and CNN models demonstrate that the proposed filter pruning method substantially reduces parameters and floating-point operations while attaining higher accuracy.
arXiv Detail & Related papers (2020-06-08T10:09:43Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
- Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity (a generic sketch of the norm-based baseline follows this entry).
arXiv Detail & Related papers (2020-05-06T07:41:22Z)
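For context, here is a generic sketch of the norm-based filter ranking that the last entry cites as the common baseline; it is not the paper's dependency-aware mechanism, and `keep_ratio` is an illustrative parameter.

```python
# Generic norm-based filter ranking (the baseline the entry above refers to;
# not the paper's dependency-aware mechanism).
import torch
import torch.nn as nn

def filters_to_prune(conv: nn.Conv2d, keep_ratio: float = 0.5) -> torch.Tensor:
    """Rank filters by the L1 norm of their weights and return the indices
    of the lowest-norm filters, which a pruning pass would remove."""
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one norm per filter
    n_prune = int(conv.out_channels * (1.0 - keep_ratio))
    return torch.argsort(norms)[:n_prune]                  # smallest norms first

# Example: mark half the filters of a 3x3 convolution for removal.
conv = nn.Conv2d(16, 32, kernel_size=3)
print(filters_to_prune(conv))                              # 16 filter indices
```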
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.