Mitigating severe over-parameterization in deep convolutional neural
networks through forced feature abstraction and compression with an
entropy-based heuristic
- URL: http://arxiv.org/abs/2106.14190v1
- Date: Sun, 27 Jun 2021 10:34:39 GMT
- Title: Mitigating severe over-parameterization in deep convolutional neural
networks through forced feature abstraction and compression with an
entropy-based heuristic
- Authors: Nidhi Gowdra, Roopak Sinha, Stephen MacDonell and Wei Qi Yan
- Abstract summary: We propose an Entropy-Based Convolutional Layer Estimation (EBCLE) heuristic, which is robust and simple.
We present empirical evidence to emphasize the relative effectiveness of broader, yet shallower models trained using the EBCLE heuristic.
- Score: 7.503338065129185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs) such as ResNet-50, DenseNet-40 and
ResNeXt-56 are severely over-parameterized, necessitating a corresponding
increase in the computational resources required for model training, which
scales exponentially with model depth. In this paper, we propose an
Entropy-Based Convolutional Layer Estimation (EBCLE) heuristic which is robust
and simple, yet effective in resolving the problem of over-parameterization with
regard to the network depth of CNN models. The EBCLE heuristic employs a priori
knowledge of the entropic data distribution of input datasets to determine an
upper bound on convolutional network depth, beyond which identity
transformations are prevalent and offer insignificant contributions to model
performance. Restricting depth redundancies by forcing feature
compression and abstraction restricts over-parameterization while decreasing
training time by 24.99% - 78.59% without degradation in model performance. We
present empirical evidence to emphasize the relative effectiveness of broader,
yet shallower models trained using the EBCLE heuristic, which maintain or
exceed the baseline classification accuracies of narrower yet deeper models.
The EBCLE heuristic is architecturally agnostic, and EBCLE-based CNN models
restrict depth redundancies, resulting in enhanced utilization of the available
computational resources. The proposed EBCLE heuristic is a compelling technique
for researchers to analytically justify their hyperparameter (HP) choices for
CNNs. Empirical validation of the EBCLE heuristic in training CNN models was
established on five benchmarking datasets (ImageNet32, CIFAR-10/100, STL-10,
MNIST) and four network architectures (DenseNet, ResNet, ResNeXt and
EfficientNet B0-B2) with appropriate statistical tests employed to infer any
conclusive claims presented in this paper.
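A minimal sketch of the idea behind the EBCLE heuristic is given below: estimate the Shannon entropy of the input data distribution, then cap the convolutional depth of the model accordingly. The entropy estimator and the depth_upper_bound mapping (including the bits_per_layer parameter) are hypothetical stand-ins for illustration only; the exact EBCLE equation is defined in the paper itself.
```python
import numpy as np

def dataset_entropy(images, bins=256):
    """Estimate the Shannon entropy (in bits) of the pixel-intensity
    distribution of an image dataset given as a uint8 array of any shape."""
    counts, _ = np.histogram(images, bins=bins, range=(0, 256))
    probs = counts / counts.sum()
    probs = probs[probs > 0]                        # drop empty bins
    return float(-(probs * np.log2(probs)).sum())

def depth_upper_bound(entropy_bits, bits_per_layer=0.5, min_depth=8):
    """Hypothetical mapping from dataset entropy to a cap on the number of
    convolutional layers. NOT the published EBCLE equation: it simply assumes
    each convolutional layer can usefully abstract a fixed entropy budget,
    beyond which additional layers tend toward identity mappings."""
    return max(min_depth, int(np.ceil(entropy_bits / bits_per_layer)))

if __name__ == "__main__":
    # Synthetic stand-in for a dataset such as CIFAR-10 (32x32 RGB images).
    rng = np.random.default_rng(0)
    fake_images = rng.integers(0, 256, size=(1000, 32, 32, 3), dtype=np.uint8)
    h = dataset_entropy(fake_images)
    print(f"estimated entropy: {h:.2f} bits")
    print(f"suggested depth cap: {depth_upper_bound(h)} conv layers")
```
Under this assumption, one would then instantiate a broader but shallower DenseNet/ResNet/ResNeXt variant whose convolutional depth does not exceed the returned cap.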
Related papers
- Bayesian Entropy Neural Networks for Physics-Aware Prediction [14.705526856205454]
We introduce BENN, a framework designed to impose constraints on Bayesian Neural Network (BNN) predictions.
BENN is capable of constraining not only the predicted values but also their derivatives and variances, ensuring a more robust and reliable model output.
Results highlight significant improvements over traditional BNNs and showcase competitive performance relative to contemporary constrained deep learning methods.
arXiv Detail & Related papers (2024-07-01T07:00:44Z) - IB-AdCSCNet:Adaptive Convolutional Sparse Coding Network Driven by Information Bottleneck [4.523653503622693]
We introduce IB-AdCSCNet, a deep learning model grounded in information bottleneck theory.
IB-AdCSCNet seamlessly integrates the information bottleneck trade-off strategy into deep networks.
Experimental results on CIFAR-10 and CIFAR-100 datasets demonstrate that IB-AdCSCNet not only matches the performance of deep residual convolutional networks but also outperforms them when handling corrupted data.
arXiv Detail & Related papers (2024-05-23T05:35:57Z) - Spike-and-slab shrinkage priors for structurally sparse Bayesian neural networks [0.16385815610837165]
Sparse deep learning addresses challenges by recovering a sparse representation of the underlying target function.
Deep neural architectures compressed via structured sparsity provide low latency inference, higher data throughput, and reduced energy consumption.
We propose structurally sparse Bayesian neural networks which prune excessive nodes with (i) Spike-and-Slab Group Lasso (SS-GL), and (ii) Spike-and-Slab Group Horseshoe (SS-GHS) priors.
arXiv Detail & Related papers (2023-08-17T17:14:18Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and
Sparse DNNs [13.446502051609036]
We develop and describe a novel quantization paradigm for deep neural networks (DNNs).
Our method leverages concepts from explainable AI (XAI) and information theory.
The ultimate goal is to preserve the most relevant weights in quantization clusters of highest information content.
arXiv Detail & Related papers (2021-09-09T12:57:06Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline (a generic pruning-plus-quantization sketch appears after this list).
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - A High-Performance Adaptive Quantization Approach for Edge CNN
Applications [0.225596179391365]
Recent convolutional neural network (CNN) development continues to advance the state-of-the-art model accuracy for various applications.
The enhanced accuracy comes at the cost of substantial memory bandwidth and storage requirements.
In this paper, we introduce an adaptive high-performance quantization method to resolve the issue of biased activation.
arXiv Detail & Related papers (2021-07-18T07:49:18Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because of their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in the original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
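As referenced in the weight-pruning-and-quantization entry above, the sketch below illustrates the two generic ingredients that entry names: magnitude pruning followed by uniform quantization of a layer's weights. It is an illustration only, not the source-coding storage format proposed in that paper, and the sparsity and n_bits settings are arbitrary.
```python
import numpy as np

def prune_and_quantize(weights, sparsity=0.8, n_bits=4):
    """Generic magnitude pruning followed by uniform quantization.
    Illustrative only; not the storage format of the cited paper."""
    w = np.asarray(weights, dtype=np.float32)

    # 1) Magnitude pruning: zero out the smallest |w| until the target
    #    sparsity is reached.
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold
    w_pruned = w * mask

    # 2) Uniform quantization of the pruned tensor to 2**n_bits levels.
    w_min, w_max = w_pruned.min(), w_pruned.max()
    scale = (w_max - w_min) / (2 ** n_bits - 1) or 1.0
    codes = np.round((w_pruned - w_min) / scale).astype(np.int32)
    w_quant = codes * scale + w_min

    # Re-apply the mask so pruned positions stay exactly zero.
    return w_quant * mask, mask, codes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = rng.normal(size=(256, 128)).astype(np.float32)
    wq, mask, codes = prune_and_quantize(layer)
    print(f"sparsity: {1 - mask.mean():.2f}, unique codes: {np.unique(codes[mask]).size}")
```
The integer codes together with the sparsity mask are what a compact storage format would typically serialize.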
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.