Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of
Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation
- URL: http://arxiv.org/abs/2308.06422v2
- Date: Wed, 16 Aug 2023 16:18:28 GMT
- Title: Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of
Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation
- Authors: Seyedarmin Azizi, Mahdi Nazemi, Arash Fayyazi, Massoud Pedram
- Abstract summary: We introduce an innovative search mechanism for automatically selecting the best bit-width and layer-width for individual neural network layers.
This leads to a marked enhancement in deep neural network efficiency.
- Score: 5.187866263931125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the complexity and computational demands of deep learning models rise, the
need for effective optimization methods for neural network designs becomes
paramount. This work introduces an innovative search mechanism for
automatically selecting the best bit-width and layer-width for individual
neural network layers. This leads to a marked enhancement in deep neural
network efficiency. The search domain is strategically reduced by leveraging
Hessian-based pruning, ensuring the removal of non-crucial parameters.
Subsequently, we detail the development of surrogate models for favorable and
unfavorable outcomes by employing a cluster-based tree-structured Parzen
estimator. This strategy allows for a streamlined exploration of architectural
possibilities and swift pinpointing of top-performing designs. Through rigorous
testing on well-known datasets, our method proves its distinct advantage over
existing methods. Compared to leading compression strategies, our approach
records an impressive 20% decrease in model size without compromising accuracy.
Additionally, our method boasts a 12x reduction in search time relative to the
best search-focused strategies currently available. As a result, our proposed
method represents a leap forward in neural network design optimization, paving
the way for quick model design and implementation in settings with limited
resources, thereby propelling the potential of scalable deep learning
solutions.
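Below is a minimal sketch of the kind of search the abstract describes: per-layer bit-width and layer-width selection over a sensitivity-pruned space, driven by a tree-structured Parzen estimator. It is not the authors' code. It uses Optuna's stock TPE sampler as a stand-in for the paper's cluster-based TPE, and the `SENSITIVITY` scores, `prune_search_space`, and `proxy_cost` functions are illustrative placeholders for the Hessian-based sensitivity analysis and fine-tuned accuracy/size evaluation used in the paper.

```python
# Hedged sketch: TPE search over per-layer bit-widths and width multipliers.
# Assumptions: Optuna's vanilla TPESampler replaces the paper's cluster-based TPE;
# sensitivities and the cost function are toy placeholders, not the real method.
import optuna

# Hypothetical per-layer Hessian-trace sensitivities (higher = more sensitive).
SENSITIVITY = {"conv1": 0.9, "conv2": 0.4, "conv3": 0.1, "fc": 0.6}

BIT_CHOICES = [2, 4, 8]
WIDTH_CHOICES = [0.25, 0.5, 0.75, 1.0]


def prune_search_space(layer):
    """Shrink the per-layer search space with the sensitivity proxy:
    low-sensitivity layers may use aggressive bit-widths/widths,
    high-sensitivity layers keep only conservative options."""
    s = SENSITIVITY[layer]
    bits = [b for b in BIT_CHOICES if b >= 8 * s] or [max(BIT_CHOICES)]
    widths = [w for w in WIDTH_CHOICES if w >= s] or [max(WIDTH_CHOICES)]
    return bits, widths


def proxy_cost(config):
    """Toy objective standing in for 'accuracy loss + model-size penalty'."""
    acc_loss = sum(SENSITIVITY[l] * ((8 - b) / 8 + (1 - w))
                   for l, (b, w) in config.items())
    size = sum(b * w for b, w in config.values())
    return acc_loss + 0.05 * size


def objective(trial):
    config = {}
    for layer in SENSITIVITY:
        bits, widths = prune_search_space(layer)
        b = trial.suggest_categorical(f"{layer}_bits", bits)
        w = trial.suggest_categorical(f"{layer}_width", widths)
        config[layer] = (b, w)
    return proxy_cost(config)


study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```

In a full pipeline, `SENSITIVITY` would come from a Hutchinson-style Hessian-trace estimate per layer, and `proxy_cost` would be replaced by evaluating the quantized, width-scaled model on a validation set; the surrogate-model split into favorable and unfavorable outcomes is what the cluster-based TPE refines beyond this vanilla sampler.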
Related papers
- Robust Neural Pruning with Gradient Sampling Optimization for Residual Neural Networks [0.0]
This research pioneers the integration of gradient sampling optimization techniques, particularly StochGradAdam, into the neural network pruning process.
Our main objective is to address the significant challenge of maintaining accuracy in pruned neural models, which is critical in resource-constrained scenarios.
arXiv Detail & Related papers (2023-12-26T12:19:22Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model
Perspective [67.25782152459851]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - Neural Network Pruning by Gradient Descent [7.427858344638741]
We introduce a novel and straightforward neural network pruning framework that incorporates the Gumbel-Softmax technique.
We demonstrate its exceptional compression capability, maintaining high accuracy on the MNIST dataset with only 0.15% of the original network parameters.
We believe our method opens a promising new avenue for deep learning pruning and the creation of interpretable machine learning systems.
arXiv Detail & Related papers (2023-11-21T11:12:03Z) - An automatic selection of optimal recurrent neural network architecture
for processes dynamics modelling purposes [0.0]
The research includes four original algorithms dedicated to neural network architecture search.
Algorithms have been based on well-known optimisation techniques such as evolutionary algorithms and gradient descent methods.
The research involved an extended validation study based on data generated from a mathematical model of the fast processes occurring in a pressurised water nuclear reactor.
arXiv Detail & Related papers (2023-09-25T11:06:35Z) - Split-Boost Neural Networks [1.1549572298362787]
We propose an innovative training strategy for feed-forward architectures, called split-boost.
Such a novel approach ultimately allows us to avoid explicitly modeling the regularization term.
The proposed strategy is tested on a real-world (anonymized) dataset within a benchmark medical insurance design problem.
arXiv Detail & Related papers (2023-09-06T17:08:57Z) - Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Neural Architecture Search for Speech Emotion Recognition [72.1966266171951]
We propose to apply neural architecture search (NAS) techniques to automatically configure the SER models.
We show that NAS can improve SER performance (54.89% to 56.28%) while maintaining model parameter sizes.
arXiv Detail & Related papers (2022-03-31T10:16:10Z) - Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate that TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z) - Firefly Neural Architecture Descent: a General Approach for Growing
Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z) - FactorizeNet: Progressive Depth Factorization for Efficient Network
Architecture Exploration Under Quantization Constraints [93.4221402881609]
We introduce a progressive depth factorization strategy for efficient CNN architecture exploration under quantization constraints.
By algorithmically increasing the granularity of depth factorization in a progressive manner, the proposed strategy enables a fine-grained, low-level analysis of layer-wise distributions.
Such a progressive depth factorization strategy also enables efficient identification of the optimal depth-factorized macroarchitecture design.
arXiv Detail & Related papers (2020-11-30T07:12:26Z)