Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision
Quantization
- URL: http://arxiv.org/abs/2106.07611v1
- Date: Mon, 14 Jun 2021 17:15:15 GMT
- Title: Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision
Quantization
- Authors: Santiago Miret, Vui Seng Chua, Mattias Marder, Mariano Phielipp,
Nilesh Jain, Somdeb Majumdar
- Abstract summary: Mixed-precision quantization is a powerful tool to enable memory and compute savings of neural network workloads.
Recent research has shown significant progress in applying mixed-precision quantization techniques.
We present a flexible and scalable framework for automated mixed-precision quantization.
- Score: 6.060757543617328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixed-precision quantization is a powerful tool to enable memory and compute
savings of neural network workloads by deploying different sets of bit-width
precisions on separate compute operations. Recent research has shown
significant progress in applying mixed-precision quantization techniques to
reduce the memory footprint of various workloads, while also preserving task
performance. Prior work, however, has often ignored additional objectives, such
as bit-operations, that are important for deployment of workloads on hardware.
Here we present a flexible and scalable framework for automated mixed-precision
quantization that optimizes multiple objectives. Our framework relies on
Neuroevolution-Enhanced Multi-Objective Optimization (NEMO), a novel search
method, to find Pareto optimal mixed-precision configurations for memory and
bit-operations objectives. Within NEMO, a population is divided into
structurally distinct sub-populations (species) which jointly form the Pareto
frontier of solutions for the multi-objective problem. At each generation,
species are re-sized in proportion to the goodness of their contribution to the
Pareto frontier. This allows NEMO to leverage established search techniques and
neuroevolution methods to continually improve the goodness of the Pareto
frontier. In our experiments we apply a graph-based representation to describe
the underlying workload, enabling us to deploy graph neural networks trained by
NEMO to find Pareto optimal configurations for various workloads trained on
ImageNet. Compared to the state-of-the-art, we achieve competitive results on
memory compression and superior results for compute compression for
MobileNet-V2, ResNet50 and ResNeXt-101-32x8d. A deeper analysis of the results
obtained by NEMO also shows that both the graph representation and the
species-based approach are critical in finding effective configurations for all
workloads.
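The species re-sizing step described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' implementation: the two-objective tuple format, the function names, and the exact proportional-allocation rule are assumptions made for the sketch.

```python
# Illustrative sketch of NEMO-style species re-sizing (assumed details:
# two minimization objectives per candidate, proportional allocation).

def pareto_front(points):
    """Indices of non-dominated points, minimizing both objectives."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front

def resize_species(species_points, budget):
    """Re-size each species in proportion to its Pareto-front contribution.

    species_points: dict mapping species id -> list of (memory, bit-ops)
    objective tuples for its members. Returns dict species id -> new size.
    """
    flat = [(sid, pt) for sid, pts in species_points.items() for pt in pts]
    front = pareto_front([pt for _, pt in flat])
    contrib = {sid: 0 for sid in species_points}
    for i in front:
        contrib[flat[i][0]] += 1
    total = sum(contrib.values()) or 1
    # Every species keeps at least one member so it can continue evolving.
    return {sid: max(1, round(budget * c / total))
            for sid, c in contrib.items()}
```

For example, if species "A" contributes two points to the frontier and species "B" one, a population budget of 6 is split 4/2 at the next generation, so frontier-productive species receive more search effort.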
Related papers
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Vertical Layering of Quantized Neural Networks for Heterogeneous
Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - Channel-wise Mixed-precision Assignment for DNN Inference on Constrained
Edge Nodes [22.40937602825472]
State-of-the-art mixed-precision quantization works layer-wise, i.e., it uses different bit-widths for the weight and activation tensors of each network layer.
We propose a novel NAS that selects the bit-width of each weight tensor channel independently.
Our networks reduce the memory and energy for inference by up to 63% and 27% respectively.
arXiv Detail & Related papers (2022-06-17T15:51:49Z) - Consolidated learning -- a domain-specific model-free optimization
strategy with examples for XGBoost and MIMIC-IV [4.370097023410272]
This paper proposes a new formulation of the tuning problem, called consolidated learning.
In such settings, we are interested in the total optimization time rather than tuning for a single task.
We demonstrate the effectiveness of this approach through an empirical study for XGBoost algorithm and the collection of predictive tasks extracted from the MIMIC-IV medical database.
arXiv Detail & Related papers (2022-01-27T21:38:53Z) - Video Coding for Machine: Compact Visual Representation Compression for
Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging the largely separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z) - ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and
Sparse DNNs [13.446502051609036]
We develop and describe a novel quantization paradigm for deep neural networks (DNNs).
Our method leverages concepts of explainable AI (XAI) and concepts of information theory.
The ultimate goal is to preserve the most relevant weights in quantization clusters of highest information content.
arXiv Detail & Related papers (2021-09-09T12:57:06Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Ensembles of Spiking Neural Networks [0.3007949058551534]
This paper demonstrates how to construct ensembles of spiking neural networks producing state-of-the-art results.
We achieve classification accuracies of 98.71%, 100.0%, and 99.09%, on the MNIST, NMNIST and DVS Gesture datasets respectively.
We formalize spiking neural networks as GLM predictors, identifying a suitable representation for their target domain.
arXiv Detail & Related papers (2020-10-15T17:45:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.