Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision
Quantization
- URL: http://arxiv.org/abs/2106.07611v1
- Date: Mon, 14 Jun 2021 17:15:15 GMT
- Title: Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision
Quantization
- Authors: Santiago Miret, Vui Seng Chua, Mattias Marder, Mariano Phielipp,
Nilesh Jain, Somdeb Majumdar
- Abstract summary: Mixed-precision quantization is a powerful tool to enable memory and compute savings of neural network workloads.
Recent research has shown significant progress in applying mixed-precision quantization techniques.
We present a flexible and scalable framework for automated mixed-precision quantization.
- Score: 6.060757543617328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixed-precision quantization is a powerful tool to enable memory and compute
savings of neural network workloads by deploying different sets of bit-width
precisions on separate compute operations. Recent research has shown
significant progress in applying mixed-precision quantization techniques to
reduce the memory footprint of various workloads, while also preserving task
performance. Prior work, however, has often ignored additional objectives, such
as bit-operations, that are important for deployment of workloads on hardware.
Here we present a flexible and scalable framework for automated mixed-precision
quantization that optimizes multiple objectives. Our framework relies on
Neuroevolution-Enhanced Multi-Objective Optimization (NEMO), a novel search
method, to find Pareto optimal mixed-precision configurations for memory and
bit-operations objectives. Within NEMO, a population is divided into
structurally distinct sub-populations (species) which jointly form the Pareto
frontier of solutions for the multi-objective problem. At each generation,
species are re-sized in proportion to the goodness of their contribution to the
Pareto frontier. This allows NEMO to leverage established search techniques and
neuroevolution methods to continually improve the goodness of the Pareto
frontier. In our experiments we apply a graph-based representation to describe
the underlying workload, enabling us to deploy graph neural networks trained by
NEMO to find Pareto optimal configurations for various workloads trained on
ImageNet. Compared to the state-of-the-art, we achieve competitive results on
memory compression and superior results for compute compression for
MobileNet-V2, ResNet50 and ResNeXt-101-32x8d. A deeper analysis of the results
obtained by NEMO also shows that both the graph representation and the
species-based approach are critical in finding effective configurations for all
workloads.
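The two deployment objectives named in the abstract, weight memory and bit-operations (BOPs), can be made concrete with a small sketch. The layer statistics and the common BOPs convention (MACs scaled by weight and activation bit-widths) below are illustrative assumptions, not the paper's published code.

```python
# Minimal sketch (assumed, not the paper's code): evaluating the two
# objectives for a hypothetical mixed-precision configuration.
from dataclasses import dataclass

@dataclass
class Layer:
    params: int   # number of weights
    macs: int     # multiply-accumulate operations per inference

def memory_bytes(layers, w_bits):
    """Weight memory footprint given per-layer weight bit-widths."""
    return sum(l.params * b for l, b in zip(layers, w_bits)) / 8

def bit_ops(layers, w_bits, a_bits):
    """Bit-operations: MACs scaled by weight and activation precision
    (a common convention, assumed here)."""
    return sum(l.macs * bw * ba for l, bw, ba in zip(layers, w_bits, a_bits))

# Toy two-layer workload with a 4-bit/8-bit mixed assignment.
layers = [Layer(params=1_000_000, macs=50_000_000),
          Layer(params=2_000_000, macs=100_000_000)]
print(memory_bytes(layers, [4, 8]), bit_ops(layers, [4, 8], [8, 8]))
```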
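The species re-sizing step can likewise be sketched: pool every candidate's objective vector, find the joint Pareto frontier, and grow each species in proportion to the non-dominated points it contributed. The dominance test and the proportional allocation rule here are one plausible reading of the abstract, not the authors' exact goodness measure.

```python
# Sketch of the species re-sizing idea, under the assumptions stated above.

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Indices of non-dominated points."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

def resize_species(species_objs, total_size):
    """species_objs: {species name: list of objective vectors}.
    Returns new population sizes, proportional to frontier contribution."""
    pooled, owner = [], []
    for name, objs in species_objs.items():
        for o in objs:
            pooled.append(o)
            owner.append(name)
    counts = {name: 1 for name in species_objs}  # floor of 1 keeps species alive
    for i in pareto_front(pooled):
        counts[owner[i]] += 1
    norm = sum(counts.values())
    # Rounding means sizes sum only approximately to total_size.
    return {name: max(1, round(total_size * c / norm)) for name, c in counts.items()}

# Example: species B contributes more non-dominated points, so it grows.
print(resize_species(
    {"A": [(3.0, 5.0), (4.0, 4.0)], "B": [(1.0, 6.0), (2.0, 3.0), (5.0, 1.0)]},
    total_size=20))
```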
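Finally, a rough sketch of the graph-based workload representation: compute ops become nodes with simple features, edges follow data flow, and a small message-passing network scores a bit-width per node. The feature set, network shape, and bit-width menu are hypothetical; NEMO evolves its graph neural networks rather than hand-coding them as done here.

```python
# Illustrative (assumed) graph representation and a one-round
# message-passing policy that decodes a bit-width per compute op.
import numpy as np

BIT_CHOICES = [2, 4, 8]  # assumed per-layer bit-width menu

def gnn_bit_widths(node_feats, adj, w_msg, w_out):
    """Mean-neighbor message passing, then a per-node linear head
    over the bit-width choices (argmax decode)."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
    msgs = (adj @ node_feats) / deg          # aggregate neighbor features
    h = np.tanh(np.concatenate([node_feats, msgs], axis=1) @ w_msg)
    logits = h @ w_out                       # [num_nodes, len(BIT_CHOICES)]
    return [BIT_CHOICES[i] for i in logits.argmax(axis=1)]

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))              # 3 ops, 4 features each
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # data-flow edges
w_msg = rng.normal(size=(8, 16))
w_out = rng.normal(size=(16, len(BIT_CHOICES)))
print(gnn_bit_widths(feats, adj, w_msg, w_out))  # one bit-width per op
```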
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- POMONAG: Pareto-Optimal Many-Objective Neural Architecture Generator [4.09225917049674]
Transferable NAS has emerged, generalizing the search process from dataset-dependent to task-dependent.
This paper introduces POMONAG, extending DiffusionNAG via a many-objective diffusion process.
Results were validated on two search spaces -- NASBench201 and MobileNetV3 -- and evaluated across 15 image classification datasets.
arXiv Detail & Related papers (2024-09-30T16:05:29Z)
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Consolidated learning -- a domain-specific model-free optimization strategy with examples for XGBoost and MIMIC-IV [4.370097023410272]
This paper proposes a new formulation of the tuning problem, called consolidated learning.
In such settings, we are interested in the total optimization time rather than tuning for a single task.
We demonstrate the effectiveness of this approach through an empirical study of the XGBoost algorithm and a collection of predictive tasks extracted from the MIMIC-IV medical database.
arXiv Detail & Related papers (2022-01-27T21:38:53Z)
- Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging the largely separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs [13.446502051609036]
We develop and describe a novel quantization paradigm for deep neural networks (DNNs).
Our method leverages concepts from explainable AI (XAI) and information theory.
The ultimate goal is to preserve the most relevant weights in quantization clusters of highest information content.
arXiv Detail & Related papers (2021-09-09T12:57:06Z)
- Ensembles of Spiking Neural Networks [0.3007949058551534]
This paper demonstrates how to construct ensembles of spiking neural networks producing state-of-the-art results.
We achieve classification accuracies of 98.71%, 100.0%, and 99.09% on the MNIST, NMNIST, and DVS Gesture datasets, respectively.
We formalize spiking neural networks as GLM predictors, identifying a suitable representation for their target domain.
arXiv Detail & Related papers (2020-10-15T17:45:18Z)