SimQ-NAS: Simultaneous Quantization Policy and Neural Architecture
Search
- URL: http://arxiv.org/abs/2312.13301v1
- Date: Tue, 19 Dec 2023 22:08:49 GMT
- Title: SimQ-NAS: Simultaneous Quantization Policy and Neural Architecture
Search
- Authors: Sharath Nittur Sridhar, Maciej Szankin, Fang Chen, Sairam Sundaresan,
Anthony Sarah
- Abstract summary: Recent one-shot Neural Architecture Search algorithms rely on training a hardware-agnostic super-network tailored to a specific task and then extracting efficient sub-networks for different hardware platforms.
We show that by using multi-objective search algorithms paired with lightly trained predictors, we can efficiently search for both the sub-network architecture and the corresponding quantization policy.
- Score: 6.121126813817338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent one-shot Neural Architecture Search algorithms rely on training a
hardware-agnostic super-network tailored to a specific task and then extracting
efficient sub-networks for different hardware platforms. Popular approaches
separate the training of super-networks from the search for sub-networks, often
employing predictors to alleviate the computational overhead associated with
search. Additionally, certain methods also incorporate the quantization policy
within the search space. However, while the quantization policy search for
convolutional neural networks is well studied, the extension of these methods
to transformers and especially foundation models remains under-explored. In
this paper, we demonstrate that by using multi-objective search algorithms
paired with lightly trained predictors, we can efficiently search for both the
sub-network architecture and the corresponding quantization policy and
outperform their respective baselines across different performance objectives
such as accuracy, model size, and latency. Specifically, we demonstrate that
our approach performs well across both uni-modal (ViT and BERT) and multi-modal
(BEiT-3) transformer-based architectures as well as convolutional architectures
(ResNet). For certain networks, we demonstrate an improvement of up to $4.80\times$
and $3.44\times$ for latency and model size respectively, without degradation in
accuracy compared to the fully quantized INT8 baselines.
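The abstract outlines the approach at a high level: pair a multi-objective search algorithm with lightly trained predictors so that a sub-network architecture and its quantization policy are chosen together against accuracy, model-size, and latency objectives. The sketch below is only a minimal illustration of that idea, not the authors' implementation: the candidate encoding (per-layer width multipliers and weight bit-widths), the analytic predictor proxies, and the simple Pareto-front evolutionary loop are all assumptions made for the example.

```python
# Illustrative sketch of a joint (sub-network, quantization policy) multi-objective
# search driven by cheap predictors. All names and proxies here are hypothetical.
import random

BIT_CHOICES = [4, 8]               # assumed per-layer weight bit-widths
WIDTH_CHOICES = [0.5, 0.75, 1.0]   # assumed elastic width multipliers
NUM_LAYERS = 12

def sample_candidate():
    """A candidate couples an architecture choice with a quantization policy."""
    return {
        "widths": [random.choice(WIDTH_CHOICES) for _ in range(NUM_LAYERS)],
        "bits": [random.choice(BIT_CHOICES) for _ in range(NUM_LAYERS)],
    }

def predict_objectives(cand):
    """Stand-in for lightly trained predictors of (accuracy, latency, size).

    Real predictors would be fit on a small set of measured sub-networks; these
    crude analytic proxies only keep the sketch self-contained and runnable.
    """
    acc = sum(w * (b / 8) ** 0.5 for w, b in zip(cand["widths"], cand["bits"]))
    acc /= NUM_LAYERS                                    # maximize
    latency = sum(cand["widths"]) / NUM_LAYERS           # minimize
    size = sum(w * b for w, b in zip(cand["widths"], cand["bits"])) / NUM_LAYERS  # minimize
    return acc, latency, size

def dominates(a, b):
    """True if objective tuple a is no worse than b everywhere and better somewhere."""
    acc_a, lat_a, size_a = a
    acc_b, lat_b, size_b = b
    no_worse = acc_a >= acc_b and lat_a <= lat_b and size_a <= size_b
    better = acc_a > acc_b or lat_a < lat_b or size_a < size_b
    return no_worse and better

def pareto_front(candidates):
    """Keep candidates whose predicted objectives are not dominated by any other."""
    scored = [(c, predict_objectives(c)) for c in candidates]
    return [c for c, obj in scored
            if not any(dominates(other, obj) for _, other in scored)]

def mutate(cand):
    """Perturb one layer's width or bit-width to produce a new candidate."""
    child = {k: list(v) for k, v in cand.items()}
    layer = random.randrange(NUM_LAYERS)
    if random.random() < 0.5:
        child["widths"][layer] = random.choice(WIDTH_CHOICES)
    else:
        child["bits"][layer] = random.choice(BIT_CHOICES)
    return child

def search(generations=20, population=32):
    """Simple evolutionary loop: keep the predicted Pareto front, refill by mutation."""
    pop = [sample_candidate() for _ in range(population)]
    for _ in range(generations):
        front = pareto_front(pop)
        pop = front + [mutate(random.choice(front))
                       for _ in range(population - len(front))]
    return pareto_front(pop)

if __name__ == "__main__":
    for cand in search()[:3]:
        print(predict_objectives(cand))
```

In the setting the paper describes, the predictor stubs would be replaced by models fitted on a handful of sub-networks measured on the target hardware, and the naive front-plus-mutation loop by a stronger multi-objective optimizer (for example, NSGA-II-style selection).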
Related papers
- OFA$^2$: A Multi-Objective Perspective for the Once-for-All Neural Architecture Search [79.36688444492405]
Once-for-All (OFA) is a Neural Architecture Search (NAS) framework designed to address the problem of searching efficient architectures for devices with different resource constraints.
We aim to go one step further in the search for efficiency by explicitly conceiving the search stage as a multi-objective optimization problem.
arXiv Detail & Related papers (2023-03-23T21:30:29Z)
- Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies [14.574399133024594]
We present a new MTL framework that searches for optimized structures for multiple tasks with diverse graph topologies.
We design a restricted DAG-based central network with read-in/read-out layers to build topologically diverse task-adaptive structures.
arXiv Detail & Related papers (2023-03-13T05:01:50Z)
- Tricks and Plugins to GBM on Images and Sequences [18.939336393665553]
We propose BoostCNN, a new algorithm for boosting Deep Convolutional Neural Networks that combines the merits of dynamic feature selection and boosting.
We also propose a set of algorithms to incorporate boosting weights into a deep learning architecture based on a least squares objective function.
Experiments show that the proposed methods outperform benchmarks on several fine-grained classification tasks.
arXiv Detail & Related papers (2022-03-01T21:59:00Z)
- Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search [112.05977301976613]
We propose to combine Network Architecture Search methods with quantization to enjoy the merits of both.
We first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models.
Then a bit-inheritance scheme is introduced to transfer the quantized models to the lower bit, which further reduces the time cost and improves the quantization accuracy.
arXiv Detail & Related papers (2020-10-09T03:52:16Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- A Progressive Sub-Network Searching Framework for Dynamic Inference [33.93841415140311]
We propose a progressive sub-network searching framework embedded with several effective techniques, including trainable noise ranking, channel-group and fine-tuning threshold setting, and sub-network re-selection.
Our proposed method achieves much better dynamic inference accuracy than the popular prior Universally-Slimmable-Network, by up to 4.4% and by 2.3% on average, on the ImageNet dataset at the same model size.
arXiv Detail & Related papers (2020-09-11T22:56:02Z)
- DC-NAS: Divide-and-Conquer Neural Architecture Search [108.57785531758076]
We present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures.
We achieve a $75.1\%$ top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space.
arXiv Detail & Related papers (2020-05-29T09:02:16Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning [71.90902837008278]
We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL).
In order to adapt to different task combinations, we disentangle the GP-MTL networks into single-task backbones.
We also propose a novel single-shot gradient-based search algorithm that closes the performance gap between the searched architectures and the final evaluation architecture.
arXiv Detail & Related papers (2020-03-31T09:49:14Z)