AlphaNet: Improved Training of Supernet with Alpha-Divergence
- URL: http://arxiv.org/abs/2102.07954v1
- Date: Tue, 16 Feb 2021 04:23:55 GMT
- Title: AlphaNet: Improved Training of Supernet with Alpha-Divergence
- Authors: Dilin Wang, Chengyue Gong, Meng Li, Qiang Liu, Vikas Chandra
- Abstract summary: We propose to improve the supernet training with a more generalized alpha-divergence.
We apply the proposed alpha-divergence based supernet training to both slimmable neural networks and weight-sharing NAS.
Specifically, our discovered model family, AlphaNet, outperforms prior-art models on a wide range of FLOPs regimes.
- Score: 28.171262066145616
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Weight-sharing neural architecture search (NAS) is an effective technique for
automating efficient neural architecture design. Weight-sharing NAS builds a
supernet that assembles all the architectures as its sub-networks and jointly
trains the supernet with the sub-networks. The success of weight-sharing NAS
heavily relies on distilling the knowledge of the supernet to the sub-networks.
However, we find that the widely used distillation divergence, i.e., KL
divergence, may lead to student sub-networks that over-estimate or
under-estimate the uncertainty of the teacher supernet, leading to inferior
performance of the sub-networks. In this work, we propose to improve the
supernet training with a more generalized alpha-divergence. By adaptively
selecting the alpha-divergence, we simultaneously prevent the over-estimation
or under-estimation of the uncertainty of the teacher model. We apply the
proposed alpha-divergence based supernet training to both slimmable neural
networks and weight-sharing NAS, and demonstrate significant improvements.
Specifically, our discovered model family, AlphaNet, outperforms prior-art
models on a wide range of FLOPs regimes, including BigNAS, Once-for-All
networks, FBNetV3, and AttentiveNAS. We achieve ImageNet top-1 accuracy of
80.0% with only 444 MFLOPs.
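As a rough illustration of the idea in the abstract, below is a minimal PyTorch sketch (not the authors' released code) of an alpha-divergence knowledge-distillation loss between teacher (supernet) and student (sub-network) logits. The specific alpha values, the per-sample max rule used for "adaptive selection", and all function names are assumptions made for illustration; the paper's exact formulation (e.g., clipping of the importance weights) differs in detail.

```python
# Minimal sketch of alpha-divergence knowledge distillation for supernet training.
# Assumptions: alpha values (-1.0, 0.5) and the per-sample max rule are illustrative,
# not the paper's exact recipe (which additionally clips importance weights).
import torch
import torch.nn.functional as F


def alpha_divergence(teacher_logits, student_logits, alpha):
    """Per-sample D_alpha(p || q) = (sum_i p_i^alpha * q_i^(1-alpha) - 1) / (alpha * (alpha - 1)).

    Valid for alpha not in {0, 1}; as alpha -> 1 it recovers KL(p || q).
    """
    log_p = F.log_softmax(teacher_logits, dim=-1)   # teacher (supernet) log-probs
    log_q = F.log_softmax(student_logits, dim=-1)   # student (sub-network) log-probs
    # p_i^alpha * q_i^(1-alpha) = exp(alpha * log p_i + (1 - alpha) * log q_i)
    mixed = torch.exp(alpha * log_p + (1.0 - alpha) * log_q).sum(dim=-1)
    return (mixed - 1.0) / (alpha * (alpha - 1.0))


def adaptive_alpha_kd_loss(teacher_logits, student_logits, alphas=(-1.0, 0.5)):
    """Evaluate the divergence at a negative and a positive alpha and keep the
    larger value per sample, penalizing both over- and under-estimation of the
    teacher's uncertainty."""
    losses = torch.stack(
        [alpha_divergence(teacher_logits, student_logits, a) for a in alphas], dim=0
    )
    return losses.max(dim=0).values.mean()


# Usage: distill a sampled sub-network (student) from the supernet (teacher).
if __name__ == "__main__":
    teacher_logits = torch.randn(8, 1000)                      # e.g. largest sub-network
    student_logits = torch.randn(8, 1000, requires_grad=True)  # sampled sub-network
    loss = adaptive_alpha_kd_loss(teacher_logits.detach(), student_logits)
    loss.backward()
    print(float(loss))
```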
Related papers
- SalNAS: Efficient Saliency-prediction Neural Architecture Search with self-knowledge distillation [7.625269122161064]
Recent advancements in deep convolutional neural networks have significantly improved the performance of saliency prediction.
We propose a new Neural Architecture Search framework for saliency prediction with two contributions.
By utilizing Self-KD, SalNAS outperforms other state-of-the-art saliency prediction models in most evaluation rubrics.
arXiv Detail & Related papers (2024-07-29T14:48:34Z)
- DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z) - Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts [55.470959564665705]
Weight-sharing supernets are crucial for performance estimation in cutting-edge neural search frameworks.
The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models.
It excels in NAS for building memory-efficient task-agnostic BERT models.
arXiv Detail & Related papers (2023-06-08T00:35:36Z)
- Prior-Guided One-shot Neural Architecture Search [11.609732776776982]
We present Prior-Guided One-shot NAS (PGONAS) to strengthen the ranking correlation of supernets.
Our PGONAS ranks 3rd place in the supernet track of the CVPR2022 Second lightweight NAS challenge.
arXiv Detail & Related papers (2022-06-27T14:19:56Z)
- Evolutionary Neural Cascade Search across Supernetworks [68.8204255655161]
We introduce ENCAS - Evolutionary Neural Cascade Search.
ENCAS can be used to search over multiple pretrained supernetworks.
We test ENCAS on common computer vision benchmarks.
arXiv Detail & Related papers (2022-03-08T11:06:01Z)
- Enabling NAS with Automated Super-Network Generation [60.72821429802335]
Recent Neural Architecture Search (NAS) solutions have produced impressive results training super-networks and then deriving sub-networks.
We present BootstrapNAS, a software framework for automatic generation of super-networks for NAS.
arXiv Detail & Related papers (2021-12-20T21:45:48Z)
- An Analysis of Super-Net Heuristics in Weight-Sharing NAS [70.57382341642418]
We show that simple random search achieves competitive performance to complex state-of-the-art NAS algorithms when the super-net is properly trained.
arXiv Detail & Related papers (2021-10-04T02:18:44Z)
- Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight [66.8543732597723]
Recent works in neural architecture search (NAS) can aid transfer learning by establishing a sufficient network search space.
We propose a novel framework consisting of two modules: the neural architecture search module for architecture transfer and the neural weight search module for weight transfer.
These two modules conduct search on the target task based on reduced super-networks, so we only need to train once on the source task.
arXiv Detail & Related papers (2021-05-19T08:58:04Z)
- How Does Supernet Help in Neural Architecture Search? [3.8348281160758027]
We conduct a comprehensive analysis on five search spaces, including NAS-Bench-101, NAS-Bench-201, DARTS-CIFAR10, DARTS-PTB, and ProxylessNAS.
We find that weight sharing works well on some search spaces but fails on others.
Our work is expected to inspire future NAS researchers to better leverage the power of weight sharing.
arXiv Detail & Related papers (2020-10-16T08:07:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.