Generalizing Few-Shot NAS with Gradient Matching
- URL: http://arxiv.org/abs/2203.15207v1
- Date: Tue, 29 Mar 2022 03:06:16 GMT
- Title: Generalizing Few-Shot NAS with Gradient Matching
- Authors: Shoukang Hu, Ruochen Wang, Lanqing Hong, Zhenguo Li, Cho-Jui Hsieh,
Jiashi Feng
- Abstract summary: One-Shot methods train one supernet to approximate the performance of every architecture in the search space via weight-sharing.
Few-Shot NAS reduces the level of weight-sharing by splitting the One-Shot supernet into multiple separated sub-supernets.
The proposed gradient matching criterion significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures.
- Score: 165.5690495295074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient performance estimation of architectures drawn from large search
spaces is essential to Neural Architecture Search. One-Shot methods tackle this
challenge by training one supernet to approximate the performance of every
architecture in the search space via weight-sharing, thereby drastically
reducing the search cost. However, due to coupled optimization between child
architectures caused by weight-sharing, One-Shot supernet's performance
estimation could be inaccurate, leading to degraded search outcomes. To address
this issue, Few-Shot NAS reduces the level of weight-sharing by splitting the
One-Shot supernet into multiple separated sub-supernets via edge-wise
(layer-wise) exhaustive partitioning. Since not all partitions of the supernet are
equally important, this calls for a more effective splitting
criterion. In this work, we propose a gradient matching score (GM) that
leverages gradient information at the shared weight for making informed
splitting decisions. Intuitively, gradients from different child models can be
used to identify whether they agree on how to update the shared modules, and
subsequently to decide if they should share the same weight. Compared with
exhaustive partitioning, the proposed criterion significantly reduces the
branching factor per edge. This allows us to split more edges (layers) for a
given budget, resulting in substantially improved performance as NAS search
spaces usually include dozens of edges (layers). Extensive empirical
evaluations of the proposed method on a wide range of search spaces
(NASBench-201, DARTS, MobileNet Space), datasets (CIFAR-10, CIFAR-100, ImageNet)
and search algorithms (DARTS, SNAS, RSPS, ProxylessNAS, OFA) demonstrate that
it significantly outperforms its Few-Shot counterparts while surpassing
previous comparable methods in terms of the accuracy of derived architectures.
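The splitting criterion described above can be made concrete with a small, self-contained sketch: compute the gradient that each candidate child model induces on a shared weight and compare those gradients pairwise, e.g., via cosine similarity; children whose gradients agree keep sharing the weight, while disagreeing ones are split into separate sub-supernets. The toy modules, candidate operations, and zero-similarity threshold below are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the gradient-matching idea: compare the gradients that
# different child architectures induce on a *shared* weight and use their
# agreement to decide whether those children should keep sharing it.
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Shared weight: one module reused by every child architecture (weight-sharing).
shared = nn.Linear(16, 16)

# Candidate operations on one edge of a toy search space; each child
# architecture picks exactly one of them on top of the shared module.
candidate_ops = nn.ModuleList([nn.Identity(), nn.ReLU(), nn.Tanh()])
head = nn.Linear(16, 10)

def child_forward(x, op):
    """Forward pass of the child architecture that uses `op` on this edge."""
    return head(op(shared(x)))

def shared_grad(op, x, y):
    """Gradient of the loss w.r.t. the shared weight for one child model."""
    shared.weight.grad = None
    loss = F.cross_entropy(child_forward(x, op), y)
    loss.backward()
    return shared.weight.grad.detach().flatten().clone()

# A small random batch standing in for the supernet training data.
x = torch.randn(32, 16)
y = torch.randint(0, 10, (32,))

grads = [shared_grad(op, x, y) for op in candidate_ops]

# Pairwise gradient-matching scores: high cosine similarity means the two
# child models agree on how to update the shared weight and can keep sharing
# it; low similarity suggests splitting them into separate sub-supernets.
for (i, gi), (j, gj) in itertools.combinations(enumerate(grads), 2):
    gm = F.cosine_similarity(gi, gj, dim=0).item()
    decision = "share" if gm > 0.0 else "split"  # illustrative threshold
    print(f"ops ({i}, {j}): GM = {gm:+.3f} -> {decision}")
```

Because grouping operations by gradient agreement yields only a few groups per edge, rather than one sub-supernet per operation as in exhaustive partitioning, the branching factor per edge stays small, which is what allows more edges to be split under a given budget.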
Related papers
- The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol [2.4300749758571905]
  Gradient-based methods suffer from discretization error, which can severely damage the process of obtaining the final architecture.
  We introduce a novel single-stage searching protocol that does not rely on decoding a continuous architecture.
  Our results demonstrate that this approach outperforms other DNAS methods, achieving 75.3% in the searching stage on the Cityscapes validation dataset.
  arXiv Detail & Related papers (2024-05-26T15:44:53Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
  Differentiable ARchiTecture Search (DARTS) has recently become the mainstream approach to neural architecture search (NAS).
  We tackle the hypergradient computation in DARTS based on the implicit function theorem.
  We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
  arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search [70.57382341642418]
  Weight sharing has become a de facto standard in neural architecture search because it enables the search to be done on commodity hardware.
  Recent works have empirically shown a ranking disorder between the performance of stand-alone architectures and that of the corresponding shared-weight networks.
  We propose a regularization term that aims to maximize the correlation between the performance rankings of the shared-weight network and those of the standalone architectures.
  arXiv Detail & Related papers (2021-04-12T09:32:33Z)
- ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding [86.40042104698792]
  We formulate neural architecture search as a sparse coding problem.
  In experiments, our two-stage method on CIFAR-10 requires only 0.05 GPU-days for search.
  Our one-stage method produces state-of-the-art performance on both CIFAR-10 and ImageNet at the cost of only evaluation time.
  arXiv Detail & Related papers (2020-10-13T04:34:24Z)
- MS-NAS: Multi-Scale Neural Architecture Search for Medical Image Segmentation [16.206524842952636]
  This paper presents a Multi-Scale NAS framework featuring a multi-scale search space spanning the network backbone down to cell operations.
  On various segmentation datasets, MS-NAS outperforms state-of-the-art methods, achieving 0.6-5.4% mIoU and 0.4-3.5% DSC improvements.
  arXiv Detail & Related papers (2020-07-13T02:02:00Z)
- DC-NAS: Divide-and-Conquer Neural Architecture Search [108.57785531758076]
  We present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures.
  We achieve a 75.1% top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space.
  arXiv Detail & Related papers (2020-05-29T09:02:16Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
  We train a graph convolutional network to fit the performance of sampled sub-networks.
  With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
  arXiv Detail & Related papers (2020-04-17T19:12:39Z)