ZARTS: On Zero-order Optimization for Neural Architecture Search
- URL: http://arxiv.org/abs/2110.04743v1
- Date: Sun, 10 Oct 2021 09:35:15 GMT
- Title: ZARTS: On Zero-order Optimization for Neural Architecture Search
- Authors: Xiaoxing Wang, Wenxuan Guo, Junchi Yan, Jianlin Su, Xiaokang Yang
- Abstract summary: Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the gradient approximation used by DARTS.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS in settings where the performance of DARTS collapses due to its known instability issue.
- Score: 94.41017048659664
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Differentiable architecture search (DARTS) has been a popular one-shot
paradigm for NAS due to its high efficiency. It introduces trainable architecture
parameters to represent the importance of candidate operations and proposes a
first/second-order approximation to estimate their gradients, making it possible to
solve NAS by gradient descent. However, our in-depth empirical results show that the
approximation often distorts the loss landscape, leading to a biased objective and, in
turn, inaccurate gradient estimates for the architecture parameters. This work turns to
zero-order optimization and proposes a novel NAS scheme, called ZARTS, that searches
without enforcing this approximation. Specifically, three representative zero-order
optimization methods are introduced: RS, MGS, and GLD, among which MGS performs best by
balancing accuracy and speed. Moreover, we explore the connections between RS/MGS and
the gradient descent algorithm and show that ZARTS can be seen as a robust gradient-free
counterpart to DARTS. Extensive experiments on multiple datasets and search spaces show
the remarkable performance of our method. In particular, results on 12 benchmarks verify
the outstanding robustness of ZARTS in settings where the performance of DARTS collapses
due to its known instability issue. We also search on the DARTS search space to compare
with peer methods; our discovered architecture achieves 97.54% accuracy on CIFAR-10 and
75.7% top-1 accuracy on ImageNet, which is state-of-the-art performance.
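To make the zero-order idea concrete, the following is a minimal sketch of a random-search-style update of the architecture parameters, in the spirit of the RS variant mentioned above. It is not the authors' implementation: the toy val_loss, the perturbation scale sigma, the sample count n_samples, and the parameter shape are illustrative assumptions; in ZARTS the loss would be the supernet's validation loss evaluated after adapting its weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def val_loss(alpha):
    """Placeholder for the validation loss of the supernet at architecture
    parameters alpha (a toy quadratic so the sketch is runnable)."""
    target = np.linspace(-1.0, 1.0, alpha.size)
    return float(np.sum((alpha - target) ** 2))

def zero_order_step(alpha, sigma=0.1, n_samples=8):
    """One random-search-style zero-order update: sample perturbations of the
    architecture parameters, evaluate the black-box validation loss, and move
    to the best candidate if it improves on the current point."""
    best_alpha, best_loss = alpha, val_loss(alpha)
    for _ in range(n_samples):
        candidate = alpha + sigma * rng.standard_normal(alpha.shape)
        loss = val_loss(candidate)
        if loss < best_loss:
            best_alpha, best_loss = candidate, loss
    return best_alpha, best_loss

# Illustrative parameter shape, e.g. 14 edges x 8 candidate operations in a DARTS cell.
alpha = np.zeros(14 * 8)
for step in range(50):
    alpha, loss = zero_order_step(alpha)
print(f"final validation loss: {loss:.4f}")
```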
Related papers
- Operation-level Progressive Differentiable Architecture Search [19.214462477848535]
We propose operation-level progressive differentiable neural architecture search (OPP-DARTS) to avoid the aggregation of skip connections.
On CIFAR-10, the architecture found by our method is superior to the architecture found by standard DARTS.
arXiv Detail & Related papers (2023-02-11T09:18:01Z) - Shapley-NAS: Discovering Operation Contribution for Neural Architecture
Search [96.20505710087392]
We propose a Shapley-value-based method (Shapley-NAS) to evaluate operation contribution for neural architecture search.
We show that our method outperforms state-of-the-art methods by a considerable margin with a light search cost.
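For context on the Shapley-value idea, the sketch below is a generic Monte Carlo (permutation-sampling) Shapley estimator of per-operation contributions; it is the textbook estimator rather than the Shapley-NAS algorithm, and coalition_value, the toy operation names, and the additive utility are illustrative assumptions.

```python
import random

def shapley_estimate(operations, coalition_value, n_permutations=200, seed=0):
    """Generic Monte Carlo (permutation-sampling) Shapley value estimator.
    coalition_value(subset) returns the utility (e.g. supernet validation
    accuracy) when only the operations in subset are enabled."""
    rng = random.Random(seed)
    contrib = {op: 0.0 for op in operations}
    for _ in range(n_permutations):
        order = list(operations)
        rng.shuffle(order)
        enabled, prev_value = set(), coalition_value(set())
        for op in order:
            enabled.add(op)
            value = coalition_value(enabled)
            contrib[op] += value - prev_value  # marginal contribution of op
            prev_value = value
    return {op: total / n_permutations for op, total in contrib.items()}

# Toy usage: the utility is just the sum of fixed per-operation scores.
scores = {"skip": 0.1, "sep_conv_3x3": 0.6, "max_pool": 0.2}
print(shapley_estimate(scores, lambda s: sum(scores[o] for o in s)))
```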
arXiv Detail & Related papers (2022-06-20T14:41:49Z) - DAAS: Differentiable Architecture and Augmentation Policy Search [107.53318939844422]
This work considers the possible coupling between neural architectures and data augmentation and proposes an effective algorithm that jointly searches for both.
Our approach achieves 97.91% accuracy on CIFAR-10 and 76.6% top-1 accuracy on the ImageNet dataset, showing the outstanding performance of our search algorithm.
arXiv Detail & Related papers (2021-09-30T17:15:17Z) - iDARTS: Differentiable Architecture Search with Stochastic Implicit
Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z) - Single-level Optimization For Differential Architecture Search [6.3531384587183135]
Differential architecture search (DARTS) makes the gradient of the architecture parameters biased by the network weights.
We propose to replace bi-level optimization with single-level optimization and to replace softmax with a non-competitive activation function such as sigmoid.
Experiments on NAS-Bench-201 validate our hypothesis and stably find a nearly optimal architecture.
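The following sketch illustrates the contrast between competitive softmax mixing (as in DARTS, where the weights sum to one) and non-competitive sigmoid gating of candidate operations; the operation count, random features, and tensor shapes are illustrative assumptions rather than the paper's code.

```python
import torch

torch.manual_seed(0)
n_ops = 8                            # candidate operations on one edge (illustrative)
alpha = torch.randn(n_ops)           # architecture parameters for that edge
op_outputs = torch.randn(n_ops, 16)  # stand-in for each operation's output features

# DARTS-style competitive mixing: softmax weights sum to 1, so raising one
# operation's weight necessarily lowers the others.
softmax_mix = (torch.softmax(alpha, dim=0).unsqueeze(1) * op_outputs).sum(dim=0)

# Non-competitive gating: each operation gets an independent sigmoid gate in
# (0, 1), so operations do not compete for a shared probability mass.
sigmoid_mix = (torch.sigmoid(alpha).unsqueeze(1) * op_outputs).sum(dim=0)

print(torch.softmax(alpha, dim=0).sum())  # 1.0 by construction
print(torch.sigmoid(alpha).sum())         # unconstrained
```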
arXiv Detail & Related papers (2020-12-15T18:40:33Z) - ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse
Coding [86.40042104698792]
We formulate neural architecture search as a sparse coding problem.
In experiments, our two-stage method requires only 0.05 GPU-days to search on CIFAR-10.
Our one-stage method produces state-of-the-art performance on both CIFAR-10 and ImageNet at the cost of only the evaluation time.
arXiv Detail & Related papers (2020-10-13T04:34:24Z) - DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
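The following is a minimal sketch of the pathwise (reparameterized) sampling idea, assuming PyTorch's torch.distributions.Dirichlet and a toy validation loss; it only shows that gradients can flow through a Dirichlet sample into its concentration parameters and is not the DrNAS implementation.

```python
import torch

torch.manual_seed(0)
n_ops = 8  # candidate operations on one edge (illustrative)
# Unconstrained parameters; softplus keeps the Dirichlet concentration positive.
beta = torch.zeros(n_ops, requires_grad=True)
optimizer = torch.optim.Adam([beta], lr=0.05)

def val_loss(mix_weights):
    """Stand-in for the supernet validation loss as a function of the sampled
    operation mixing weights (a toy quadratic so the sketch is runnable)."""
    target = torch.full((n_ops,), 1.0 / n_ops)
    return ((mix_weights - target) ** 2).sum()

for step in range(100):
    concentration = torch.nn.functional.softplus(beta) + 1e-4
    dist = torch.distributions.Dirichlet(concentration)
    weights = dist.rsample()   # pathwise (reparameterized) sample
    loss = val_loss(weights)
    optimizer.zero_grad()
    loss.backward()            # gradient flows through the sample into beta
    optimizer.step()

print(torch.nn.functional.softplus(beta).detach())
```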
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.