iDARTS: Differentiable Architecture Search with Stochastic Implicit
Gradients
- URL: http://arxiv.org/abs/2106.10784v1
- Date: Mon, 21 Jun 2021 00:44:11 GMT
- Title: iDARTS: Differentiable Architecture Search with Stochastic Implicit
Gradients
- Authors: Miao Zhang, Steven Su, Shirui Pan, Xiaojun Chang, Ehsan Abbasnejad,
Reza Haffari
- Abstract summary: Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
- Score: 75.41173109807735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: \textit{Differentiable ARchiTecture Search} (DARTS) has recently become the
mainstream of neural architecture search (NAS) due to its efficiency and
simplicity. With a gradient-based bi-level optimization, DARTS alternately
optimizes the inner model weights and the outer architecture parameter in a
weight-sharing supernet. A key challenge to the scalability and quality of the
learned architectures is the need for differentiating through the inner-loop
optimisation. While much has been discussed about several potentially fatal
factors in DARTS, the architecture gradient, a.k.a. hypergradient, has received
less attention. In this paper, we tackle the hypergradient computation in DARTS
based on the implicit function theorem, making it depend only on the obtained
solution to the inner-loop optimization and agnostic to the optimization path.
To further reduce the computational requirements, we formulate a stochastic
hypergradient approximation for differentiable NAS, and theoretically show that
the architecture optimization with the proposed method, named iDARTS, is
expected to converge to a stationary point. Comprehensive experiments on two
NAS benchmark search spaces and the common NAS search space verify the
effectiveness of our proposed method. It leads to architectures that outperform
those learned by the baseline methods by large margins.
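For readers who want the hypergradient made concrete, below is a minimal PyTorch sketch of an implicit-function-theorem hypergradient in which the inverse-Hessian-vector product is approximated by a truncated Neumann series. This is an illustrative reading of the idea, not the authors' released code; the names (`ift_hypergradient`, `w`, `alpha`, `K`, `lr`) and the toy problem are assumptions.

```python
# A minimal sketch (assumed names, not the authors' released code): the
# implicit-function-theorem hypergradient, with the inverse-Hessian-vector
# product approximated by a truncated Neumann series.
import torch


def ift_hypergradient(train_loss, val_loss, w, alpha, K=5, lr=0.5):
    """Approximate dL_val/dalpha = direct term
       - dL_val/dw . (d^2 L_train/dw^2)^{-1} . d^2 L_train/(dw dalpha).
    `lr` should be below 1/L, where L is the largest eigenvalue of the inner Hessian."""
    v = torch.autograd.grad(val_loss, w, retain_graph=True)       # dL_val/dw
    g_w = torch.autograd.grad(train_loss, w, create_graph=True)   # dL_train/dw

    # Neumann series: H^{-1} v ~= lr * sum_{k=0..K} (I - lr*H)^k v
    p = [vi.clone() for vi in v]
    acc = [vi.clone() for vi in v]
    for _ in range(K):
        hvp = torch.autograd.grad(g_w, w, grad_outputs=p, retain_graph=True)
        p = [pi - lr * hi for pi, hi in zip(p, hvp)]
        acc = [ai + pi for ai, pi in zip(acc, p)]

    # Mixed second-derivative term applied to the approximate H^{-1} v
    mixed = torch.autograd.grad(g_w, alpha, grad_outputs=acc)
    direct = torch.autograd.grad(val_loss, alpha, allow_unused=True)
    return [(0.0 if d is None else d) - lr * m for d, m in zip(direct, mixed)]


# Toy bi-level problem (purely illustrative): the inner optimum is w = alpha.
w = [torch.tensor(0.5, requires_grad=True)]       # assume w sits at the inner solution
alpha = [torch.tensor(0.5, requires_grad=True)]
train_loss = (w[0] - alpha[0]) ** 2
val_loss = (w[0] - 2.0) ** 2 + 0.1 * alpha[0] ** 2
print(ift_hypergradient(train_loss, val_loss, w, alpha))   # ~ -2.9 here
```

The key property matches the abstract: the estimate only uses gradients evaluated at the current inner solution, not the optimization path that produced it.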
Related papers
- $\Lambda$-DARTS: Mitigating Performance Collapse by Harmonizing
Operation Selection among Cells [11.777101481512423]
Differentiable neural architecture search (DARTS) is a popular method for neural architecture search (NAS).
We show that DARTS suffers from a specific structural flaw due to its weight-sharing framework that limits the convergence of DARTS to saturation points of the softmax function.
We propose two new regularization terms that aim to prevent performance collapse by harmonizing operation selection via aligning gradients of layers.
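As an illustration only (not the paper's exact regularizers), one generic way to "align gradients of layers" is to penalize low cosine similarity between per-layer gradients taken with respect to the shared architecture parameters; `layer_losses` and `alpha` below are assumed names.

```python
# Illustrative sketch only (not the paper's exact terms): penalize misalignment
# between per-layer gradients of the loss w.r.t. the shared architecture parameters.
import torch
import torch.nn.functional as F


def gradient_alignment_penalty(layer_losses, alpha):
    """Negative mean pairwise cosine similarity of per-layer gradients w.r.t. alpha."""
    grads = [torch.autograd.grad(l, alpha, retain_graph=True, create_graph=True)[0].flatten()
             for l in layer_losses]
    penalty, pairs = 0.0, 0
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            penalty = penalty - F.cosine_similarity(grads[i], grads[j], dim=0)
            pairs += 1
    return penalty / max(pairs, 1)
```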
arXiv Detail & Related papers (2022-10-14T17:54:01Z) - Shapley-NAS: Discovering Operation Contribution for Neural Architecture
Search [96.20505710087392]
We propose a Shapley value based method to evaluate operation contribution (Shapley-NAS) for neural architecture search.
We show that our method outperforms the state-of-the-art methods by a considerable margin with light search cost.
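A hedged sketch of the general ingredient, not the paper's implementation: Shapley values of operations can be estimated by Monte Carlo sampling of permutations and averaging marginal contributions. `evaluate` below is an assumed callback standing in for supernet validation accuracy.

```python
# Illustrative sketch: Monte Carlo permutation estimate of per-operation Shapley values.
import random


def shapley_values(operations, evaluate, num_samples=100):
    contrib = {op: 0.0 for op in operations}
    for _ in range(num_samples):
        order = random.sample(operations, len(operations))  # random permutation
        enabled, prev_score = [], evaluate([])
        for op in order:
            enabled.append(op)
            score = evaluate(enabled)
            contrib[op] += score - prev_score   # marginal contribution of op
            prev_score = score
    return {op: v / num_samples for op, v in contrib.items()}


# Toy usage with a synthetic scorer (purely illustrative):
ops = ["sep_conv_3x3", "skip_connect", "max_pool_3x3"]
weights = {"sep_conv_3x3": 0.6, "skip_connect": 0.3, "max_pool_3x3": 0.1}
print(shapley_values(ops, lambda s: sum(weights[o] for o in s)))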
arXiv Detail & Related papers (2022-06-20T14:41:49Z) - ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the gradient approximation that DARTS relies on.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
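For intuition about the zero-order route (an illustrative sketch, not ZARTS itself), the architecture gradient can be estimated from function evaluations alone, e.g. with a two-point Gaussian-smoothing estimator; `val_objective`, `sigma`, and `num_samples` are assumed names.

```python
# Illustrative sketch: two-point Gaussian-smoothing zero-order gradient estimate.
import torch


def zero_order_grad(val_objective, alpha, sigma=0.01, num_samples=8):
    """Estimate d val_objective / d alpha without back-propagation."""
    grad = torch.zeros_like(alpha)
    for _ in range(num_samples):
        u = torch.randn_like(alpha)
        f_plus = val_objective(alpha + sigma * u)
        f_minus = val_objective(alpha - sigma * u)
        grad += (f_plus - f_minus) / (2.0 * sigma) * u
    return grad / num_samples


# Toy usage on a quadratic objective (purely illustrative):
alpha = torch.zeros(4)
print(zero_order_grad(lambda a: ((a - 1.0) ** 2).sum(), alpha))
```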
arXiv Detail & Related papers (2021-10-10T09:35:15Z) - Rethinking Architecture Selection in Differentiable NAS [74.61723678821049]
Differentiable Neural Architecture Search is one of the most popular NAS methods for its search efficiency and simplicity.
We propose an alternative perturbation-based architecture selection that directly measures each operation's influence on the supernet.
We find that several failure modes of DARTS can be greatly alleviated with the proposed selection method.
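A minimal sketch of the perturbation-based idea under assumed interfaces: on each edge, the selected operation is the one whose removal from the supernet degrades validation accuracy the most, rather than the one with the largest softmax weight. `eval_supernet` is a hypothetical callback.

```python
# Illustrative sketch: perturbation-based operation selection on one edge.
def select_operation(edge_ops, eval_supernet):
    baseline = eval_supernet(masked_ops=[])
    drops = {op: baseline - eval_supernet(masked_ops=[op]) for op in edge_ops}
    return max(drops, key=drops.get)   # the most influential operation


# Toy usage with a synthetic evaluator (purely illustrative):
acc_without = {"skip_connect": 0.88, "sep_conv_3x3": 0.74, "max_pool_3x3": 0.90}
ops = list(acc_without)
print(select_operation(ops, lambda masked_ops: 0.91 if not masked_ops
                       else min(acc_without[o] for o in masked_ops)))
```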
arXiv Detail & Related papers (2021-08-10T00:53:39Z) - ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse
Coding [86.40042104698792]
We formulate neural architecture search as a sparse coding problem.
In experiments, our two-stage method on CIFAR-10 requires only 0.05 GPU-day for search.
Our one-stage method produces state-of-the-art performances on both CIFAR-10 and ImageNet at the cost of only evaluation time.
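To make the sparse-coding ingredient concrete (a generic ISTA sketch, not the paper's full two-stage method): soft-thresholded gradient steps recover a sparse code z with x approximately equal to Dz, the kind of sparsity the formulation relies on.

```python
# Illustrative sketch: iterative shrinkage-thresholding (ISTA) for sparse coding.
import numpy as np


def ista(D, x, lam=0.1, num_iters=200):
    """Minimize 0.5*||x - D z||^2 + lam*||z||_1."""
    L = np.linalg.norm(D, ord=2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(num_iters):
        grad = D.T @ (D @ z - x)
        u = z - grad / L
        z = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)   # soft threshold
    return z


# Toy usage (purely illustrative): recover a 2-sparse code.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 8))
z_true = np.zeros(8)
z_true[[1, 5]] = [1.0, -0.5]
print(np.round(ista(D, D @ z_true, lam=0.05), 2))
```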
arXiv Detail & Related papers (2020-10-13T04:34:24Z) - Smooth Variational Graph Embeddings for Efficient Neural Architecture
Search [41.62970837629573]
We propose a two-sided variational graph autoencoder, which allows us to smoothly encode and accurately reconstruct neural architectures from various search spaces.
We evaluate the proposed approach on neural architectures defined by the ENAS approach, the NAS-Bench-101 and the NAS-Bench-201 search spaces.
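A heavily simplified sketch for intuition, not the paper's two-sided design: a variational graph autoencoder maps an architecture's graph (adjacency plus node operations) to a smooth latent code and reconstructs the adjacency with an inner-product decoder; all shapes and names below are assumptions.

```python
# Illustrative sketch: a tiny variational graph autoencoder over architecture graphs.
import torch
import torch.nn as nn


class TinyVGAE(nn.Module):
    def __init__(self, num_ops, hidden=32, latent=16):
        super().__init__()
        self.embed = nn.Linear(num_ops, hidden)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)

    def forward(self, adj, ops_onehot):
        h = torch.relu(adj @ self.embed(ops_onehot))            # one message-passing step
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization
        recon_adj = torch.sigmoid(z @ z.t())                    # inner-product decoder
        return recon_adj, mu, logvar


# Toy usage (purely illustrative): 5-node cell, 4 candidate operations.
adj = torch.triu(torch.ones(5, 5), diagonal=1)
ops = torch.eye(4)[torch.randint(0, 4, (5,))]
recon, mu, logvar = TinyVGAE(num_ops=4)(adj, ops)
print(recon.shape)
```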
arXiv Detail & Related papers (2020-10-09T17:05:41Z) - DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
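A small sketch of the sampling mechanism under assumed shapes (not the paper's code): PyTorch's Dirichlet distribution supports `rsample()`, i.e. a pathwise derivative, so the concentration parameters that model the architecture mixing weights receive gradients from an ordinary loss.

```python
# Illustrative sketch: Dirichlet-distributed architecture weights with pathwise gradients.
import torch
from torch.distributions import Dirichlet

num_edges, num_ops = 14, 8
log_conc = torch.zeros(num_edges, num_ops, requires_grad=True)   # learnable concentrations

weights = Dirichlet(log_conc.exp()).rsample()     # one simplex sample per edge
loss = (weights ** 2).sum()                       # stand-in for the supernet loss
loss.backward()
print(log_conc.grad.shape)                        # gradients flow to the Dirichlet parameters
```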
arXiv Detail & Related papers (2020-06-18T08:23:02Z) - Geometry-Aware Gradient Algorithms for Neural Architecture Search [41.943045315986744]
We argue for the study of single-level empirical risk minimization to understand NAS with weight-sharing.
We present a geometry-aware framework that exploits the underlying structure of this optimization to return sparse architectural parameters.
We achieve state-of-the-art accuracy on the latest NAS benchmarks in computer vision.
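As a hedged illustration of one common geometry-aware choice (not necessarily the paper's exact algorithm): an exponentiated-gradient / mirror-descent step keeps per-edge weights on the probability simplex and tends to produce sparser, near one-hot solutions than plain SGD on softmax logits.

```python
# Illustrative sketch: exponentiated-gradient (mirror descent) update on the simplex.
import numpy as np


def exponentiated_gradient_step(theta, grad, lr=0.1):
    """theta: per-edge operation weights on the simplex; grad: d loss / d theta."""
    updated = theta * np.exp(-lr * grad)
    return updated / updated.sum(axis=-1, keepdims=True)   # re-normalize per edge


# Toy usage (purely illustrative): the weight with the most negative gradient grows.
theta = np.full((1, 4), 0.25)
grad = np.array([[0.3, -0.8, 0.1, 0.4]])
print(exponentiated_gradient_step(theta, grad))
```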
arXiv Detail & Related papers (2020-04-16T17:46:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences.