Connection Sensitivity Matters for Training-free DARTS: From
Architecture-Level Scoring to Operation-Level Sensitivity Analysis
- URL: http://arxiv.org/abs/2106.11542v4
- Date: Fri, 12 May 2023 13:17:29 GMT
- Title: Connection Sensitivity Matters for Training-free DARTS: From
Architecture-Level Scoring to Operation-Level Sensitivity Analysis
- Authors: Miao Zhang, Wei Huang, Li Wang
- Abstract summary: Recently proposed training-free NAS methods abandon the training phase and design various zero-cost proxies as scores to identify excellent architectures.
In this paper, we raise an interesting problem: can we properly measure operation importance in DARTS in a training-free way while avoiding the parameter-intensive bias?
By using ZEROS for NAS in an iterative and data-agnostic manner, our novel trial leads to a framework called training-free differentiable architecture search (FreeDARTS).
- Score: 32.94768616851585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed training-free NAS methods abandon the training
phase and design various zero-cost proxies as scores to identify excellent
architectures, yielding extreme computational efficiency for neural
architecture search. In this paper, we raise an interesting problem: can we
properly measure operation importance in DARTS in a training-free way while
avoiding the parameter-intensive bias? We investigate this question through
the lens of edge connectivity, and provide an affirmative answer by defining a
connectivity concept, ZERo-cost Operation Sensitivity (ZEROS), to score the
importance of candidate operations in DARTS at initialization. By using ZEROS
for NAS in an iterative and data-agnostic manner, our novel trial leads to a
framework called training-free differentiable architecture search (FreeDARTS).
Based on the theory of the Neural Tangent Kernel (NTK), we show that the
proposed connectivity score is provably negatively correlated with the
generalization bound of the DARTS supernet after convergence under gradient
descent training. In addition, we theoretically explain how ZEROS implicitly
avoids the parameter-intensive bias when selecting architectures, and
empirically show that the architectures searched by FreeDARTS are of
comparable size. Extensive experiments have been conducted on a series of
search spaces, and the results demonstrate that FreeDARTS is a reliable and
efficient baseline for neural architecture search.
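The abstract does not give the exact ZEROS formula, so the following is only a minimal sketch of the general recipe it describes: score each candidate operation's architecture parameter at initialization with a SNIP-style saliency computed on random (data-agnostic) inputs, then iteratively discard the weakest operation. The `supernet(x, alpha)` interface, the batch size, and the saliency form are illustrative assumptions, not the paper's implementation.

```python
import torch

def zeros_like_saliency(supernet, alpha, criterion, input_shape, num_classes):
    """Hedged sketch: score candidate operations at initialization.

    Assumption: a SNIP-style saliency |alpha * dL/dalpha| on random data stands
    in for the paper's ZEROS score. alpha is a (num_edges, num_ops) tensor with
    requires_grad=True; supernet(x, alpha) is a hypothetical API.
    """
    x = torch.randn(8, *input_shape)               # random inputs: data-agnostic
    y = torch.randint(0, num_classes, (8,))        # random labels
    loss = criterion(supernet(x, alpha), y)
    grad = torch.autograd.grad(loss, alpha)[0]     # sensitivity of the loss to each alpha
    return (alpha * grad).abs().detach()           # one score per (edge, operation)

def iterative_prune(supernet, alpha, criterion, input_shape, num_classes, keep_per_edge=1):
    """Iteratively drop the lowest-scored candidate operation until each edge keeps one."""
    active = torch.ones_like(alpha, dtype=torch.bool)
    while active.sum(dim=-1).max().item() > keep_per_edge:
        # (a full implementation would also mask pruned operations in the supernet forward)
        score = zeros_like_saliency(supernet, alpha, criterion, input_shape, num_classes)
        # only operations on edges that still have too many candidates are prunable
        prunable = active & (active.sum(dim=-1, keepdim=True) > keep_per_edge)
        masked = torch.where(prunable, score, torch.full_like(score, float("inf")))
        active.view(-1)[masked.argmin()] = False   # drop the weakest prunable operation
    return active                                  # boolean mask of selected operations
```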
Related papers
- Robustifying DARTS by Eliminating Information Bypass Leakage via
Explicit Sparse Regularization [8.93957397187611]
Differentiable architecture search (DARTS) is a promising end-to-end NAS method.
Recent studies cast doubt on the basic underlying hypotheses of DARTS.
We propose a novel sparse-regularized approximation and an efficient mixed-sparsity training scheme to robustify DARTS.
arXiv Detail & Related papers (2023-06-12T04:11:37Z)
- Generalization Properties of NAS under Activation and Skip Connection Search [66.8386847112332]
We study the generalization properties of Neural Architecture Search (NAS) under a unifying framework.
We derive the lower (and upper) bounds of the minimum eigenvalue of the Neural Tangent Kernel (NTK) under the (in)finite-width regime.
We show how the derived results can guide NAS to select the top-performing architectures, even without any training.
arXiv Detail & Related papers (2022-09-15T12:11:41Z)
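To make the NTK-eigenvalue signal concrete, the sketch below builds an empirical NTK Gram matrix from per-sample gradients at initialization and returns its minimum eigenvalue. It assumes a scalarized (summed-logit) output and is only an illustration of this family of proxies, not the bound derivation from the paper.

```python
import torch

def ntk_min_eigenvalue(model, inputs):
    """Hedged sketch: minimum eigenvalue of the empirical NTK at initialization.

    Assumption: the NTK is approximated by J J^T, where each row of J is the
    gradient of a scalarized output (sum of logits) w.r.t. all parameters.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for x in inputs:                                   # iterable of single examples
        out = model(x.unsqueeze(0)).sum()              # scalarize the output
        grads = torch.autograd.grad(out, params, allow_unused=True)
        rows.append(torch.cat([torch.zeros_like(p).reshape(-1) if g is None else g.reshape(-1)
                               for g, p in zip(grads, params)]))
    jac = torch.stack(rows)                            # (n_samples, n_params)
    ntk = jac @ jac.t()                                # empirical NTK Gram matrix
    return torch.linalg.eigvalsh(ntk)[0]               # eigenvalues ascending -> min first

# Usage idea: rank candidate architectures at initialization by this value,
# with a larger minimum eigenvalue taken to suggest easier fitting under gradient descent.
```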
- Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training? [37.29036906991086]
In this work, we revisit several at-initialization metrics that can be derived from the Neural Tangent Kernel (NTK).
We deduce that modern neural architectures exhibit highly non-linear characteristics, making the NTK-based metrics incapable of reliably estimating the performance of an architecture without some amount of training.
We introduce Label-Gradient Alignment (LGA), a novel NTK-based metric whose inherent formulation allows it to capture the large amount of non-linear advantage present in modern neural architectures.
arXiv Detail & Related papers (2022-03-28T08:43:04Z)
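LGA's exact formulation is only given in the paper itself; the sketch below shows a generic kernel-target alignment between an empirical NTK and a label-similarity matrix, which is the broad family of NTK/label-based scores this entry describes, not LGA verbatim. It assumes the Gram matrix is computed as in the previous sketch, and the centering step is an illustrative choice.

```python
import torch

def kernel_target_alignment(ntk, labels, num_classes):
    """Hedged sketch: centered cosine similarity between an NTK Gram matrix and
    the label-similarity matrix Y Y^T. A generic alignment score in the spirit
    of NTK/label-based metrics such as LGA, not the paper's exact definition."""
    y = torch.nn.functional.one_hot(labels, num_classes).float()
    target = y @ y.t()                                  # 1 where labels match, else 0
    k = ntk - ntk.mean()                                # crude centering (assumption)
    t = target - target.mean()
    return (k * t).sum() / (k.norm() * t.norm() + 1e-12)
```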
- KNAS: Green Neural Architecture Search [49.36732007176059]
We propose a new kernel-based architecture search approach, KNAS.
Experiments show that KNAS achieves competitive results while being orders of magnitude faster than "train-then-test" paradigms on image classification tasks.
The searched network also outperforms the strong baseline RoBERTa-large on two text classification tasks.
arXiv Detail & Related papers (2021-11-26T02:11:28Z)
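One way to realize a kernel-based, train-free score of the kind KNAS describes is the mean of the Gram matrix of per-sample loss gradients at initialization. The sketch below shows that reading; KNAS's published metric may differ in detail.

```python
import torch

def mean_gradient_gram(model, criterion, inputs, targets):
    """Hedged sketch: mean of the Gram matrix of per-sample loss gradients at
    initialization, used as a train-free ranking score (one reading of the
    'kernel-based' idea; not necessarily KNAS's exact metric)."""
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for x, y in zip(inputs, targets):                  # iterate single examples
        loss = criterion(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params, allow_unused=True)
        rows.append(torch.cat([torch.zeros_like(p).reshape(-1) if g is None else g.reshape(-1)
                               for g, p in zip(grads, params)]))
    g = torch.stack(rows)                              # (n_samples, n_params)
    return (g @ g.t()).mean().item()                   # larger value assumed to rank better
```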
- ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without relying on the gradient approximation used in DARTS.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
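To make the zero-order idea concrete, the sketch below estimates the gradient of the validation loss with respect to the architecture parameters from loss evaluations under random perturbations (an ES-style estimator), so no analytic architecture gradient is needed. The `val_loss_fn` closure, the estimator, and the step sizes are illustrative assumptions, not ZARTS's actual update rule.

```python
import torch

def zero_order_arch_grad(val_loss_fn, alpha, num_samples=8, sigma=1e-2):
    """Hedged sketch: forward-difference evolution-strategy estimate of
    d L_val / d alpha using only loss evaluations (no backprop through alpha).
    Illustrates the zero-order principle, not ZARTS's exact scheme."""
    with torch.no_grad():
        base = val_loss_fn(alpha)                      # loss at the current architecture
        est = torch.zeros_like(alpha)
        for _ in range(num_samples):
            u = torch.randn_like(alpha)                # random search direction
            est += (val_loss_fn(alpha + sigma * u) - base) / sigma * u
        return est / num_samples

# Usage idea: alpha -= lr * zero_order_arch_grad(val_loss_fn, alpha)
```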
- D-DARTS: Distributed Differentiable Architecture Search [75.12821786565318]
Differentiable ARchiTecture Search (DARTS) is one of the most trending Neural Architecture Search (NAS) methods.
We propose D-DARTS, a novel solution that addresses this problem by nesting several neural networks at the cell level.
arXiv Detail & Related papers (2021-08-20T09:07:01Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
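The general recipe behind implicit-gradient methods is to apply the implicit function theorem and approximate the inverse training Hessian, for example with a truncated Neumann series. The sketch below shows that generic recipe with `w` and `alpha` treated as single tensors for brevity; the stochastic estimator actually used by iDARTS is described in the paper.

```python
import torch

def ift_hypergradient(val_loss, train_loss, w, alpha, neumann_steps=3, lr=1e-2):
    """Hedged sketch of the implicit-function-theorem hypergradient:
        dL_val/dalpha (direct term)
        - dL_val/dw . H^{-1} . d^2 L_train/(dw dalpha),
    with H^{-1} replaced by a truncated Neumann series. w and alpha are assumed
    to be single tensors with requires_grad=True; iDARTS's variant differs."""
    v = torch.autograd.grad(val_loss, w, retain_graph=True)[0]          # dL_val/dw
    g_w = torch.autograd.grad(train_loss, w, create_graph=True)[0]      # dL_train/dw
    p, series_sum = v.clone(), v.clone()
    for _ in range(neumann_steps):                                      # sum_k (I - lr*H)^k v
        hvp = torch.autograd.grad(g_w, w, grad_outputs=p, retain_graph=True)[0]
        p = p - lr * hvp
        series_sum = series_sum + p
    # mixed partial: d^2 L_train / (dalpha dw) applied to the Neumann series sum
    mixed = torch.autograd.grad(g_w, alpha, grad_outputs=series_sum, retain_graph=True)[0]
    direct = torch.autograd.grad(val_loss, alpha, allow_unused=True)[0]
    direct = torch.zeros_like(alpha) if direct is None else direct
    return direct - lr * mixed
```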
- The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design [3.04585143845864]
We develop methods that can predict, without any training, whether an architecture will achieve a relatively high test or training error on a task after training.
We then go on to explain the error in terms of the architecture definition itself and develop tools for modifying the architecture.
Our first major contribution is to show that the 'degree of nonlinearity' of a neural architecture is a key causal driver behind its performance.
arXiv Detail & Related papers (2021-05-25T20:47:43Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables, modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
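Since this entry describes sampling the relaxed mixing weights from a Dirichlet distribution and training its parameters with pathwise gradients, the sketch below shows that mechanism with torch.distributions.Dirichlet and rsample(). The tensor shapes, the softplus parameterization, and the stand-in loss are assumptions for illustration, not DrNAS's implementation.

```python
import torch
from torch.distributions import Dirichlet

# Hedged sketch: learnable Dirichlet concentrations over the candidate operations
# of each edge; rsample() gives a differentiable (pathwise) sample that can replace
# softmax(alpha) in a DARTS-style supernet. Shapes are illustrative assumptions.
num_edges, num_ops = 14, 8
log_beta = torch.zeros(num_edges, num_ops, requires_grad=True)  # unconstrained parameters

def sample_mixing_weights():
    concentration = torch.nn.functional.softplus(log_beta) + 1e-3  # keep concentrations positive
    return Dirichlet(concentration).rsample()                      # each row lies on the simplex

weights = sample_mixing_weights()          # shape (num_edges, num_ops), rows sum to 1
loss = (weights ** 2).sum()                # stand-in for the supernet loss
loss.backward()                            # pathwise gradients reach log_beta
```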