Connection Sensitivity Matters for Training-free DARTS: From
Architecture-Level Scoring to Operation-Level Sensitivity Analysis
- URL: http://arxiv.org/abs/2106.11542v4
- Date: Fri, 12 May 2023 13:17:29 GMT
- Title: Connection Sensitivity Matters for Training-free DARTS: From
Architecture-Level Scoring to Operation-Level Sensitivity Analysis
- Authors: Miao Zhang, Wei Huang, Li Wang
- Abstract summary: Recently proposed training-free NAS methods abandon the training phase and design various zero-cost proxies as scores to identify excellent architectures.
In this paper, we raise an interesting problem: can we properly measure operation importance in DARTS in a training-free way while avoiding the parameter-intensive bias?
By using ZEROS for NAS in an iterative and data-agnostic manner, our novel trial leads to a framework called training-free differentiable architecture search (FreeDARTS).
- Score: 32.94768616851585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed training-free NAS methods abandon the training
phase and design various zero-cost proxies as scores to identify excellent
architectures, yielding extreme computational efficiency for neural
architecture search. In this paper, we raise an interesting problem: can we
properly measure operation importance in DARTS in a training-free way while
avoiding the parameter-intensive bias? We investigate this question through
the lens of edge connectivity, and provide an affirmative answer by defining a
connectivity concept, ZERo-cost Operation Sensitivity (ZEROS), to score the
importance of candidate operations in DARTS at initialization. By using ZEROS
for NAS in an iterative and data-agnostic manner, our novel trial leads to a
framework called training-free differentiable architecture search (FreeDARTS).
Based on the theory of the Neural Tangent Kernel (NTK), we show that the
proposed connectivity score is provably negatively correlated with the
generalization bound of the DARTS supernet after convergence under gradient
descent training. In addition, we theoretically explain how ZEROS implicitly
avoids the parameter-intensive bias when selecting architectures, and
empirically show that the architectures searched by FreeDARTS are of
comparable size. Extensive experiments have been conducted on a series of
search spaces, and the results demonstrate that FreeDARTS is a reliable and
efficient baseline for neural architecture search.
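The abstract does not give the exact ZEROS formula, so the following is only a minimal sketch of the general recipe it describes: score each candidate operation's architecture parameter at initialization with a SNIP-style saliency computed on random (data-agnostic) inputs, then iteratively discard the weakest operation. The `supernet(x, alpha)` interface, the batch size, and the saliency form are illustrative assumptions, not the paper's implementation.

```python
import torch

def zeros_like_saliency(supernet, alpha, criterion, input_shape, num_classes):
    """Hedged sketch: score candidate operations at initialization.

    Assumption: a SNIP-style saliency |alpha * dL/dalpha| on random data stands
    in for the paper's ZEROS score. alpha is a (num_edges, num_ops) tensor with
    requires_grad=True; supernet(x, alpha) is a hypothetical API.
    """
    x = torch.randn(8, *input_shape)               # random inputs: data-agnostic
    y = torch.randint(0, num_classes, (8,))        # random labels
    loss = criterion(supernet(x, alpha), y)
    grad = torch.autograd.grad(loss, alpha)[0]     # sensitivity of the loss to each alpha
    return (alpha * grad).abs().detach()           # one score per (edge, operation)

def iterative_prune(supernet, alpha, criterion, input_shape, num_classes, keep_per_edge=1):
    """Iteratively drop the lowest-scored candidate operation until each edge keeps one."""
    active = torch.ones_like(alpha, dtype=torch.bool)
    while active.sum(dim=-1).max().item() > keep_per_edge:
        # (a full implementation would also mask pruned operations in the supernet forward)
        score = zeros_like_saliency(supernet, alpha, criterion, input_shape, num_classes)
        # only operations on edges that still have too many candidates are prunable
        prunable = active & (active.sum(dim=-1, keepdim=True) > keep_per_edge)
        masked = torch.where(prunable, score, torch.full_like(score, float("inf")))
        active.view(-1)[masked.argmin()] = False   # drop the weakest prunable operation
    return active                                  # boolean mask of selected operations
```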
Related papers
- Robustifying DARTS by Eliminating Information Bypass Leakage via
Explicit Sparse Regularization [8.93957397187611]
Differentiable architecture search (DARTS) is a promising end-to-end NAS method.
Recent studies cast doubt on the basic underlying hypotheses of DARTS.
We propose a novel sparse-regularized approximation and an efficient mixed-sparsity training scheme to robustify DARTS.
arXiv Detail & Related papers (2023-06-12T04:11:37Z)
- Generalization Properties of NAS under Activation and Skip Connection Search [66.8386847112332]
We study the generalization properties of Neural Architecture Search (NAS) under a unifying framework.
We derive the lower (and upper) bounds of the minimum eigenvalue of the Neural Tangent Kernel (NTK) under the (in)finite-width regime.
We show how the derived results can guide NAS to select the top-performing architectures, even without any training.
arXiv Detail & Related papers (2022-09-15T12:11:41Z)
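To make the NTK-eigenvalue signal concrete, the sketch below builds an empirical NTK Gram matrix from per-sample gradients at initialization and returns its minimum eigenvalue. It assumes a scalarized (summed-logit) output and is only an illustration of this family of proxies, not the bound derivation from the paper.

```python
import torch

def ntk_min_eigenvalue(model, inputs):
    """Hedged sketch: minimum eigenvalue of the empirical NTK at initialization.

    Assumption: the NTK is approximated by J J^T, where each row of J is the
    gradient of a scalarized output (sum of logits) w.r.t. all parameters.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for x in inputs:                                   # iterable of single examples
        out = model(x.unsqueeze(0)).sum()              # scalarize the output
        grads = torch.autograd.grad(out, params, allow_unused=True)
        rows.append(torch.cat([torch.zeros_like(p).reshape(-1) if g is None else g.reshape(-1)
                               for g, p in zip(grads, params)]))
    jac = torch.stack(rows)                            # (n_samples, n_params)
    ntk = jac @ jac.t()                                # empirical NTK Gram matrix
    return torch.linalg.eigvalsh(ntk)[0]               # eigenvalues ascending -> min first

# Usage idea: rank candidate architectures at initialization by this value,
# with a larger minimum eigenvalue taken to suggest easier fitting under gradient descent.
```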
- Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training? [37.29036906991086]
In this work, we revisit several at-initialization metrics that can be derived from the Neural Tangent Kernel (NTK).
We deduce that modern neural architectures exhibit highly non-linear characteristics, making the NTK-based metrics incapable of reliably estimating the performance of an architecture without some amount of training.
We introduce Label-Gradient Alignment (LGA), a novel NTK-based metric whose inherent formulation allows it to capture the large amount of non-linear advantage present in modern neural architectures.
arXiv Detail & Related papers (2022-03-28T08:43:04Z)
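LGA's exact formulation is only given in the paper itself; the sketch below shows a generic kernel-target alignment between an empirical NTK and a label-similarity matrix, which is the broad family of NTK/label-based scores this entry describes, not LGA verbatim. It assumes the Gram matrix is computed as in the previous sketch, and the centering step is an illustrative choice.

```python
import torch

def kernel_target_alignment(ntk, labels, num_classes):
    """Hedged sketch: centered cosine similarity between an NTK Gram matrix and
    the label-similarity matrix Y Y^T. A generic alignment score in the spirit
    of NTK/label-based metrics such as LGA, not the paper's exact definition."""
    y = torch.nn.functional.one_hot(labels, num_classes).float()
    target = y @ y.t()                                  # 1 where labels match, else 0
    k = ntk - ntk.mean()                                # crude centering (assumption)
    t = target - target.mean()
    return (k * t).sum() / (k.norm() * t.norm() + 1e-12)
```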
- KNAS: Green Neural Architecture Search [49.36732007176059]
We propose a new kernel-based architecture search approach, KNAS.
Experiments show that KNAS achieves competitive results while being orders of magnitude faster than "train-then-test" paradigms on image classification tasks.
The searched network also outperforms the strong baseline RoBERTa-large on two text classification tasks.
arXiv Detail & Related papers (2021-11-26T02:11:28Z)
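One way to realize a kernel-based, train-free score of the kind KNAS describes is the mean of the Gram matrix of per-sample loss gradients at initialization. The sketch below shows that reading; KNAS's published metric may differ in detail.

```python
import torch

def mean_gradient_gram(model, criterion, inputs, targets):
    """Hedged sketch: mean of the Gram matrix of per-sample loss gradients at
    initialization, used as a train-free ranking score (one reading of the
    'kernel-based' idea; not necessarily KNAS's exact metric)."""
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for x, y in zip(inputs, targets):                  # iterate single examples
        loss = criterion(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params, allow_unused=True)
        rows.append(torch.cat([torch.zeros_like(p).reshape(-1) if g is None else g.reshape(-1)
                               for g, p in zip(grads, params)]))
    g = torch.stack(rows)                              # (n_samples, n_params)
    return (g @ g.t()).mean().item()                   # larger value assumed to rank better
```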
- ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without relying on the gradient approximation used in DARTS.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
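To make the zero-order idea concrete, the sketch below estimates the gradient of the validation loss with respect to the architecture parameters from loss evaluations under random perturbations (an ES-style estimator), so no analytic architecture gradient is needed. The `val_loss_fn` closure, the estimator, and the step sizes are illustrative assumptions, not ZARTS's actual update rule.

```python
import torch

def zero_order_arch_grad(val_loss_fn, alpha, num_samples=8, sigma=1e-2):
    """Hedged sketch: forward-difference evolution-strategy estimate of
    d L_val / d alpha using only loss evaluations (no backprop through alpha).
    Illustrates the zero-order principle, not ZARTS's exact scheme."""
    with torch.no_grad():
        base = val_loss_fn(alpha)                      # loss at the current architecture
        est = torch.zeros_like(alpha)
        for _ in range(num_samples):
            u = torch.randn_like(alpha)                # random search direction
            est += (val_loss_fn(alpha + sigma * u) - base) / sigma * u
        return est / num_samples

# Usage idea: alpha -= lr * zero_order_arch_grad(val_loss_fn, alpha)
```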
- D-DARTS: Distributed Differentiable Architecture Search [75.12821786565318]
Differentiable ARchiTecture Search (DARTS) is one of the most trending Neural Architecture Search (NAS) methods.
We propose D-DARTS, a novel solution that addresses this problem by nesting several neural networks at the cell level.
arXiv Detail & Related papers (2021-08-20T09:07:01Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
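The general recipe behind implicit-gradient methods is to apply the implicit function theorem and approximate the inverse training Hessian, for example with a truncated Neumann series. The sketch below shows that generic recipe with `w` and `alpha` treated as single tensors for brevity; the stochastic estimator actually used by iDARTS is described in the paper.

```python
import torch

def ift_hypergradient(val_loss, train_loss, w, alpha, neumann_steps=3, lr=1e-2):
    """Hedged sketch of the implicit-function-theorem hypergradient:
        dL_val/dalpha (direct term)
        - dL_val/dw . H^{-1} . d^2 L_train/(dw dalpha),
    with H^{-1} replaced by a truncated Neumann series. w and alpha are assumed
    to be single tensors with requires_grad=True; iDARTS's variant differs."""
    v = torch.autograd.grad(val_loss, w, retain_graph=True)[0]          # dL_val/dw
    g_w = torch.autograd.grad(train_loss, w, create_graph=True)[0]      # dL_train/dw
    p, series_sum = v.clone(), v.clone()
    for _ in range(neumann_steps):                                      # sum_k (I - lr*H)^k v
        hvp = torch.autograd.grad(g_w, w, grad_outputs=p, retain_graph=True)[0]
        p = p - lr * hvp
        series_sum = series_sum + p
    # mixed partial: d^2 L_train / (dalpha dw) applied to the Neumann series sum
    mixed = torch.autograd.grad(g_w, alpha, grad_outputs=series_sum, retain_graph=True)[0]
    direct = torch.autograd.grad(val_loss, alpha, allow_unused=True)[0]
    direct = torch.zeros_like(alpha) if direct is None else direct
    return direct - lr * mixed
```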
- The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design [3.04585143845864]
We develop methods that can predict, without any training, whether an architecture will achieve a relatively high test or training error on a task after training.
We then go on to explain the error in terms of the architecture definition itself and develop tools for modifying the architecture.
Our first major contribution is to show that the 'degree of nonlinearity' of a neural architecture is a key causal driver behind its performance.
arXiv Detail & Related papers (2021-05-25T20:47:43Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables, modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
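Since this entry describes sampling the relaxed mixing weights from a Dirichlet distribution and training its parameters with pathwise gradients, the sketch below shows that mechanism with torch.distributions.Dirichlet and rsample(). The tensor shapes, the softplus parameterization, and the stand-in loss are assumptions for illustration, not DrNAS's implementation.

```python
import torch
from torch.distributions import Dirichlet

# Hedged sketch: learnable Dirichlet concentrations over the candidate operations
# of each edge; rsample() gives a differentiable (pathwise) sample that can replace
# softmax(alpha) in a DARTS-style supernet. Shapes are illustrative assumptions.
num_edges, num_ops = 14, 8
log_beta = torch.zeros(num_edges, num_ops, requires_grad=True)  # unconstrained parameters

def sample_mixing_weights():
    concentration = torch.nn.functional.softplus(log_beta) + 1e-3  # keep concentrations positive
    return Dirichlet(concentration).rsample()                      # each row lies on the simplex

weights = sample_mixing_weights()          # shape (num_edges, num_ops), rows sum to 1
loss = (weights ** 2).sum()                # stand-in for the supernet loss
loss.backward()                            # pathwise gradients reach log_beta
```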