KNAS: Green Neural Architecture Search
- URL: http://arxiv.org/abs/2111.13293v1
- Date: Fri, 26 Nov 2021 02:11:28 GMT
- Title: KNAS: Green Neural Architecture Search
- Authors: Jingjing Xu, Liang Zhao, Junyang Lin, Rundong Gao, Xu Sun, Hongxia
Yang
- Abstract summary: We propose a new kernel-based architecture search approach, KNAS.
Experiments show that KNAS achieves competitive results while being orders of magnitude faster than "train-then-test" paradigms on image classification tasks.
The searched network also outperforms the strong baseline RoBERTa-large on two text classification tasks.
- Score: 49.36732007176059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many existing neural architecture search (NAS) solutions rely on downstream
training for architecture evaluation, which requires enormous computation.
Considering that these computations bring a large carbon footprint, this paper
aims to explore a green (i.e., environmentally friendly) NAS solution that
evaluates architectures without training. Intuitively, gradients, induced by
the architecture itself, directly decide the convergence and generalization
results. This motivates us to propose the gradient kernel hypothesis: gradients
can be used as a coarse-grained proxy of downstream training to evaluate
randomly-initialized networks. To support the hypothesis, we conduct a
theoretical analysis and find a practical gradient kernel that correlates well
with training loss and validation performance. Based on this hypothesis, we
propose a new kernel-based architecture search approach, KNAS. Experiments show
that KNAS achieves competitive results while being orders of magnitude faster
than "train-then-test" paradigms on image classification tasks. Furthermore,
its extremely low search cost enables wide applications. The searched network
also outperforms the strong baseline RoBERTa-large on two text classification
tasks. Code is available at https://github.com/Jingjing-NLP/KNAS.
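The gradient kernel hypothesis above can be illustrated with a short scoring function: evaluate a randomly-initialized network by the mean of the Gram matrix of its per-example gradients on a small batch, and prefer architectures with higher scores. The sketch below is a minimal PyTorch illustration of that idea, not the authors' exact KNAS implementation; the function name `gradient_kernel_score`, the toy MLP candidates, and the batch sizes are illustrative assumptions (see the linked repository for the official code).

```python
# Minimal sketch of a gradient-kernel proxy for scoring untrained networks.
# Illustrates the hypothesis described in the abstract; it is NOT the authors'
# exact KNAS metric (see https://github.com/Jingjing-NLP/KNAS for their code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def gradient_kernel_score(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Score = mean of the Gram matrix of per-example parameter gradients."""
    grads = []
    for xi, yi in zip(x, y):
        model.zero_grad()
        loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters() if p.grad is not None])
        grads.append(g)
    G = torch.stack(grads)        # (batch, n_params)
    gram = G @ G.t()              # pairwise gradient inner products
    return gram.mean().item()     # larger mean ~ better expected trainability


if __name__ == "__main__":
    # Toy comparison of two randomly-initialized candidate architectures.
    x = torch.randn(8, 32)
    y = torch.randint(0, 10, (8,))
    candidates = {
        "narrow": nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10)),
        "wide": nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10)),
    }
    for name, net in candidates.items():
        print(name, gradient_kernel_score(net, x, y))
```

In a search loop, a score of this kind would replace downstream training as the ranking signal, which is what makes the search cost negligible compared with "train-then-test" pipelines.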
Related papers
- GeNAS: Neural Architecture Search with Better Generalization [14.92869716323226]
Recent neural architecture search (NAS) approaches rely on validation loss or accuracy to find the superior network for the target data.
In this paper, we investigate a new neural architecture search measure for excavating architectures with better generalization.
arXiv Detail & Related papers (2023-05-15T12:44:54Z)
- FlowNAS: Neural Architecture Search for Optical Flow Estimation [65.44079917247369]
We propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task.
Experimental results show that the discovered architecture with the weights inherited from the super-network achieves 4.67% F1-all error on KITTI.
arXiv Detail & Related papers (2022-07-04T09:05:25Z)
- Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training? [37.29036906991086]
In this work, we revisit several at-initialization metrics that can be derived from the Neural Tangent Kernel (NTK)
We deduce that modern neural architectures exhibit highly non-linear characteristics, making the NTK-based metrics incapable of reliably estimating the performance of an architecture without some amount of training.
We introduce Label-Gradient Alignment (LGA), a novel NTK-based metric whose inherent formulation allows it to capture the large amount of non-linear advantage present in modern neural architectures.
arXiv Detail & Related papers (2022-03-28T08:43:04Z)
- L$^{2}$NAS: Learning to Optimize Neural Architectures via Continuous-Action Reinforcement Learning [23.25155249879658]
Differentiable architecture search has achieved remarkable results in deep neural network design.
We show that L$^{2}$NAS achieves state-of-the-art results on the NAS-Bench-201 benchmark as well as the DARTS and Once-for-All search spaces.
arXiv Detail & Related papers (2021-09-25T19:26:30Z)
- Connection Sensitivity Matters for Training-free DARTS: From Architecture-Level Scoring to Operation-Level Sensitivity Analysis [32.94768616851585]
Recently proposed training-free NAS methods abandon the training phase and design various zero-cost proxies as scores to identify excellent architectures.
In this paper, we raise an interesting problem: can we properly measure the operation importance in DARTS in a training-free way, while avoiding the parameter-intensive bias?
By devising an iterative and data-agnostic manner of utilizing ZEROS for NAS, our approach leads to a framework called training-free differentiable architecture search (FreeDARTS).
arXiv Detail & Related papers (2021-06-22T04:40:34Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS)
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective [88.39981851247727]
We propose a novel framework called training-free neural architecture search (TE-NAS)
TE-NAS ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK) and the number of linear regions in the input space.
We show that: (1) these two measurements imply the trainability and expressivity of a neural network; (2) they strongly correlate with the network's test accuracy. (A simplified sketch of this NTK-spectrum scoring idea appears after this list.)
arXiv Detail & Related papers (2021-02-23T07:50:44Z)
- Multi-objective Neural Architecture Search with Almost No Training [9.93048700248444]
We propose an effective alternative, dubbed Random-Weight Evaluation (RWE), to rapidly estimate the performance of network architectures.
RWE reduces the computational cost of evaluating an architecture from hours to seconds.
When integrated within an evolutionary multi-objective algorithm, RWE obtains a set of efficient architectures with state-of-the-art performance on CIFAR-10 with less than two hours' searching on a single GPU card.
arXiv Detail & Related papers (2020-11-27T07:39:17Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables, modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
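Several entries above (TE-NAS and the "Demystifying the Neural Tangent Kernel" paper) score architectures at initialization via the spectrum of an empirical neural tangent kernel. The sketch below is a simplified PyTorch illustration of that idea: the kernel is built from gradients of the summed logits rather than the full per-class Jacobian, so it is a rough surrogate rather than either paper's exact metric, and the function name and toy model are assumptions.

```python
# Rough sketch of an NTK-spectrum proxy in the spirit of the TE-NAS and
# "Demystifying the NTK" entries above. The kernel is built from gradients of
# the summed logits (a simplification of the full per-class Jacobian).
import torch
import torch.nn as nn


def ntk_condition_number(model: nn.Module, x: torch.Tensor) -> float:
    """Condition number (lambda_max / lambda_min) of an empirical gradient kernel."""
    grads = []
    for xi in x:
        model.zero_grad()
        out = model(xi.unsqueeze(0)).sum()   # scalar surrogate for the per-class Jacobian
        out.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters() if p.grad is not None])
        grads.append(g)
    J = torch.stack(grads)                   # (batch, n_params)
    kernel = J @ J.t()                       # empirical NTK surrogate
    eigvals = torch.linalg.eigvalsh(kernel)  # ascending, real (kernel is symmetric PSD)
    return (eigvals[-1] / eigvals[0].clamp_min(1e-12)).item()  # lower ~ easier to train


if __name__ == "__main__":
    x = torch.randn(8, 32)
    net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    print("NTK condition number:", ntk_condition_number(net, x))
```

TE-NAS additionally counts linear regions in the input space as an expressivity measure; only the kernel-spectrum part is sketched here.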