BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule
- URL: http://arxiv.org/abs/2111.13204v1
- Date: Thu, 25 Nov 2021 18:13:42 GMT
- Title: BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule
- Authors: Miao Zhang, Jilin Hu, Steven Su, Shirui Pan, Xiaojun Chang, Bin Yang,
Gholamreza Haffari
- Abstract summary: Differentiable Architecture Search (DARTS) has received massive attention in recent years, mainly because it significantly reduces the computational cost.
This paper formulates the neural architecture search as a distribution learning problem through relaxing the architecture weights into Gaussian distributions.
We demonstrate how the differentiable NAS benefits from Bayesian principles, enhancing exploration and improving stability.
- Score: 95.56873042777316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable Architecture Search (DARTS) has received massive attention in
recent years, mainly because it significantly reduces the computational cost
through weight sharing and continuous relaxation. However, more recent works
find that existing differentiable NAS techniques struggle to outperform naive
baselines, yielding deteriorative architectures as the search proceeds. Rather
than directly optimizing the architecture parameters, this paper formulates the
neural architecture search as a distribution learning problem through relaxing
the architecture weights into Gaussian distributions. By leveraging the
natural-gradient variational inference (NGVI), the architecture distribution
can be easily optimized based on existing codebases without incurring more
memory and computational consumption. We demonstrate how the differentiable NAS
benefits from Bayesian principles, enhancing exploration and improving
stability. The experimental results on NAS-Bench-201 and NAS-Bench-1shot1
benchmark datasets confirm the significant improvements the proposed framework
can make. In addition, instead of simply applying the argmax on the learned
parameters, we further leverage the recently-proposed training-free proxies in
NAS to select the optimal architecture from a group of architectures drawn from
the optimized distribution, where we achieve state-of-the-art results on the
NAS-Bench-201 and NAS-Bench-1shot1 benchmarks. Our best architecture in the
DARTS search space also obtains competitive test errors of 2.37%, 15.72%,
and 24.2% on the CIFAR-10, CIFAR-100, and ImageNet datasets, respectively.
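As a rough illustration of the approach the abstract describes, the sketch below relaxes the architecture weights of a toy DARTS-style cell into a diagonal Gaussian and updates the mean with a simplified natural-gradient variational step in the spirit of NGVI. This is only a minimal sketch under assumptions: the `val_loss` stand-in, the cell sizes, and all hyper-parameters are illustrative and do not come from the paper's code.

```python
# Minimal sketch of the idea in the abstract: treat architecture weights as a
# diagonal Gaussian and update it with a natural-gradient-style variational rule.
# Illustrative toy only, NOT the authors' implementation; the objective, sizes,
# and hyper-parameters below are assumptions.
import torch

torch.manual_seed(0)

num_edges, num_ops = 6, 4                       # toy DARTS-style cell
mu = torch.zeros(num_edges, num_ops)            # mean of q(alpha)
s = torch.ones(num_edges, num_ops)              # running 2nd-moment (precision proxy)
n, delta, lr, beta = 64.0, 1e-3, 0.1, 0.9       # "dataset size", prior prec., step, EMA

def val_loss(alpha):
    """Stand-in for the supernet validation loss. A real implementation would
    forward data through the weight-sharing supernet using softmax(alpha) as
    the operation mixing weights."""
    probs = torch.softmax(alpha, dim=-1)
    target = torch.eye(num_ops)[torch.arange(num_edges) % num_ops]
    return ((probs - target) ** 2).sum()

for step in range(200):
    sigma = 1.0 / torch.sqrt(n * (s + delta))           # posterior std from precision
    eps = torch.randn_like(mu)
    alpha = (mu + sigma * eps).requires_grad_(True)     # reparameterised sample
    loss = val_loss(alpha)
    g, = torch.autograd.grad(loss, alpha)
    # Simplified natural-gradient (VOGN-style) update of the diagonal Gaussian:
    s = beta * s + (1 - beta) * g ** 2                  # EMA of squared gradients
    mu = mu - lr * (g + delta * mu) / (s + delta)       # step on the mean

print(torch.softmax(mu, dim=-1).argmax(dim=-1))         # plain argmax, for reference
```

In the paper itself the final architecture is not simply the argmax of the learned means: several architectures are drawn from the optimized distribution and ranked with training-free proxies.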
Related papers
- Are Neural Architecture Search Benchmarks Well Designed? A Deeper Look
Into Operation Importance [5.065947993017157]
We conduct an empirical analysis of the widely used NAS-Bench-101, NAS-Bench-201 and TransNAS-Bench-101 benchmarks.
We found that only a subset of the operation pool is required to generate architectures close to the upper-bound of the performance range.
We consistently found convolution layers to have the highest impact on the architecture's performance.
arXiv Detail & Related papers (2023-03-29T18:03:28Z) - Neural Architecture Ranker [19.21631623578852]
Architecture ranking has recently been advocated to design an efficient and effective performance predictor for Neural Architecture Search (NAS).
Inspired by the idea of stratification, we propose a predictor, namely the Neural Architecture Ranker (NAR).
arXiv Detail & Related papers (2022-01-30T04:54:59Z) - Rethinking Architecture Selection in Differentiable NAS [74.61723678821049]
Differentiable Neural Architecture Search is one of the most popular NAS methods for its search efficiency and simplicity.
We propose an alternative perturbation-based architecture selection that directly measures each operation's influence on the supernet.
We find that several failure modes of DARTS can be greatly alleviated with the proposed selection method.
arXiv Detail & Related papers (2021-08-10T00:53:39Z) - iDARTS: Differentiable Architecture Search with Stochastic Implicit
Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z) - Weak NAS Predictors Are All You Need [91.11570424233709]
Recent predictor-based NAS approaches attempt to solve the problem with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor.
We shift the paradigm from finding a complicated predictor that covers the whole architecture space to a set of weaker predictors that progressively move towards the high-performance sub-space.
Our method costs fewer samples to find the top-performance architectures on NAS-Bench-101 and NAS-Bench-201, and it achieves the state-of-the-art ImageNet performance on the NASNet search space.
arXiv Detail & Related papers (2021-02-21T01:58:43Z) - Smooth Variational Graph Embeddings for Efficient Neural Architecture
Search [41.62970837629573]
We propose a two-sided variational graph autoencoder, which allows to smoothly encode and accurately reconstruct neural architectures from various search spaces.
We evaluate the proposed approach on neural architectures defined by the ENAS approach, the NAS-Bench-101 and the NAS-Bench-201 search spaces.
arXiv Detail & Related papers (2020-10-09T17:05:41Z) - DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weight as random variables, modeled by Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimization.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z) - Geometry-Aware Gradient Algorithms for Neural Architecture Search [41.943045315986744]
We argue for the study of single-level empirical risk minimization to understand NAS with weight-sharing.
We present a geometry-aware framework that exploits the underlying structure of this optimization to return sparse architectural parameters.
We achieve state-of-the-art accuracy on the latest NAS benchmarks in computer vision.
arXiv Detail & Related papers (2020-04-16T17:46:39Z) - DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution
Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints (a toy sketch of this sample-update-prune scheme follows at the end of this list).
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
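The DDPNAS entry above describes searching by sampling architectures from a joint categorical distribution and periodically pruning the search space. The toy sketch below, referenced in that entry, illustrates this general sample-update-prune pattern; the reward function, the update rule, and all hyper-parameters are assumptions for illustration, not the DDPNAS implementation.

```python
# Illustrative sketch of dynamic distribution pruning as summarised in the
# DDPNAS entry: sample from a per-edge categorical distribution, update it from
# an observed reward, and prune weak operations every few epochs.
# Toy reward and all names/values are assumptions, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)
num_edges, num_ops = 4, 5
logits = np.zeros((num_edges, num_ops))             # categorical parameters per edge
alive = np.ones((num_edges, num_ops), dtype=bool)   # operations still in the search space

def reward(arch):
    """Stand-in for the validation accuracy of the sampled architecture."""
    best = np.arange(num_edges) % num_ops
    return float((arch == best).mean())

def probs():
    p = np.where(alive, np.exp(logits), 0.0)
    return p / p.sum(axis=1, keepdims=True)

for epoch in range(30):
    for _ in range(16):                             # sample a small batch of architectures
        p = probs()
        arch = np.array([rng.choice(num_ops, p=p[e]) for e in range(num_edges)])
        r = reward(arch)
        for e in range(num_edges):                  # REINFORCE-style update of the logits
            grad = -p[e]
            grad[arch[e]] += 1.0
            logits[e] += 0.5 * r * grad
    if (epoch + 1) % 10 == 0:                       # dynamic pruning every few epochs
        p = probs()
        for e in range(num_edges):
            if alive[e].sum() > 1:
                worst = np.where(alive[e], p[e], np.inf).argmin()
                alive[e, worst] = False             # drop the weakest surviving operation

print(probs().argmax(axis=1))                       # final operation chosen per edge
```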