SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape
- URL: http://arxiv.org/abs/2311.13169v1
- Date: Wed, 22 Nov 2023 05:25:24 GMT
- Title: SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape
- Authors: Hua Zheng and Kuang-Hung Liu and Igor Fedorov and Xin Zhang and Wen-Yen Chen and Wei Wen
- Abstract summary: We introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS.
In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up."
We present SiGeo, a proxy founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy.
- Score: 14.550053893504764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Architecture Search (NAS) has become a widely used tool for automating
neural network design. While one-shot NAS methods have successfully reduced
computational requirements, they often require extensive training. On the other
hand, zero-shot NAS utilizes training-free proxies to evaluate a candidate
architecture's test performance but has two limitations: (1) inability to use
the information gained as a network improves with training and (2) unreliable
performance, particularly in complex domains like RecSys, due to the
multi-modal data inputs and complex architecture configurations. To synthesize
the benefits of both methods, we introduce a "sub-one-shot" paradigm that
serves as a bridge between zero-shot and one-shot NAS. In sub-one-shot NAS, the
supernet is trained using only a small subset of the training data, a phase we
refer to as "warm-up." Within this framework, we present SiGeo, a proxy founded
on a novel theoretical framework that connects the supernet warm-up with the
efficacy of the proxy. Extensive experiments have shown that SiGeo, with the
benefit of warm-up, consistently outperforms state-of-the-art NAS proxies on
various established NAS benchmarks. When a supernet is warmed up, it can
achieve comparable performance to weight-sharing one-shot NAS methods, but with
a significant reduction ($\sim 60$\%) in computational costs.
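As a rough illustration of the sub-one-shot workflow described in the abstract, the sketch below warms up a toy weight-sharing supernet on a small batch of synthetic data and then ranks candidate sub-networks with a placeholder proxy. The TinySupernet, the random-path warm-up, and the negative-loss proxy_score are illustrative assumptions, not the paper's SiGeo proxy or its training setup.
```python
# Minimal sketch of the sub-one-shot idea (NOT the official SiGeo code):
# warm up a weight-sharing supernet on a small data subset, then rank
# candidate sub-networks with a proxy instead of training each one fully.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySupernet(nn.Module):
    """Toy supernet: every layer picks one of two ops (hypothetical search space)."""
    def __init__(self, width=32, n_layers=3, n_classes=10):
        super().__init__()
        self.inp = nn.Linear(16, width)
        self.ops = nn.ModuleList(
            [nn.ModuleList([nn.Linear(width, width), nn.Identity()])
             for _ in range(n_layers)]
        )
        self.out = nn.Linear(width, n_classes)

    def forward(self, x, arch):
        h = F.relu(self.inp(x))
        for layer, choice in zip(self.ops, arch):
            h = F.relu(layer[choice](h))
        return self.out(h)

def warm_up(net, loader, epochs=3, lr=1e-2):
    """Warm-up phase: briefly train the shared weights on a small data subset."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(epochs):
        for xb, yb in loader:
            arch = [torch.randint(0, 2, (1,)).item() for _ in net.ops]  # random path
            loss = F.cross_entropy(net(xb, arch), yb)
            opt.zero_grad(); loss.backward(); opt.step()

def proxy_score(net, arch, xb, yb):
    """Placeholder proxy (negative batch loss); SiGeo's actual proxy combines
    information-theoretic and loss-geometry quantities not reproduced here."""
    with torch.no_grad():
        return -F.cross_entropy(net(xb, arch), yb).item()

# Synthetic stand-in for the small warm-up subset.
x, y = torch.randn(256, 16), torch.randint(0, 10, (256,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x, y), batch_size=32, shuffle=True)

net = TinySupernet()
warm_up(net, loader)  # "sub-one-shot": only a small subset is ever used
candidates = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
best = max(candidates, key=lambda arch: proxy_score(net, arch, x[:64], y[:64]))
print("selected architecture:", best)
```
The sketch only illustrates the control flow: a brief warm-up precedes proxy scoring of candidates, which is where the reported ~60% saving relative to full one-shot training would come from.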
Related papers
- Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities [58.67514819895494]
The key idea behind zero-shot NAS approaches is to design proxies that can predict the accuracy of a given network without training its parameters.
This paper aims to comprehensively review and compare the state-of-the-art (SOTA) zero-shot NAS approaches.
arXiv Detail & Related papers (2023-07-05T03:07:00Z)
- DiffusionNAG: Predictor-guided Neural Architecture Generation with Diffusion Models [56.584561770857306]
We propose a novel conditional Neural Architecture Generation (NAG) framework based on diffusion models, dubbed DiffusionNAG.
Specifically, we consider the neural architectures as directed graphs and propose a graph diffusion model for generating them.
We validate the effectiveness of DiffusionNAG through extensive experiments in two predictor-based NAS scenarios: Transferable NAS and Bayesian Optimization (BO)-based NAS.
When integrated into a BO-based algorithm, DiffusionNAG outperforms existing BO-based NAS approaches, particularly in the large MobileNetV3 search space on the ImageNet 1K dataset.
arXiv Detail & Related papers (2023-05-26T13:58:18Z)
- Generalization Properties of NAS under Activation and Skip Connection Search [66.8386847112332]
We study the generalization properties of Neural Architecture Search (NAS) under a unifying framework.
We derive the lower (and upper) bounds of the minimum eigenvalue of the Neural Tangent Kernel (NTK) under the (in)finite-width regime.
We show how the derived results can guide NAS to select the top-performing architectures, even in the case without training.
arXiv Detail & Related papers (2022-09-15T12:11:41Z)
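The entry above uses the minimum eigenvalue of the NTK as a training-free signal. As a rough sketch under stated assumptions (a toy scalar-output model and a small random batch), that quantity can be estimated from the empirical NTK Gram matrix at initialization; this is not the cited paper's exact construction or experimental protocol.
```python
# Illustrative estimate of the minimum eigenvalue of the empirical NTK
# at initialization (a training-free quantity); toy setup only.
import torch
import torch.nn as nn

def empirical_ntk_min_eig(model, x):
    """K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)> for a scalar-output model;
    returns the smallest eigenvalue of this (batch x batch) Gram matrix."""
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for i in range(x.shape[0]):
        out = model(x[i:i + 1]).sum()          # scalar output for sample i
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    jac = torch.stack(rows)                    # (batch, n_params) Jacobian
    ntk = jac @ jac.T                          # empirical NTK Gram matrix
    return torch.linalg.eigvalsh(ntk).min().item()

# Compare two untrained candidate widths (hypothetical search choices).
x = torch.randn(8, 16)
for width in (8, 64):
    net = nn.Sequential(nn.Linear(16, width), nn.ReLU(), nn.Linear(width, 1))
    print(width, empirical_ntk_min_eig(net, x))
```
Proxies of this family rank candidate architectures by such spectral quantities before any training takes place.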
- Towards Self-supervised and Weight-preserving Neural Architecture Search [38.497608743382145]
We propose the self-supervised and weight-preserving neural architecture search (SSWP-NAS) as an extension of the current NAS framework.
Experiments show that the architectures searched by the proposed framework achieve state-of-the-art accuracy on CIFAR-10, CIFAR-100, and ImageNet datasets.
arXiv Detail & Related papers (2022-06-08T18:48:05Z)
- UnrealNAS: Can We Search Neural Architectures with Unreal Data? [84.78460976605425]
Neural architecture search (NAS) has shown great success in the automatic design of deep neural networks (DNNs).
Previous work has analyzed the necessity of having ground-truth labels in NAS and inspired broad interest.
We take a further step to question whether real data is necessary for NAS to be effective.
arXiv Detail & Related papers (2022-05-04T16:30:26Z)
- EPE-NAS: Efficient Performance Estimation Without Training for Neural Architecture Search [1.1470070927586016]
We propose EPE-NAS, an efficient performance estimation strategy that mitigates the problem of evaluating networks.
We show that EPE-NAS can produce a robust correlation and that by incorporating it into a simple random sampling strategy, we are able to search for competitive networks, without requiring any training, in a matter of seconds using a single GPU.
arXiv Detail & Related papers (2021-02-16T11:47:05Z)
- AdvantageNAS: Efficient Neural Architecture Search with Credit Assignment [23.988393741948485]
We propose a novel search strategy for one-shot and sparse propagation NAS, namely AdvantageNAS.
AdvantageNAS is a gradient-based approach that improves the search efficiency by introducing credit assignment in gradient estimation for architecture updates.
Experiments on the NAS-Bench-201 and PTB datasets show that AdvantageNAS discovers an architecture with higher performance under a limited time budget.
arXiv Detail & Related papers (2020-12-11T05:45:03Z)
- Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients [17.501966450686282]
We develop an efficient Neural Architecture Search (NAS) method via Straight-Through (ST) gradients, called ST-NAS.
Experiments over the widely benchmarked 80-hour WSJ and 300-hour Switchboard datasets show that ST-NAS induced architectures significantly outperform the human-designed architecture across the two datasets.
Strengths of ST-NAS such as architecture transferability and low computation cost in memory and time are also reported.
arXiv Detail & Related papers (2020-11-11T09:18:58Z)
- Binarized Neural Architecture Search for Efficient Object Recognition [120.23378346337311]
Binarized neural architecture search (BNAS) produces extremely compressed models to reduce the huge computational cost on embedded devices for edge computing.
An accuracy of 96.53% vs. 97.22% is achieved on the CIFAR-10 dataset, but with a significantly compressed model, and a 40% faster search than the state-of-the-art PC-DARTS.
arXiv Detail & Related papers (2020-09-08T15:51:23Z)
- Few-shot Neural Architecture Search [35.28010196935195]
We propose few-shot NAS that uses multiple supernetworks, called sub-supernets, each covering different regions of the search space to alleviate the undesired co-adaptation.
With only up to 7 sub-supernets, few-shot NAS establishes new SoTAs: on ImageNet, it finds models that reach 80.5% top-1 accuracy at 600 MFLOPS and 77.5% top-1 accuracy at 238 MFLOPS.
arXiv Detail & Related papers (2020-06-11T22:36:01Z)
- DSNAS: Direct Neural Architecture Search without Parameter Retraining [112.02966105995641]
We propose a new problem definition for NAS, task-specific end-to-end, based on this observation.
We propose DSNAS, an efficient differentiable NAS framework that simultaneously optimizes architecture and parameters with a low-biased Monte Carlo estimate.
DSNAS successfully discovers networks with comparable accuracy (74.4%) on ImageNet in 420 GPU hours, reducing the total time by more than 34%.
arXiv Detail & Related papers (2020-02-21T04:41:47Z)