SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape
- URL: http://arxiv.org/abs/2311.13169v1
- Date: Wed, 22 Nov 2023 05:25:24 GMT
- Title: SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape
- Authors: Hua Zheng and Kuang-Hung Liu and Igor Fedorov and Xin Zhang and Wen-Yen Chen and Wei Wen
- Abstract summary: We introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS.
In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up."
We present SiGeo, a proxy founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy.
- Score: 14.550053893504764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Architecture Search (NAS) has become a widely used tool for automating
neural network design. While one-shot NAS methods have successfully reduced
computational requirements, they often require extensive training. On the other
hand, zero-shot NAS utilizes training-free proxies to evaluate a candidate
architecture's test performance but has two limitations: (1) inability to use
the information gained as a network improves with training and (2) unreliable
performance, particularly in complex domains like RecSys, due to the
multi-modal data inputs and complex architecture configurations. To synthesize
the benefits of both methods, we introduce a "sub-one-shot" paradigm that
serves as a bridge between zero-shot and one-shot NAS. In sub-one-shot NAS, the
supernet is trained using only a small subset of the training data, a phase we
refer to as "warm-up." Within this framework, we present SiGeo, a proxy founded
on a novel theoretical framework that connects the supernet warm-up with the
efficacy of the proxy. Extensive experiments have shown that SiGeo, with the
benefit of warm-up, consistently outperforms state-of-the-art NAS proxies on
various established NAS benchmarks. When a supernet is warmed up, it can
achieve comparable performance to weight-sharing one-shot NAS methods, but with
a significant reduction ($\sim 60$\%) in computational costs.
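As a rough illustration of the sub-one-shot workflow described in the abstract, the sketch below warms up a toy weight-sharing supernet on a small batch of synthetic data and then ranks candidate sub-networks with a placeholder proxy. The TinySupernet, the random-path warm-up, and the negative-loss proxy_score are illustrative assumptions, not the paper's SiGeo proxy or its training setup.
```python
# Minimal sketch of the sub-one-shot idea (NOT the official SiGeo code):
# warm up a weight-sharing supernet on a small data subset, then rank
# candidate sub-networks with a proxy instead of training each one fully.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySupernet(nn.Module):
    """Toy supernet: every layer picks one of two ops (hypothetical search space)."""
    def __init__(self, width=32, n_layers=3, n_classes=10):
        super().__init__()
        self.inp = nn.Linear(16, width)
        self.ops = nn.ModuleList(
            [nn.ModuleList([nn.Linear(width, width), nn.Identity()])
             for _ in range(n_layers)]
        )
        self.out = nn.Linear(width, n_classes)

    def forward(self, x, arch):
        h = F.relu(self.inp(x))
        for layer, choice in zip(self.ops, arch):
            h = F.relu(layer[choice](h))
        return self.out(h)

def warm_up(net, loader, epochs=3, lr=1e-2):
    """Warm-up phase: briefly train the shared weights on a small data subset."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(epochs):
        for xb, yb in loader:
            arch = [torch.randint(0, 2, (1,)).item() for _ in net.ops]  # random path
            loss = F.cross_entropy(net(xb, arch), yb)
            opt.zero_grad(); loss.backward(); opt.step()

def proxy_score(net, arch, xb, yb):
    """Placeholder proxy (negative batch loss); SiGeo's actual proxy combines
    information-theoretic and loss-geometry quantities not reproduced here."""
    with torch.no_grad():
        return -F.cross_entropy(net(xb, arch), yb).item()

# Synthetic stand-in for the small warm-up subset.
x, y = torch.randn(256, 16), torch.randint(0, 10, (256,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x, y), batch_size=32, shuffle=True)

net = TinySupernet()
warm_up(net, loader)  # "sub-one-shot": only a small subset is ever used
candidates = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
best = max(candidates, key=lambda arch: proxy_score(net, arch, x[:64], y[:64]))
print("selected architecture:", best)
```
The sketch only illustrates the control flow: a brief warm-up precedes proxy scoring of candidates, which is where the reported ~60% saving relative to full one-shot training would come from.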
Related papers
- Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities [58.67514819895494]
The key idea behind zero-shot NAS approaches is to design proxies that can predict the accuracy of a given network without training its parameters.
This paper aims to comprehensively review and compare the state-of-the-art (SOTA) zero-shot NAS approaches.
arXiv Detail & Related papers (2023-07-05T03:07:00Z)
- DiffusionNAG: Predictor-guided Neural Architecture Generation with Diffusion Models [56.584561770857306]
We propose a novel conditional Neural Architecture Generation (NAG) framework based on diffusion models, dubbed DiffusionNAG.
Specifically, we consider the neural architectures as directed graphs and propose a graph diffusion model for generating them.
We validate the effectiveness of DiffusionNAG through extensive experiments in two predictor-based NAS scenarios: Transferable NAS and Bayesian Optimization (BO)-based NAS.
When integrated into a BO-based algorithm, DiffusionNAG outperforms existing BO-based NAS approaches, particularly in the large MobileNetV3 search space on the ImageNet 1K dataset.
arXiv Detail & Related papers (2023-05-26T13:58:18Z)
- Generalization Properties of NAS under Activation and Skip Connection Search [66.8386847112332]
We study the generalization properties of Neural Architecture Search (NAS) under a unifying framework.
We derive the lower (and upper) bounds of the minimum eigenvalue of the Neural Tangent Kernel (NTK) under the (in)finite-width regime.
We show how the derived results can guide NAS to select the top-performing architectures, even in the case without training.
arXiv Detail & Related papers (2022-09-15T12:11:41Z)
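The entry above uses the minimum eigenvalue of the NTK as a training-free signal. As a rough sketch under stated assumptions (a toy scalar-output model and a small random batch), that quantity can be estimated from the empirical NTK Gram matrix at initialization; this is not the cited paper's exact construction or experimental protocol.
```python
# Illustrative estimate of the minimum eigenvalue of the empirical NTK
# at initialization (a training-free quantity); toy setup only.
import torch
import torch.nn as nn

def empirical_ntk_min_eig(model, x):
    """K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)> for a scalar-output model;
    returns the smallest eigenvalue of this (batch x batch) Gram matrix."""
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for i in range(x.shape[0]):
        out = model(x[i:i + 1]).sum()          # scalar output for sample i
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    jac = torch.stack(rows)                    # (batch, n_params) Jacobian
    ntk = jac @ jac.T                          # empirical NTK Gram matrix
    return torch.linalg.eigvalsh(ntk).min().item()

# Compare two untrained candidate widths (hypothetical search choices).
x = torch.randn(8, 16)
for width in (8, 64):
    net = nn.Sequential(nn.Linear(16, width), nn.ReLU(), nn.Linear(width, 1))
    print(width, empirical_ntk_min_eig(net, x))
```
Proxies of this family rank candidate architectures by such spectral quantities before any training takes place.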
- Towards Self-supervised and Weight-preserving Neural Architecture Search [38.497608743382145]
We propose the self-supervised and weight-preserving neural architecture search (SSWP-NAS) as an extension of the current NAS framework.
Experiments show that the architectures searched by the proposed framework achieve state-of-the-art accuracy on CIFAR-10, CIFAR-100, and ImageNet datasets.
arXiv Detail & Related papers (2022-06-08T18:48:05Z)
- UnrealNAS: Can We Search Neural Architectures with Unreal Data? [84.78460976605425]
Neural architecture search (NAS) has shown great success in the automatic design of deep neural networks (DNNs).
Previous work has analyzed the necessity of having ground-truth labels in NAS and inspired broad interest.
We take a further step to question whether real data is necessary for NAS to be effective.
arXiv Detail & Related papers (2022-05-04T16:30:26Z)
- EPE-NAS: Efficient Performance Estimation Without Training for Neural Architecture Search [1.1470070927586016]
We propose EPE-NAS, an efficient performance estimation strategy that mitigates the problem of evaluating networks.
We show that EPE-NAS can produce a robust correlation and that by incorporating it into a simple random sampling strategy, we are able to search for competitive networks, without requiring any training, in a matter of seconds using a single GPU.
arXiv Detail & Related papers (2021-02-16T11:47:05Z)
- AdvantageNAS: Efficient Neural Architecture Search with Credit Assignment [23.988393741948485]
We propose a novel search strategy for one-shot and sparse propagation NAS, namely AdvantageNAS.
AdvantageNAS is a gradient-based approach that improves the search efficiency by introducing credit assignment in gradient estimation for architecture updates.
Experiments on the NAS-Bench-201 and PTB datasets show that AdvantageNAS discovers an architecture with higher performance under a limited time budget.
arXiv Detail & Related papers (2020-12-11T05:45:03Z)
- Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients [17.501966450686282]
We develop an efficient Neural Architecture Search (NAS) method via Straight-Through (ST) gradients, called ST-NAS.
Experiments over the widely benchmarked 80-hour WSJ and 300-hour Switchboard datasets show that ST-NAS induced architectures significantly outperform the human-designed architecture across the two datasets.
Strengths of ST-NAS such as architecture transferability and low computation cost in memory and time are also reported.
arXiv Detail & Related papers (2020-11-11T09:18:58Z)
- Binarized Neural Architecture Search for Efficient Object Recognition [120.23378346337311]
Binarized neural architecture search (BNAS) produces extremely compressed models to reduce the huge computational cost on embedded devices for edge computing.
An accuracy of 96.53% vs. 97.22% is achieved on the CIFAR-10 dataset, but with a significantly compressed model, and a 40% faster search than the state-of-the-art PC-DARTS.
arXiv Detail & Related papers (2020-09-08T15:51:23Z)
- Few-shot Neural Architecture Search [35.28010196935195]
We propose few-shot NAS that uses multiple supernetworks, called sub-supernets, each covering different regions of the search space to alleviate the undesired co-adaptation.
With only up to 7 sub-supernets, few-shot NAS establishes new SoTAs: on ImageNet, it finds models that reach 80.5% top-1 accuracy at 600 MFLOPS and 77.5% top-1 accuracy at 238 MFLOPS.
arXiv Detail & Related papers (2020-06-11T22:36:01Z)
- DSNAS: Direct Neural Architecture Search without Parameter Retraining [112.02966105995641]
We propose a new problem definition for NAS, task-specific end-to-end, based on this observation.
We propose DSNAS, an efficient differentiable NAS framework that simultaneously optimizes architecture and parameters with a low-biased Monte Carlo estimate.
DSNAS successfully discovers networks with comparable accuracy (74.4%) on ImageNet in 420 GPU hours, reducing the total time by more than 34%.
arXiv Detail & Related papers (2020-02-21T04:41:47Z)