Related papers: TG-NAS: Leveraging Zero-Cost Proxies with Transformer and Graph Convolution Networks for Efficient Neural Architecture Search

TG-NAS: Leveraging Zero-Cost Proxies with Transformer and Graph Convolution Networks for Efficient Neural Architecture Search

URL: http://arxiv.org/abs/2404.00271v1
Date: Sat, 30 Mar 2024 07:25:30 GMT
Title: TG-NAS: Leveraging Zero-Cost Proxies with Transformer and Graph Convolution Networks for Efficient Neural Architecture Search
Authors: Ye Qiao, Haocheng Xu, Sitao Huang,
Abstract summary: TG-NAS aims to create training-free proxies for architecture performance prediction. We introduce TG-NAS, a novel model-based universal proxy that leverages a transformer-based operator embedding generator and a graph convolution network (GCN) to predict architecture performance. TG-NAS achieves up to 300X improvements in search efficiency compared to previous SOTA ZC proxy methods.
Score: 1.30891455653235
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural architecture search (NAS) is an effective method for discovering new convolutional neural network (CNN) architectures. However, existing approaches often require time-consuming training or intensive sampling and evaluations. Zero-shot NAS aims to create training-free proxies for architecture performance prediction. However, existing proxies have suboptimal performance, and are often outperformed by simple metrics such as model parameter counts or the number of floating-point operations. Besides, existing model-based proxies cannot be generalized to new search spaces with unseen new types of operators without golden accuracy truth. A universally optimal proxy remains elusive. We introduce TG-NAS, a novel model-based universal proxy that leverages a transformer-based operator embedding generator and a graph convolution network (GCN) to predict architecture performance. This approach guides neural architecture search across any given search space without the need of retraining. Distinct from other model-based predictor subroutines, TG-NAS itself acts as a zero-cost (ZC) proxy, guiding architecture search with advantages in terms of data independence, cost-effectiveness, and consistency across diverse search spaces. Our experiments showcase its advantages over existing proxies across various NAS benchmarks, suggesting its potential as a foundational element for efficient architecture search. TG-NAS achieves up to 300X improvements in search efficiency compared to previous SOTA ZC proxy methods. Notably, it discovers competitive models with 93.75% CIFAR-10 accuracy on the NAS-Bench-201 space and 74.5% ImageNet top-1 accuracy on the DARTS space.

Related papers

Zero-Shot Neural Architecture Search with Weighted Response Correlation [5.953576590699768]
We present a novel training-free estimation proxy called weighted response correlation (WRCor)<n>WRCor utilizes coefficient correlation matrices of responses across different input samples to calculate the proxy scores of estimated architectures.<n>Our NAS algorithm can discover an architecture with a 22.1% test error on the ImageNet-1k dataset within 4 GPU hours.
arXiv Detail & Related papers (2025-07-08T02:19:29Z)
L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers [39.19675815138566]
Training-free Neural Architecture Search (NAS) efficiently identifies high-performing neural networks using zero-cost (ZC) proxies.<n>ZC-NAS is both (i) time-efficient, eliminating the need for model training, and (ii) interpretable, with proxy designs often theoretically grounded.<n>This work extends ZC proxy applicability to Vision Transformers (ViTs)
arXiv Detail & Related papers (2025-05-12T07:44:52Z)
ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics. Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z)
Zero-Shot NAS via the Suppression of Local Entropy Decrease [21.100745856699277]
Architecture performance evaluation is the most time-consuming part of neural architecture search (NAS) Zero-Shot NAS accelerates the evaluation by utilizing zero-cost proxies instead of training. architectural topologies are used to evaluate the performance of networks in this study.
arXiv Detail & Related papers (2024-11-09T17:36:53Z)
AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search [30.64117903216323]
Training-free network architecture search (NAS) aims to discover high-performing networks with zero-cost proxies. We propose AZ-NAS, a novel approach that leverages the ensemble of various zero-cost proxies to enhance the correlation between a predicted ranking of networks and the ground truth. Results conclusively demonstrate the efficacy and efficiency of AZ-NAS, outperforming state-of-the-art methods on standard benchmarks.
arXiv Detail & Related papers (2024-03-28T08:44:36Z)
DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques. Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub- search space using algorithms. Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z)
SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape [14.550053893504764]
We introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS. In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up" We present SiGeo, a proxy founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy.
arXiv Detail & Related papers (2023-11-22T05:25:24Z)
Training-free Neural Architecture Search for RNNs and Transformers [0.0]
We develop a new training-free metric, named hidden covariance, that predicts the trained performance of an RNN architecture. We find that the current search space paradigm for transformer architectures is not optimized for training-free neural architecture search.
arXiv Detail & Related papers (2023-06-01T02:06:13Z)
A General-Purpose Transferable Predictor for Neural Architecture Search [22.883809911265445]
We propose a general-purpose neural predictor for Neural Architecture Search (NAS) that can transfer across search spaces. Experimental results on NAS-Bench-101, 201 and 301 demonstrate the efficacy of our scheme.
arXiv Detail & Related papers (2023-02-21T17:28:05Z)
ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients [17.139381064317778]
We propose a new zero-shot proxy, ZiCo, that works consistently better than #Params. ZiCo-based NAS can find optimal architectures with 78.1%, 79.4%, and 80.4% test accuracy under inference budgets of 450M, 600M, and 1000M FLOPs, respectively.
arXiv Detail & Related papers (2023-01-26T18:38:56Z)
$\alpha$NAS: Neural Architecture Search using Property Guided Synthesis [1.2746672439030722]
We develop techniques that enable efficient neural architecture search (NAS) in a significantly larger design space. Our key insights are as follows: (1) the abstract search space is significantly smaller than the original search space, and (2) architectures with similar program properties also have similar performance. We implement our approach, $alpha$NAS, within an evolutionary framework, where the mutations are guided by the program properties.
arXiv Detail & Related papers (2022-05-08T21:48:03Z)
BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule [95.56873042777316]
Differentiable Architecture Search (DARTS) has received massive attention in recent years, mainly because it significantly reduces the computational cost. This paper formulates the neural architecture search as a distribution learning problem through relaxing the architecture weights into Gaussian distributions. We demonstrate how the differentiable NAS benefits from Bayesian principles, enhancing exploration and improving stability.
arXiv Detail & Related papers (2021-11-25T18:13:42Z)
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search [100.28980854978768]
We present Block-wisely Self-supervised Neural Architecture Search (BossNAS) We factorize the search space into blocks and utilize a novel self-supervised training scheme, named ensemble bootstrapping, to train each block separately. We also present HyTra search space, a fabric-like hybrid CNN-transformer search space with searchable down-sampling positions.
arXiv Detail & Related papers (2021-03-23T10:05:58Z)
Weak NAS Predictors Are All You Need [91.11570424233709]
Recent predictor-based NAS approaches attempt to solve the problem with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor. We shift the paradigm from finding a complicated predictor that covers the whole architecture space to a set of weaker predictors that progressively move towards the high-performance sub-space. Our method costs fewer samples to find the top-performance architectures on NAS-Bench-101 and NAS-Bench-201, and it achieves the state-of-the-art ImageNet performance on the NASNet search space.
arXiv Detail & Related papers (2021-02-21T01:58:43Z)
Binarized Neural Architecture Search for Efficient Object Recognition [120.23378346337311]
Binarized neural architecture search (BNAS) produces extremely compressed models to reduce huge computational cost on embedded devices for edge computing. An accuracy of $96.53%$ vs. $97.22%$ is achieved on the CIFAR-10 dataset, but with a significantly compressed model, and a $40%$ faster search than the state-of-the-art PC-DARTS.
arXiv Detail & Related papers (2020-09-08T15:51:23Z)
Powering One-shot Topological NAS with Stabilized Share-parameter Proxy [65.09967910722932]
One-shot NAS method has attracted much interest from the research community due to its remarkable training efficiency and capacity to discover high performance models. In this work, we try to enhance the one-shot NAS by exploring high-performing network architectures in our large-scale Topology Augmented Search Space. The proposed method achieves state-of-the-art performance under Multiply-Adds (MAdds) constraint on ImageNet.
arXiv Detail & Related papers (2020-05-21T08:18:55Z)
DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning. In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs. With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints.
arXiv Detail & Related papers (2019-05-28T06:35:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.