Training-free Neural Architecture Search for RNNs and Transformers
- URL: http://arxiv.org/abs/2306.00288v1
- Date: Thu, 1 Jun 2023 02:06:13 GMT
- Title: Training-free Neural Architecture Search for RNNs and Transformers
- Authors: Aaron Serianni (Princeton University), Jugal Kalita (University of
Colorado at Colorado Springs)
- Abstract summary: We develop a new training-free metric, named hidden covariance, that predicts the trained performance of an RNN architecture.
We find that the current search space paradigm for transformer architectures is not optimized for training-free neural architecture search.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural architecture search (NAS) has allowed for the automatic creation of
new and effective neural network architectures, offering an alternative to the
laborious process of manually designing complex architectures. However,
traditional NAS algorithms are slow and require immense amounts of computing
power. Recent research has investigated training-free NAS metrics for image
classification architectures, drastically speeding up search algorithms. In
this paper, we investigate training-free NAS metrics for recurrent neural
network (RNN) and BERT-based transformer architectures, targeted towards
language modeling tasks. First, we develop a new training-free metric, named
hidden covariance, that predicts the trained performance of an RNN architecture
and significantly outperforms existing training-free metrics. We experimentally
evaluate the effectiveness of the hidden covariance metric on the NAS-Bench-NLP
benchmark. Second, we find that the current search space paradigm for
transformer architectures is not optimized for training-free neural
architecture search. Instead, a simple qualitative analysis can effectively
shrink the search space to the best performing architectures. This conclusion
is based on our investigation of existing training-free metrics and new metrics
developed from recent transformer pruning literature, evaluated on our own
benchmark of trained BERT architectures. Ultimately, our analysis shows that
the architecture search space and the training-free metric must be developed
together in order to achieve effective results.
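To make the idea of a covariance-based training-free score concrete, below is a minimal sketch in PyTorch. It assumes the hidden covariance metric is computed from the hidden states of an untrained RNN on a single minibatch and scored with an eigenvalue formula in the style of Mellor et al.'s Jacobian-covariance score; the choice of timestep, the correlation normalization, and the `eps` regularizer are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of a hidden-covariance-style training-free score for an RNN.
# Assumptions (not taken from the paper's code): the score uses the final
# timestep's hidden states of one minibatch and an eigenvalue-based formula
# adapted from Mellor et al.'s Jacobian covariance metric.
import torch


def hidden_covariance_score(rnn: torch.nn.RNNBase,
                            inputs: torch.Tensor,
                            eps: float = 1e-5) -> float:
    """Score an untrained RNN from the covariance of its hidden states.

    inputs: (batch, seq_len, input_size) minibatch from the target task.
    Returns a scalar; higher is assumed to predict better trained performance.
    """
    rnn.eval()
    with torch.no_grad():
        # Hidden state at the last timestep for each example in the batch.
        output, _ = rnn(inputs)              # (batch, seq_len, hidden_size)
        h = output[:, -1, :]                 # (batch, hidden_size)

        # Center and form the batch-by-batch covariance matrix.
        h = h - h.mean(dim=0, keepdim=True)
        cov = h @ h.t()                      # (batch, batch)

        # Normalize to a correlation matrix so activation scale drops out.
        d = torch.sqrt(torch.diag(cov)).clamp_min(eps)
        corr = cov / torch.outer(d, d)

        # Eigenvalue-based score (KL-style formula from Mellor et al.).
        eigvals = torch.linalg.eigvalsh(corr)
        score = -torch.sum(torch.log(eigvals + eps) + 1.0 / (eigvals + eps))
    return score.item()


if __name__ == "__main__":
    # Toy usage: score a randomly initialized LSTM on a random minibatch.
    lstm = torch.nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
    batch = torch.randn(16, 20, 32)
    print(hidden_covariance_score(lstm, batch))
```

In a training-free search, a score like this would be evaluated once per candidate architecture and used to rank candidates without any gradient updates.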
Related papers
- Efficient Multi-Objective Neural Architecture Search via Pareto Dominance-based Novelty Search [0.0]
Neural Architecture Search (NAS) aims to automate the discovery of high-performing deep neural network architectures.
Traditional NAS approaches typically optimize a certain performance metric (e.g., prediction accuracy), overlooking large parts of the architecture search space that potentially contain interesting network configurations.
This paper presents a novelty search for multi-objective NAS with Multiple Training-Free metrics (MTF-PDNS).
arXiv Detail & Related papers (2024-07-30T08:52:10Z)
- A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism [58.855741970337675]
Neural architecture search (NAS) enables researchers to automatically explore vast search spaces and find efficient neural networks.
NAS suffers from a key bottleneck, i.e., numerous architectures need to be evaluated during the search process.
We propose SMEM-NAS, a pairwise comparison relation-assisted multi-objective evolutionary algorithm based on a multi-population mechanism.
arXiv Detail & Related papers (2024-07-22T12:46:22Z)
- TG-NAS: Leveraging Zero-Cost Proxies with Transformer and Graph Convolution Networks for Efficient Neural Architecture Search [1.30891455653235]
TG-NAS aims to create training-free proxies for architecture performance prediction.
We introduce TG-NAS, a novel model-based universal proxy that leverages a transformer-based operator embedding generator and a graph convolution network (GCN) to predict architecture performance.
TG-NAS achieves up to 300X improvements in search efficiency compared to previous SOTA ZC proxy methods.
arXiv Detail & Related papers (2024-03-30T07:25:30Z)
- Robustifying and Boosting Training-Free Neural Architecture Search [49.828875134088904]
We propose a robustifying and boosting training-free NAS (RoBoT) algorithm to develop a robust and consistently better-performing metric on diverse tasks.
Remarkably, the expected performance of our RoBoT can be theoretically guaranteed, improving over existing training-free NAS methods.
arXiv Detail & Related papers (2024-03-12T12:24:11Z)
- DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-bit CNNs [53.82853297675979]
1-bit convolutional neural networks (CNNs) with binary weights and activations show their potential for resource-limited embedded devices.
One natural approach is to use 1-bit CNNs to reduce the computation and memory cost of NAS.
We introduce Discrepant Child-Parent Neural Architecture Search (DCP-NAS) to efficiently search 1-bit CNNs.
arXiv Detail & Related papers (2023-06-27T11:28:29Z)
- Neural Architecture Search for Speech Emotion Recognition [72.1966266171951]
We propose to apply neural architecture search (NAS) techniques to automatically configure the SER models.
We show that NAS can improve SER performance (54.89% to 56.28%) while maintaining model parameter sizes.
arXiv Detail & Related papers (2022-03-31T10:16:10Z)
- Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training? [37.29036906991086]
In this work, we revisit several at-initialization metrics that can be derived from the Neural Tangent Kernel (NTK).
We deduce that modern neural architectures exhibit highly non-linear characteristics, making the NTK-based metrics incapable of reliably estimating the performance of an architecture without some amount of training.
We introduce Label-Gradient Alignment (LGA), a novel NTK-based metric whose inherent formulation allows it to capture the large amount of non-linear advantage present in modern neural architectures.
arXiv Detail & Related papers (2022-03-28T08:43:04Z)
- Learning Interpretable Models Through Multi-Objective Neural Architecture Search [0.9990687944474739]
We propose a framework to optimize for both task performance and "introspectability," a surrogate metric for aspects of interpretability.
We demonstrate that jointly optimizing for task error and introspectability leads to more disentangled and debuggable architectures that perform within error.
arXiv Detail & Related papers (2021-12-16T05:50:55Z)
- Conceptual Expansion Neural Architecture Search (CENAS) [1.3464152928754485]
We present an approach called Conceptual Expansion Neural Architecture Search (CENAS).
It combines a sample-efficient, computational creativity-inspired transfer learning approach with neural architecture search.
It finds models faster than naive architecture search via transferring existing weights to approximate the parameters of the new model.
arXiv Detail & Related papers (2021-10-07T02:29:26Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)