Training-free Neural Architecture Search for RNNs and Transformers
- URL: http://arxiv.org/abs/2306.00288v1
- Date: Thu, 1 Jun 2023 02:06:13 GMT
- Title: Training-free Neural Architecture Search for RNNs and Transformers
- Authors: Aaron Serianni (Princeton University), Jugal Kalita (University of
Colorado at Colorado Springs)
- Abstract summary: We develop a new training-free metric, named hidden covariance, that predicts the trained performance of an RNN architecture.
We find that the current search space paradigm for transformer architectures is not optimized for training-free neural architecture search.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural architecture search (NAS) has allowed for the automatic creation of
new and effective neural network architectures, offering an alternative to the
laborious process of manually designing complex architectures. However,
traditional NAS algorithms are slow and require immense amounts of computing
power. Recent research has investigated training-free NAS metrics for image
classification architectures, drastically speeding up search algorithms. In
this paper, we investigate training-free NAS metrics for recurrent neural
network (RNN) and BERT-based transformer architectures, targeted towards
language modeling tasks. First, we develop a new training-free metric, named
hidden covariance, that predicts the trained performance of an RNN architecture
and significantly outperforms existing training-free metrics. We experimentally
evaluate the effectiveness of the hidden covariance metric on the NAS-Bench-NLP
benchmark. Second, we find that the current search space paradigm for
transformer architectures is not optimized for training-free neural
architecture search. Instead, a simple qualitative analysis can effectively
shrink the search space to the best performing architectures. This conclusion
is based on our investigation of existing training-free metrics and new metrics
developed from recent transformer pruning literature, evaluated on our own
benchmark of trained BERT architectures. Ultimately, our analysis shows that
the architecture search space and the training-free metric must be developed
together in order to achieve effective results.
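To make the idea of a covariance-based training-free score concrete, below is a minimal sketch in PyTorch. It assumes the hidden covariance metric is computed from the hidden states of an untrained RNN on a single minibatch and scored with an eigenvalue formula in the style of Mellor et al.'s Jacobian-covariance score; the choice of timestep, the correlation normalization, and the `eps` regularizer are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of a hidden-covariance-style training-free score for an RNN.
# Assumptions (not taken from the paper's code): the score uses the final
# timestep's hidden states of one minibatch and an eigenvalue-based formula
# adapted from Mellor et al.'s Jacobian covariance metric.
import torch


def hidden_covariance_score(rnn: torch.nn.RNNBase,
                            inputs: torch.Tensor,
                            eps: float = 1e-5) -> float:
    """Score an untrained RNN from the covariance of its hidden states.

    inputs: (batch, seq_len, input_size) minibatch from the target task.
    Returns a scalar; higher is assumed to predict better trained performance.
    """
    rnn.eval()
    with torch.no_grad():
        # Hidden state at the last timestep for each example in the batch.
        output, _ = rnn(inputs)              # (batch, seq_len, hidden_size)
        h = output[:, -1, :]                 # (batch, hidden_size)

        # Center and form the batch-by-batch covariance matrix.
        h = h - h.mean(dim=0, keepdim=True)
        cov = h @ h.t()                      # (batch, batch)

        # Normalize to a correlation matrix so activation scale drops out.
        d = torch.sqrt(torch.diag(cov)).clamp_min(eps)
        corr = cov / torch.outer(d, d)

        # Eigenvalue-based score (KL-style formula from Mellor et al.).
        eigvals = torch.linalg.eigvalsh(corr)
        score = -torch.sum(torch.log(eigvals + eps) + 1.0 / (eigvals + eps))
    return score.item()


if __name__ == "__main__":
    # Toy usage: score a randomly initialized LSTM on a random minibatch.
    lstm = torch.nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
    batch = torch.randn(16, 20, 32)
    print(hidden_covariance_score(lstm, batch))
```

In a training-free search, a score like this would be evaluated once per candidate architecture and used to rank candidates without any gradient updates.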
Related papers
- Efficient Multi-Objective Neural Architecture Search via Pareto Dominance-based Novelty Search [0.0]
Neural Architecture Search (NAS) aims to automate the discovery of high-performing deep neural network architectures.
Traditional NAS approaches typically optimize a certain performance metric (e.g., prediction accuracy), overlooking large parts of the architecture search space that potentially contain interesting network configurations.
This paper presents a novelty search for multi-objective NAS with Multiple Training-Free metrics (MTF-PDNS).
arXiv Detail & Related papers (2024-07-30T08:52:10Z)
- A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism [58.855741970337675]
Neural architecture search (NAS) enables researchers to automatically explore vast search spaces and find efficient neural networks.
NAS suffers from a key bottleneck, i.e., numerous architectures need to be evaluated during the search process.
We propose SMEM-NAS, a pairwise comparison relation-assisted multi-objective evolutionary algorithm based on a multi-population mechanism.
arXiv Detail & Related papers (2024-07-22T12:46:22Z)
- TG-NAS: Leveraging Zero-Cost Proxies with Transformer and Graph Convolution Networks for Efficient Neural Architecture Search [1.30891455653235]
TG-NAS aims to create training-free proxies for architecture performance prediction.
We introduce TG-NAS, a novel model-based universal proxy that leverages a transformer-based operator embedding generator and a graph convolution network (GCN) to predict architecture performance.
TG-NAS achieves up to 300X improvements in search efficiency compared to previous SOTA ZC proxy methods.
arXiv Detail & Related papers (2024-03-30T07:25:30Z)
- Robustifying and Boosting Training-Free Neural Architecture Search [49.828875134088904]
We propose a robustifying and boosting training-free NAS (RoBoT) algorithm to develop a robust and consistently better-performing metric on diverse tasks.
Remarkably, the expected performance of our RoBoT can be theoretically guaranteed, improving over existing training-free NAS methods.
arXiv Detail & Related papers (2024-03-12T12:24:11Z)
- DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-bit CNNs [53.82853297675979]
1-bit convolutional neural networks (CNNs) with binary weights and activations show their potential for resource-limited embedded devices.
One natural approach is to use 1-bit CNNs to reduce the computation and memory cost of NAS.
We introduce Discrepant Child-Parent Neural Architecture Search (DCP-NAS) to efficiently search 1-bit CNNs.
arXiv Detail & Related papers (2023-06-27T11:28:29Z)
- Neural Architecture Search for Speech Emotion Recognition [72.1966266171951]
We propose to apply neural architecture search (NAS) techniques to automatically configure the SER models.
We show that NAS can improve SER performance (54.89% to 56.28%) while maintaining model parameter sizes.
arXiv Detail & Related papers (2022-03-31T10:16:10Z)
- Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training? [37.29036906991086]
In this work, we revisit several at-initialization metrics that can be derived from the Neural Tangent Kernel (NTK).
We deduce that modern neural architectures exhibit highly non-linear characteristics, making the NTK-based metrics incapable of reliably estimating the performance of an architecture without some amount of training.
We introduce Label-Gradient Alignment (LGA), a novel NTK-based metric whose inherent formulation allows it to capture the large amount of non-linear advantage present in modern neural architectures.
arXiv Detail & Related papers (2022-03-28T08:43:04Z)
- Learning Interpretable Models Through Multi-Objective Neural Architecture Search [0.9990687944474739]
We propose a framework to optimize for both task performance and "introspectability," a surrogate metric for aspects of interpretability.
We demonstrate that jointly optimizing for task error and introspectability leads to more disentangled and debuggable architectures that perform within error.
arXiv Detail & Related papers (2021-12-16T05:50:55Z)
- Conceptual Expansion Neural Architecture Search (CENAS) [1.3464152928754485]
We present an approach called Conceptual Expansion Neural Architecture Search (CENAS).
It combines a sample-efficient, computational creativity-inspired transfer learning approach with neural architecture search.
It finds models faster than naive architecture search via transferring existing weights to approximate the parameters of the new model.
arXiv Detail & Related papers (2021-10-07T02:29:26Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)