Related papers: Dextr: Zero-Shot Neural Architecture Search with Singular Value Decomposition and Extrinsic Curvature

Dextr: Zero-Shot Neural Architecture Search with Singular Value Decomposition and Extrinsic Curvature

URL: http://arxiv.org/abs/2508.12977v1
Date: Mon, 18 Aug 2025 14:52:14 GMT
Title: Dextr: Zero-Shot Neural Architecture Search with Singular Value Decomposition and Extrinsic Curvature
Authors: Rohan Asthana, Joschua Conrad, Maurits Ortmanns, Vasileios Belagiannis,
Abstract summary: We propose a zero-cost proxy that omits the requirement of labelled data for its computation.<n>Our approach enables accurate prediction of network performance on test data using only a single label-free data sample.
Score: 8.219278958506592
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Zero-shot Neural Architecture Search (NAS) typically optimises the architecture search process by exploiting the network or gradient properties at initialisation through zero-cost proxies. The existing proxies often rely on labelled data, which is usually unavailable in real-world settings. Furthermore, the majority of the current methods focus either on optimising the convergence and generalisation attributes or solely on the expressivity of the network architectures. To address both limitations, we first demonstrate how channel collinearity affects the convergence and generalisation properties of a neural network. Then, by incorporating the convergence, generalisation and expressivity in one approach, we propose a zero-cost proxy that omits the requirement of labelled data for its computation. In particular, we leverage the Singular Value Decomposition (SVD) of the neural network layer features and the extrinsic curvature of the network output to design our proxy. %As a result, the proposed proxy is formulated as the simplified harmonic mean of the logarithms of two key components: the sum of the inverse of the feature condition number and the extrinsic curvature of the network output. Our approach enables accurate prediction of network performance on test data using only a single label-free data sample. Our extensive evaluation includes a total of six experiments, including the Convolutional Neural Network (CNN) search space, i.e. DARTS and the Transformer search space, i.e. AutoFormer. The proposed proxy demonstrates a superior performance on multiple correlation benchmarks, including NAS-Bench-101, NAS-Bench-201, and TransNAS-Bench-101-micro; as well as on the NAS task within the DARTS and the AutoFormer search space, all while being notably efficient. The code is available at https://github.com/rohanasthana/Dextr.

Related papers

ONNX-Net: Towards Universal Representations and Instant Performance Prediction for Neural Architectures [60.14199724905456]
ONNX-Bench is a benchmark consisting of a collection of neural networks in a unified format based on ONNX files.<n> ONNX-Net represents any neural architecture using natural language descriptions acting as an input to a performance predictor.<n>Experiments show strong zero-shot performance across disparate search spaces using only a small amount of pretraining samples.
arXiv Detail & Related papers (2025-10-06T15:43:36Z)
ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity.<n>This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics.<n>Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z)
Evaluating a Novel Neuroevolution and Neural Architecture Search System [0.0]
We show the effectiveness of Neuvo NAS+ a novel Python implementation of an extended Neural Architecture Search (NAS+)<n>We describe the design of the Neuvo NAS+ system that selects network features on a task-specific basis.<n>Results show that the Neuvo NAS+ approach significantly outperforms several machine learning approaches.
arXiv Detail & Related papers (2025-03-13T20:35:34Z)
TG-NAS: Generalizable Zero-Cost Proxies with Operator Description Embedding and Graph Learning for Efficient Neural Architecture Search [1.7356500114422735]
TG-NAS is a universal, model-based zero-cost proxy that combines a Transformer-based operator embedding generator with a Graph Convolutional Network (GCN) to predict architecture performance.<n>It improves search efficiency by up to 300x and discovers architectures achieving 93.75% CIFAR-10 accuracy on NAS-Bench-201 and 74.9% ImageNet top-1 accuracy on the DARTS space.
arXiv Detail & Related papers (2024-03-30T07:25:30Z)
Efficacy of Neural Prediction-Based Zero-Shot NAS [0.04096453902709291]
We propose a novel approach for zero-shot Neural Architecture Search (NAS) using deep learning. Our method employs Fourier sum of sines encoding for convolutional kernels, enabling the construction of a computational feed-forward graph with a structure similar to the architecture under evaluation. Experimental results show that our approach surpasses previous methods using graph convolutional networks in terms of correlation on the NAS-Bench-201 dataset and exhibits a higher convergence rate.
arXiv Detail & Related papers (2023-08-31T14:54:06Z)
Contextualizing MLP-Mixers Spatiotemporally for Urban Data Forecast at Scale [54.15522908057831]
We propose an adapted version of the computationally-Mixer for STTD forecast at scale. Our results surprisingly show that this simple-yeteffective solution can rival SOTA baselines when tested on several traffic benchmarks. Our findings contribute to the exploration of simple-yet-effective models for real-world STTD forecasting.
arXiv Detail & Related papers (2023-07-04T05:19:19Z)
Set-based Neural Network Encoding Without Weight Tying [91.37161634310819]
We propose a neural network weight encoding method for network property prediction.<n>Our approach is capable of encoding neural networks in a model zoo of mixed architecture.<n>We introduce two new tasks for neural network property prediction: cross-dataset and cross-architecture.
arXiv Detail & Related papers (2023-05-26T04:34:28Z)
A General-Purpose Transferable Predictor for Neural Architecture Search [22.883809911265445]
We propose a general-purpose neural predictor for Neural Architecture Search (NAS) that can transfer across search spaces. Experimental results on NAS-Bench-101, 201 and 301 demonstrate the efficacy of our scheme.
arXiv Detail & Related papers (2023-02-21T17:28:05Z)
NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically. Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNN) conduct image classification by activating dominant features that correlated with labels. We introduce a simple training, Self-Challenging Representation (RSC), that significantly improves the generalization of CNN to the out-of-domain data. RSC iteratively challenges the dominant features activated on the training data, and forces the network to activate remaining features that correlates with labels.
arXiv Detail & Related papers (2020-07-05T21:42:26Z)
DC-NAS: Divide-and-Conquer Neural Architecture Search [108.57785531758076]
We present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures. We achieve a $75.1%$ top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space.
arXiv Detail & Related papers (2020-05-29T09:02:16Z)
Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters. Most of existing methods aim to enhance performance of QNNs especially binary neural networks by exploiting more effective training techniques. We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.