Related papers: TART: Token-based Architecture Transformer for Neural Network Performance Prediction

URL: http://arxiv.org/abs/2501.02007v1
Date: Thu, 02 Jan 2025 05:22:17 GMT
Title: TART: Token-based Architecture Transformer for Neural Network Performance Prediction
Authors: Yannis Y. He
Abstract summary: Token-based Architecture Transformer (TART) predicts neural network performance without the need to train candidate networks. TART attains state-of-the-art performance on the DeepNets-1M dataset for performance prediction tasks without edge information.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the realm of neural architecture design, achieving high performance is largely reliant on the manual expertise of researchers. Despite the emergence of Neural Architecture Search (NAS) as a promising technique for automating this process, current NAS methods still require human input to expand the search space and cannot generate new architectures. This paper explores the potential of Transformers in comprehending neural architectures and their performance, with the objective of establishing the foundation for utilizing Transformers to generate novel networks. We propose the Token-based Architecture Transformer (TART), which predicts neural network performance without the need to train candidate networks. TART attains state-of-the-art performance on the DeepNets-1M dataset for performance prediction tasks without edge information, indicating the potential of Transformers to aid in discovering novel and high-performing neural architectures.
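
The abstract says TART works from a token representation of an architecture and does not use edge information. The sketch below shows one plausible way to turn an ordered list of layer types into integer token IDs that a Transformer could consume. The vocabulary `OP_VOCAB`, the special tokens, and `tokenize_architecture` are hypothetical illustrations, not the tokenizer described in the paper.

```python
from typing import Dict, List

# Hypothetical vocabulary: TART's real vocabulary is not given in this summary.
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[UNK]"]
OP_TYPES = ["conv3x3", "conv1x1", "bn", "relu", "maxpool", "avgpool", "linear"]
OP_VOCAB: Dict[str, int] = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + OP_TYPES)}


def tokenize_architecture(ops: List[str], max_len: int = 16) -> List[int]:
    """Map an ordered list of layer types to token IDs (no edge information).

    A [CLS] token is prepended so a Transformer could read a summary vector
    from position 0; unknown layer types map to [UNK]; the result is
    truncated or padded with [PAD] to exactly max_len tokens.
    """
    ids = [OP_VOCAB["[CLS]"]]
    ids += [OP_VOCAB.get(op, OP_VOCAB["[UNK]"]) for op in ops]
    ids = ids[:max_len]
    ids += [OP_VOCAB["[PAD]"]] * (max_len - len(ids))
    return ids


if __name__ == "__main__":
    arch = ["conv3x3", "bn", "relu", "maxpool", "sep_conv", "linear"]
    print(tokenize_architecture(arch, max_len=10))  # [1, 3, 5, 6, 7, 2, 9, 0, 0, 0]
```

Because only the sequence of layer types is encoded, two graphs with the same layers but different wiring would receive identical token sequences; that is the trade-off implied by working "without edge information".
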

Related papers

NN-Former: Rethinking Graph Structure in Neural Architecture Representation
Graph Neural Networks (GNNs) and Transformers have shown promising performance in representing neural architectures. We show that sibling nodes are pivotal but have been overlooked in previous research. Our approach consistently achieves promising performance in both accuracy and latency prediction.
Date: 2025-07-01T15:46:18Z
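
To make "sibling nodes" concrete: assuming siblings are distinct nodes that share at least one direct predecessor in the architecture DAG (the paper may also count nodes sharing a successor), they can be collected from an edge list as below. `sibling_pairs` is a hypothetical helper, not code from NN-Former.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def sibling_pairs(edges: List[Tuple[int, int]]) -> Set[Tuple[int, int]]:
    """Return unordered pairs (a, b), a < b, of nodes sharing a direct parent."""
    children: Dict[int, List[int]] = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
    pairs: Set[Tuple[int, int]] = set()
    for kids in children.values():
        # Every pair of distinct children of the same parent are siblings.
        for a, b in combinations(sorted(set(kids)), 2):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    # Node 0 branches into 1 and 2, which both feed node 3;
    # node 3 then branches into 4 and 5.
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (3, 5)]
    print(sorted(sibling_pairs(edges)))  # [(1, 2), (4, 5)]
```

Note that nodes 1 and 2 are never connected by an edge, so a model that only passes messages along edges sees their relationship only indirectly, through node 0 or node 3.
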
Rodrigues Network for Learning Robot Actions
We propose the Neural Rodrigues Operator to inject kinematics-aware inductive bias into neural computation. We design the Rodrigues Network (RodriNet), a novel neural architecture specialized for processing actions. Our results suggest that integrating structured kinematic priors into the network architecture improves action learning in various domains.
Date: 2025-06-03T08:34:06Z
Contextualizing MLP-Mixers Spatiotemporally for Urban Data Forecast at Scale
We propose an adapted version of the computationally efficient MLP-Mixer for STTD forecasting at scale. Our results surprisingly show that this simple-yet-effective solution can rival SOTA baselines when tested on several traffic benchmarks. Our findings contribute to the exploration of simple-yet-effective models for real-world STTD forecasting.
Date: 2023-07-04T05:19:19Z
AutoST: Training-free Neural Architecture Search for Spiking Transformers
Spiking Transformers combine the energy efficiency of Spiking Neural Networks (SNNs) with the high capacity of Transformers. However, existing Spiking Transformer architectures exhibit a notable architectural gap, resulting in suboptimal performance. We introduce AutoST, a training-free NAS method for Spiking Transformers, to rapidly identify high-performance Spiking Transformer architectures.
Date: 2023-07-01T10:19:52Z
Training-free Neural Architecture Search for RNNs and Transformers
We develop a new training-free metric, named hidden covariance, that predicts the trained performance of an RNN architecture. We find that the current search space paradigm for Transformer architectures is not optimized for training-free neural architecture search.
Date: 2023-06-01T02:06:13Z
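
The summary does not give the hidden-covariance formula. As a generic illustration of the ingredient its name points to, the sketch below computes the sample covariance of an untrained network's hidden units over a minibatch of inputs. The function name `hidden_covariance` and the choice of a units-by-units matrix are assumptions; how the paper reduces such a matrix to a single score is not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def hidden_covariance(hidden: Matrix) -> Matrix:
    """Sample covariance (N - 1 denominator) of hidden units over a minibatch.

    hidden[i][j] is the activation of hidden unit j for input i, so for
    N inputs and D units the result is a D x D matrix.
    """
    n = len(hidden)
    d = len(hidden[0])
    means = [sum(row[j] for row in hidden) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in hidden:
        centered = [row[j] - means[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += centered[a] * centered[b]
    return [[v / (n - 1) for v in r] for r in cov]


if __name__ == "__main__":
    # Three inputs, two hidden units; unit 1 is exactly twice unit 0.
    batch = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
    print(hidden_covariance(batch))  # [[4.0, 8.0], [8.0, 16.0]]
```

In the example the two units are perfectly correlated, so the matrix is singular; spectral properties of matrices like this are what training-free scores typically inspect, since no training run is needed to obtain them.
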
DiffusionNAG: Predictor-guided Neural Architecture Generation with Diffusion Models
We propose a novel conditional Neural Architecture Generation (NAG) framework based on diffusion models, dubbed DiffusionNAG. Specifically, we consider neural architectures as directed graphs and propose a graph diffusion model for generating them. We validate the effectiveness of DiffusionNAG through extensive experiments in two predictor-based NAS scenarios: Transferable NAS and Bayesian Optimization (BO)-based NAS. When integrated into a BO-based algorithm, DiffusionNAG outperforms existing BO-based NAS approaches, particularly in the large MobileNetV3 search space on the ImageNet-1K dataset.
Date: 2023-05-26T13:58:18Z
Neural Architecture Search for Dense Prediction Tasks in Computer Vision
Deep learning has led to a rising demand for neural network architecture engineering. Neural architecture search (NAS) aims to design neural network architectures automatically, in a data-driven manner, rather than manually. NAS has become applicable to a much wider range of problems in computer vision.
Date: 2022-02-15T08:06:50Z
Efficient Neural Architecture Search with Performance Prediction
We use neural architecture search (NAS) to find the best network architecture for the task at hand. Existing NAS algorithms generally evaluate the fitness of a new architecture by fully training it from scratch. An end-to-end offline performance predictor is proposed to accelerate the evaluation of sampled architectures.
Date: 2021-08-04T05:44:16Z
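
An offline performance predictor is only useful for search if it ranks candidate architectures in the right order. A common way to check that, assumed here rather than stated in the summary, is Kendall's rank correlation between predicted and measured accuracies. A minimal sketch of tau-a follows; the accuracy values are made up.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant; tau-a does not
    correct for ties (scipy.stats.kendalltau computes tau-b by default).
    """
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(predicted)), 2):
        sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Hypothetical predicted vs. measured accuracies for four architectures.
    pred = [0.71, 0.65, 0.80, 0.60]
    true = [0.70, 0.72, 0.78, 0.55]
    print(round(kendall_tau(pred, true), 3))  # 0.667
```

Here five of the six pairs are ordered correctly and one is swapped, giving (5 - 1) / 6. A predictor can have a large absolute error yet still reach tau = 1 if it preserves the ordering, which is all a search procedure needs.
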
Neural Architecture Performance Prediction Using Graph Neural Networks
We propose a surrogate model for neural architecture performance prediction built upon Graph Neural Networks (GNNs). We demonstrate the effectiveness of this surrogate model on neural architecture performance prediction for structurally unknown architectures.
Date: 2020-10-19T09:33:57Z
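
The idea behind a GNN surrogate can be sketched in two steps: propagate node features along the architecture's edges, then pool them into a single vector and map it to a predicted score. The version below uses one round of mean aggregation and a fixed linear head; the function names, the one-hot features, and the weights are hypothetical. A trained surrogate would learn its weights and typically stack several layers with nonlinearities.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_pass(features: Dict[int, Vector], edges: List[Tuple[int, int]]) -> Dict[int, Vector]:
    """One round of mean aggregation along edge direction.

    Each node's new feature is the average of its own feature and those of
    its direct predecessors (the self-loop keeps input nodes well defined).
    """
    incoming: Dict[int, List[int]] = {n: [n] for n in features}
    for src, dst in edges:
        incoming[dst].append(src)
    dim = len(next(iter(features.values())))
    return {
        n: [sum(features[m][k] for m in srcs) / len(srcs) for k in range(dim)]
        for n, srcs in incoming.items()
    }


def readout(features: Dict[int, Vector], weights: Vector, bias: float) -> float:
    """Mean-pool node features over the graph, then apply a linear head."""
    dim = len(weights)
    pooled = [sum(f[k] for f in features.values()) / len(features) for k in range(dim)]
    return sum(w * x for w, x in zip(weights, pooled)) + bias


if __name__ == "__main__":
    # Hypothetical one-hot op features [is_input, is_conv] for a 3-node chain.
    feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
    h = message_pass(feats, [(0, 1), (1, 2)])
    print(h)  # {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0]}
    print(readout(h, weights=[1.0, 2.0], bias=0.0))  # 1.5
```

Because the readout pools over nodes rather than indexing fixed positions, the same surrogate can score graphs of any size and topology, which is what makes prediction for structurally unseen architectures possible at all.
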
MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS). We employ a one-shot architecture search approach to reduce the search cost. We achieve state-of-the-art results in terms of the accuracy-speed trade-off.
Date: 2020-09-29T11:56:01Z
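
An "accuracy-speed trade-off" is usually read off a Pareto front: the set of candidates that no other candidate beats on both accuracy and latency. The sketch below computes that front; the `Candidate` type and the numbers are hypothetical and are not results from MS-RANAS.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates not dominated by any other, sorted by latency.

    b dominates a if b is at least as accurate and at least as fast as a,
    and strictly better on at least one of the two.
    """
    front: List[Candidate] = []
    for a in cands:
        dominated = any(
            b.accuracy >= a.accuracy
            and b.latency_ms <= a.latency_ms
            and (b.accuracy > a.accuracy or b.latency_ms < a.latency_ms)
            for b in cands
        )
        if not dominated:
            front.append(a)
    return sorted(front, key=lambda c: c.latency_ms)


if __name__ == "__main__":
    cands = [
        Candidate("A", 0.72, 5.0),
        Candidate("B", 0.75, 8.0),
        Candidate("C", 0.70, 9.0),   # slower and less accurate than A
        Candidate("D", 0.78, 15.0),
        Candidate("E", 0.74, 8.0),   # same speed as B, less accurate
    ]
    print([c.name for c in pareto_front(cands)])  # ['A', 'B', 'D']
```

This quadratic check is fine for the handful of architectures a search returns; a claim of "state-of-the-art trade-off" amounts to a method's points lying on or above the front formed by competing methods.
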
NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural Network Synthesis
We present a framework that allows analysts to effectively build the solution sub-graph space and guide the network search by injecting their domain knowledge. Applying this technique in an iterative manner allows analysts to converge to the best performing neural network architecture for a given application.
Date: 2020-09-28T01:48:45Z
Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction
Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems. We propose an automated interaction architecture discovering framework for CTR prediction named AutoCTR.
Date: 2020-06-29T04:33:01Z
A Semi-Supervised Assessor of Neural Architectures
We employ an auto-encoder to discover meaningful representations of neural architectures. A graph convolutional neural network is introduced to predict the performance of architectures.
Date: 2020-05-14T09:02:33Z
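
The summary names a graph convolutional network but not its exact form. For reference, the widely used propagation rule of Kipf and Welling's GCN, assumed here and possibly different from the variant in the paper, updates the node features $H^{(l)}$ of an architecture graph with adjacency matrix $A$ using a learned weight matrix $W^{(l)}$ and a nonlinearity $\sigma$ such as ReLU:

```latex
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
\qquad \tilde{A} = A + I,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Adding the identity gives each node a self-loop so its own features survive the update, and the symmetric normalization by $\tilde{D}^{-1/2}$ keeps feature magnitudes from growing with node degree.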
Stage-Wise Neural Architecture Search
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications. These networks consist of stages, which are sets of layers that operate on representations at the same resolution. It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network. However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
Date: 2020-04-23T14:16:39Z
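
The cost argument in the Stage-Wise NAS summary can be made concrete with a multiply-accumulate (MAC) count. Assuming each stage is a stack of identical k x k convolutions with stride 1, "same" padding, and equal input and output channels (the stage shapes below are hypothetical, and channel-changing transition layers are ignored), a stage's cost grows linearly with its number of layers.

```python
from typing import List, Tuple


def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int = 3) -> int:
    """Multiply-accumulates of one k x k conv (stride 1, same padding, no bias)."""
    return h * w * c_in * c_out * k * k


def stage_macs(stages: List[Tuple[int, int, int]], k: int = 3) -> List[int]:
    """MACs per stage, given (resolution, channels, num_layers) tuples.

    Every layer in a stage is assumed to map channels -> channels at a
    fixed square resolution, so the stage cost is num_layers * one layer.
    """
    return [n * conv_macs(r, r, c, c, k) for r, c, n in stages]


if __name__ == "__main__":
    # Halving resolution while doubling width keeps per-layer cost constant.
    base = [(32, 16, 2), (16, 32, 2), (8, 64, 2)]
    print(stage_macs(base))          # [4718592, 4718592, 4718592]
    print(stage_macs([(32, 16, 4)]))  # [9437184]: doubling depth doubles cost
```

In this layout every stage costs the same per layer, so deepening any one stage raises total compute by the same amount; a stage-wise search can therefore trade depth between stages against a fixed FLOP budget.
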

This list is automatically generated from the titles and abstracts of the papers on this site.