A Transformer-based Neural Architecture Search Method
- URL: http://arxiv.org/abs/2505.01314v1
- Date: Fri, 02 May 2025 14:40:16 GMT
- Title: A Transformer-based Neural Architecture Search Method
- Authors: Shang Wang, Huanrong Tang, Jianquan Ouyang
- Abstract summary: We consider perplexity as an auxiliary evaluation metric for the algorithm in addition to BLEU scores. Experimental results show that the neural network structures searched by the algorithm outperform all the baseline models.
- Score: 2.498836880652668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a neural architecture search method based on the Transformer architecture, searching over cross multi-head attention computation schemes for different combinations of encoder and decoder counts. In order to search for neural network structures with better translation results, we consider perplexity as an auxiliary evaluation metric for the algorithm in addition to BLEU scores, and iteratively improve each individual neural network within the population by a multi-objective genetic algorithm. Experimental results show that the neural network structures searched by the algorithm outperform all the baseline models, and that the introduction of the auxiliary evaluation metric can find better models than considering only the BLEU score as an evaluation metric.
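To make the search loop concrete, below is a minimal sketch of a two-objective genetic search over encoder/decoder counts and cross-attention schemes. Everything in it is an assumption for illustration: the genome fields, the variant names, and the evaluate() stand-in (which in the paper would mean training a Transformer and measuring BLEU and perplexity). It is not the authors' implementation.

```python
# Hypothetical sketch of a two-objective (BLEU up, perplexity down)
# genetic search over Transformer layouts; evaluate() is a placeholder.
import random

ATTENTION_VARIANTS = ["standard", "cross_layer", "shared_kv"]  # assumed names

def random_genome():
    return (random.randint(1, 6),               # encoder layers
            random.randint(1, 6),               # decoder layers
            random.choice(ATTENTION_VARIANTS))  # cross-attention scheme

def evaluate(genome):
    """Placeholder for training + scoring: returns (BLEU, perplexity)."""
    return random.uniform(20, 35), random.uniform(5, 15)

def dominates(a, b):
    """a Pareto-dominates b: BLEU no lower, perplexity no higher, one strict."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def mutate(genome):
    enc, dec, attn = genome
    field = random.randrange(3)
    if field == 0:
        enc = min(6, max(1, enc + random.choice([-1, 1])))
    elif field == 1:
        dec = min(6, max(1, dec + random.choice([-1, 1])))
    else:
        attn = random.choice(ATTENTION_VARIANTS)
    return (enc, dec, attn)

def search(pop_size=20, generations=10):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(g, evaluate(g)) for g in population]
        # keep the non-dominated (Pareto) front under both objectives
        front = [g for g, s in scored
                 if not any(dominates(t, s) for _, t in scored)]
        # refill the population by mutating members of the front
        population = front + [mutate(random.choice(front))
                              for _ in range(pop_size - len(front))]
    return population

print(search()[:5])
```

The Pareto check is where the auxiliary metric earns its keep: a model with a slightly lower BLEU but much lower perplexity survives the front, whereas a BLEU-only ranking would discard it.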
Related papers
- An automatic selection of optimal recurrent neural network architecture for processes dynamics modelling purposes [0.0]
The research includes four original proposals of algorithms dedicated to neural network architecture search.
The algorithms are based on well-known optimisation techniques such as evolutionary algorithms and gradient descent methods.
The research involved an extended validation study based on data generated from a mathematical model of the fast processes occurring in a pressurised water nuclear reactor.
arXiv Detail & Related papers (2023-09-25T11:06:35Z)
- Set-based Neural Network Encoding Without Weight Tying [91.37161634310819]
We propose a neural network weight encoding method for network property prediction. Our approach is capable of encoding neural networks in a model zoo of mixed architectures. We introduce two new tasks for neural network property prediction: cross-dataset and cross-architecture.
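As a rough illustration of encoding without weight tying, the sketch below summarizes each layer's weights into a fixed-size feature vector, applies a shared projection, and mean-pools, so the result is invariant to layer order and works across mixed architectures. The per-layer features and all dimensions are assumptions, not the paper's design.

```python
# DeepSets-style, permutation-invariant encoding of a network's weights;
# the per-layer summary statistics are illustrative assumptions.
import numpy as np

def layer_features(weight_matrix):
    """Summarize one layer's weights, independent of its shape."""
    w = weight_matrix.ravel()
    return np.array([w.mean(), w.std(), np.abs(w).max(), w.size / 1e4])

def encode_network(layers, proj):
    """Project each layer's summary, then mean-pool over the set of layers."""
    feats = np.stack([layer_features(w) for w in layers])  # (num_layers, 4)
    return np.tanh(feats @ proj).mean(axis=0)              # (hidden,)

rng = np.random.default_rng(0)
proj = rng.normal(size=(4, 16))   # shared element-wise projection
head = rng.normal(size=16)        # linear property-prediction head

# Two "model zoo" entries with different depths and widths; tolerating
# such mixed architectures is the point of a set-based encoding.
net_a = [rng.normal(size=(64, 32)), rng.normal(size=(32, 10))]
net_b = [rng.normal(size=(128, 64)), rng.normal(size=(64, 64)),
         rng.normal(size=(64, 10))]

for name, net in [("net_a", net_a), ("net_b", net_b)]:
    print(name, float(encode_network(net, proj) @ head))
```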
arXiv Detail & Related papers (2023-05-26T04:34:28Z)
- SA-CNN: Application to text categorization issues using simulated annealing-based convolutional neural network optimization [0.0]
Convolutional neural networks (CNNs) are a representative class of deep learning algorithms.
We introduce SA-CNN, a neural network for text classification tasks based on the Text-CNN architecture.
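A rough sketch of how a simulated-annealing loop over Text-CNN-style hyper-parameters could be organized is shown below; the configuration space and the score() placeholder (which would be validation accuracy from actually training the model) are assumptions, not the paper's setup.

```python
# Simulated annealing over a toy CNN configuration space; score() is a
# placeholder for training and validating a Text-CNN.
import math
import random

def random_config():
    return {"filters": random.choice([64, 128, 256]),
            "kernel_sizes": sorted(random.sample([2, 3, 4, 5, 6], 3))}

def neighbor(cfg):
    new = dict(cfg)
    if random.random() < 0.5:
        new["filters"] = random.choice([64, 128, 256])
    else:
        new["kernel_sizes"] = sorted(random.sample([2, 3, 4, 5, 6], 3))
    return new

def score(cfg):
    """Placeholder for the validation accuracy of a trained Text-CNN."""
    return random.uniform(0.7, 0.9)

def anneal(steps=200, t0=1.0, cooling=0.98):
    cur = random_config()
    cur_s, t = score(cur), t0
    for _ in range(steps):
        cand = neighbor(cur)
        cand_s = score(cand)
        # always accept improvements; accept worse moves with Boltzmann prob.
        if cand_s > cur_s or random.random() < math.exp((cand_s - cur_s) / t):
            cur, cur_s = cand, cand_s
        t *= cooling  # cool down, making worse moves ever less likely
    return cur, cur_s

print(anneal())
```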
arXiv Detail & Related papers (2023-03-13T14:27:34Z)
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to those of Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
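The following toy sketch shows the superstructure shape being described: an inner loop evaluates the problem function at stage points built from earlier stages, and an outer update combines the stages, just as a Runge-Kutta scheme does. The fixed stage weights below (a midpoint rule) stand in for the parameters the paper would train.

```python
# Toy RK-like recurrent superstructure; A and b play the role of trained
# stage and output weights (here fixed to the classical midpoint rule).
import numpy as np

def r2n2_step(f, x, A, b, h=0.1):
    """One outer iteration with len(b) explicit inner stages."""
    stages = []
    for i in range(len(b)):
        # each stage sees the input shifted by a combination of earlier stages
        shift = sum(A[i][j] * stages[j] for j in range(i))
        stages.append(f(x + h * shift))
    return x + h * sum(bi * ki for bi, ki in zip(b, stages))

# Example: drive x toward the root of g(x) = x - 1 using f = -g.
f = lambda x: -(x - 1.0)
A = [[0.0, 0.0], [0.5, 0.0]]  # lower-triangular stage weights (midpoint)
b = [0.0, 1.0]                # output weights (midpoint)
x = np.array([5.0])
for _ in range(50):
    x = r2n2_step(f, x, A, b)
print(x)  # approaches [1.0]
```

Training A, b, and h on input/output data from a problem class is what lets the same superstructure recover Krylov-like or Newton-Krylov-like iterations instead of one fixed textbook scheme.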
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
- Redefining Neural Architecture Search of Heterogeneous Multi-Network Models by Characterizing Variation Operators and Model Components [71.03032589756434]
We investigate the effect of different variation operators in a complex domain, that of multi-network heterogeneous neural models.
We characterize both the variation operators, according to their effect on the complexity and performance of the model, and the models themselves, relying on diverse metrics that estimate the quality of their constituent parts.
arXiv Detail & Related papers (2021-06-16T17:12:26Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
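To connect the two claims in this summary, here is a small sketch that computes sparse codes with ISTA and then estimates the entropy of the resulting code values. The dictionary, sizes, and the histogram-based entropy estimate are illustrative assumptions rather than the paper's procedure.

```python
# Sparse coding via ISTA plus a crude entropy estimate of the codes;
# all sizes and the dictionary are illustrative assumptions.
import numpy as np

def ista(x, D, lam=0.1, steps=100):
    """Iterative shrinkage-thresholding for min ||x - Dz||^2 + lam * ||z||_1."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(steps):
        g = D.T @ (D @ z - x)                                  # gradient step
        z = z - g / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return z

def histogram_entropy(values, bins=20):
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
D = rng.normal(size=(32, 64))
D /= np.linalg.norm(D, axis=0)  # unit-norm dictionary atoms
codes = np.stack([ista(rng.normal(size=32), D) for _ in range(200)])
print("fraction of nonzero code entries:", float((codes != 0).mean()))
print("entropy estimate of code values:", histogram_entropy(codes.ravel()))
```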
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Genetic U-Net: Automatically Designed Deep Networks for Retinal Vessel Segmentation Using a Genetic Algorithm [2.6629444004809826]
Genetic U-Net is proposed to generate a U-shaped convolutional neural network (CNN) that can achieve better retinal vessel segmentation but with fewer architecture-based parameters.
The experimental results show that the architectures obtained using the proposed method offer superior performance while using less than 1% of the original U-Net's parameters.
arXiv Detail & Related papers (2020-10-29T13:31:36Z)
- Equivalence in Deep Neural Networks via Conjugate Matrix Ensembles [0.0]
A numerical approach is developed for detecting the equivalence of deep learning architectures.
The empirical evidence supports the phenomenon that the difference between the spectral densities of neural architectures and the corresponding conjugate circular ensemble vanishes.
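As an illustrative sketch of the comparison, the snippet below computes eigenvalue angles for a toy product of layer-like random matrices and for a Haar-random unitary (a circular-ensemble reference), then measures the distance between the two angular densities. The matrix construction is a stand-in for a trained architecture, not the paper's procedure.

```python
# Compare the angular spectral density of a layer-product matrix with a
# circular (Haar-unitary) ensemble reference; purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 256

# "Network" matrix: a product of layer-like Gaussian matrices, rescaled.
W = np.eye(n)
for _ in range(3):
    W = W @ (rng.normal(size=(n, n)) / np.sqrt(n))
angles_net = np.angle(np.linalg.eigvals(W))

# Circular-ensemble reference: eigen-angles of a Haar-random unitary,
# obtained by QR-decomposing a complex Gaussian and fixing the phases.
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Q, R = np.linalg.qr(Z)
Q = Q @ np.diag(np.diag(R) / np.abs(np.diag(R)))
angles_cue = np.angle(np.linalg.eigvals(Q))

# Coarse L1 distance between the two angular densities.
h_net, _ = np.histogram(angles_net, bins=32, range=(-np.pi, np.pi), density=True)
h_cue, _ = np.histogram(angles_cue, bins=32, range=(-np.pi, np.pi), density=True)
print("L1 distance between angular densities:", float(np.abs(h_net - h_cue).mean()))
```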
arXiv Detail & Related papers (2020-06-14T12:34:13Z)
- DC-NAS: Divide-and-Conquer Neural Architecture Search [108.57785531758076]
We present a divide-and-conquer (DC) approach to effectively and efficiently search deep neural architectures.
We achieve a 75.1% top-1 accuracy on the ImageNet dataset, which is higher than that of state-of-the-art methods using the same search space.
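A schematic sketch of the divide-and-conquer pattern: candidates are bucketed by a coarse feature, buckets are ranked with a cheap proxy on a few samples, and expensive evaluation is reserved for the most promising bucket. The encoding, the proxy, and the bucketing key are illustrative assumptions; DC-NAS's actual partitioning is not reproduced here.

```python
# Divide-and-conquer search skeleton; proxy and evaluator are placeholders.
import random

random.seed(0)
candidates = [tuple(random.choice([0, 1, 2]) for _ in range(6))  # 6 choice slots
              for _ in range(300)]

def cheap_proxy(arch):
    """Placeholder for an inexpensive signal (e.g., few-epoch accuracy)."""
    return sum(arch) + random.random()

def full_evaluate(arch):
    """Placeholder for full training; spent sparingly by design."""
    return sum(arch) + random.random() * 0.1

# Divide: bucket architectures by a coarse feature of the encoding.
buckets = {}
for arch in candidates:
    buckets.setdefault(sum(arch) // 3, []).append(arch)

# Conquer: rank buckets by the proxy on a few samples each, then spend
# full evaluations only inside the best bucket.
def bucket_score(archs):
    sample = random.sample(archs, min(5, len(archs)))
    return sum(cheap_proxy(a) for a in sample) / len(sample)

best_bucket = max(buckets.values(), key=bucket_score)
best = max(best_bucket, key=full_evaluate)
print("selected architecture:", best)
```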
arXiv Detail & Related papers (2020-05-29T09:02:16Z)
- The efficiency of deep learning algorithms for detecting anatomical reference points on radiological images of the head profile [55.41644538483948]
A U-Net neural network detects anatomical reference points more accurately than a fully convolutional neural network.
The reference points detected by the U-Net are also closer to the average of those marked by a group of orthodontists.
arXiv Detail & Related papers (2020-05-25T13:51:03Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
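The pipeline in this entry can be pictured with the small sketch below: an architecture is a graph of operations, two graph-convolution layers propagate features over the normalized adjacency, and a linear readout predicts a performance score. The weights here are random stand-ins; in the paper they would be trained semi-supervised, with the auto-encoder supplying the representations.

```python
# Two-layer GCN over a toy architecture graph with a scalar readout;
# weights are random stand-ins for trained parameters.
import numpy as np

def normalize(A):
    A = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt          # D^{-1/2} A D^{-1/2}

def gcn_layer(A_hat, H, W):
    return np.maximum(A_hat @ H @ W, 0.0)       # ReLU(A_hat . H . W)

rng = np.random.default_rng(0)
num_ops = 4                                     # assumed operation vocabulary
W1 = rng.normal(size=(num_ops, 16))
W2 = rng.normal(size=(16, 16))
readout = rng.normal(size=16)

# A toy 5-node architecture graph: a chain of operations.
A = np.zeros((5, 5))
A[[0, 1, 2, 3], [1, 2, 3, 4]] = 1
A = A + A.T                                     # undirected for simplicity
X = np.eye(num_ops)[rng.integers(0, num_ops, size=5)]  # one-hot op types

A_hat = normalize(A)
H = gcn_layer(A_hat, X, W1)
H = gcn_layer(A_hat, H, W2)
print("predicted performance score:", float(H.mean(axis=0) @ readout))
```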
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- A Genetic Algorithm based Kernel-size Selection Approach for a Multi-column Convolutional Neural Network [11.040847116812046]
We introduce a genetic algorithm-based technique to reduce the effort of finding the optimal combination of a hyper-parameter (kernel size) of a convolutional neural network-based architecture.
The method is evaluated on three popular datasets of different handwritten Bangla characters and digits.
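A brief sketch of the selection loop: a genome holds one kernel size per CNN column, and a plain genetic algorithm (truncation selection, one-point crossover, random mutation) searches the combination. The fitness function is a toy stand-in for validation accuracy on the handwritten character datasets.

```python
# Genetic search over per-column kernel sizes; fitness() is a placeholder.
import random

random.seed(0)
KERNELS, COLUMNS = [3, 5, 7, 9], 3

def fitness(genome):
    """Placeholder for the multi-column CNN's validation accuracy."""
    return -abs(sum(genome) - 15) + random.random()  # toy landscape

def crossover(a, b):
    cut = random.randrange(1, COLUMNS)               # one-point crossover
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.2):
    return tuple(random.choice(KERNELS) if random.random() < rate else g
                 for g in genome)

population = [tuple(random.choice(KERNELS) for _ in range(COLUMNS))
              for _ in range(20)]
for _ in range(30):
    population.sort(key=fitness, reverse=True)       # truncation selection
    parents = population[:10]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children
print("best kernel sizes per column:", max(population, key=fitness))
```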
arXiv Detail & Related papers (2019-12-28T05:37:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.