Differentiable Neural Architecture Learning for Efficient Neural Network
Design
- URL: http://arxiv.org/abs/2103.02126v1
- Date: Wed, 3 Mar 2021 02:03:08 GMT
- Title: Differentiable Neural Architecture Learning for Efficient Neural Network
Design
- Authors: Qingbei Guo and Xiao-Jun Wu and Josef Kittler and Zhiquan Feng
- Abstract summary: We introduce a novel architecture parameterisation based on a scaled sigmoid function.
We then propose a general Differentiable Neural Architecture Learning (DNAL) method to optimize the neural architecture without the need to evaluate candidate neural networks.
- Score: 31.23038136038325
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Automated neural network design has received ever-increasing attention with
the evolution of deep convolutional neural networks (CNNs), especially
involving their deployment on embedded and mobile platforms. One of the biggest
problems that neural architecture search (NAS) confronts is that a large number
of candidate neural architectures need to be trained, using, for instance,
reinforcement learning or evolutionary optimisation algorithms, at a vast
computational cost. Even recent differentiable neural architecture search (DNAS)
samples a small number of candidate neural architectures based on the
probability distribution of learned architecture parameters to select the final
neural architecture. To address this computational complexity issue, we
introduce a novel \emph{architecture parameterisation} based on a scaled sigmoid
function, and propose a general \emph{Differentiable Neural Architecture
Learning} (DNAL) method to optimize the neural architecture without the need to
evaluate candidate neural networks. Specifically, for stochastic supernets as
well as conventional CNNs, we build a new channel-wise module layer with the
architecture components controlled by a scaled sigmoid function. We train these
neural network models from scratch. The network optimization is decoupled into
the weight optimization and the architecture optimization. We address the
non-convex optimization problem of neural architecture by the continuous scaled
sigmoid method with convergence guarantees. Extensive experiments demonstrate
our DNAL method delivers superior performance in terms of neural architecture
search cost. The optimal networks learned by DNAL surpass those produced by the
state-of-the-art methods on the benchmark CIFAR-10 and ImageNet-1K datasets in
accuracy, model size and computational complexity.
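To make the core idea in the abstract concrete, below is a minimal PyTorch sketch of a channel-wise scaled-sigmoid gate of the kind the paper describes. The class name, parameter names, and the way the gate is attached to a convolutional block are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a channel-wise scaled-sigmoid gate (assumed names/API,
# not the authors' code). Each channel c is scaled by sigmoid(beta * alpha_c);
# increasing beta pushes the gate towards a hard 0/1 channel selection.
import torch
import torch.nn as nn

class ScaledSigmoidGate(nn.Module):
    def __init__(self, num_channels: int, beta: float = 1.0):
        super().__init__()
        # Architecture parameters, one per channel (alpha = 0 -> gate = 0.5).
        self.alpha = nn.Parameter(torch.zeros(num_channels))
        self.register_buffer("beta", torch.tensor(float(beta)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.beta * self.alpha)  # shape (C,)
        return x * gate.view(1, -1, 1, 1)             # broadcast over N, H, W

    def set_beta(self, beta: float) -> None:
        # Anneal the scale during training to sharpen the gate towards 0/1.
        self.beta.fill_(beta)

# Usage: insert the gate after a convolutional block of a conventional CNN
# (or a stochastic supernet) and learn alpha alongside the weights.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
    ScaledSigmoidGate(16, beta=1.0),
)
out = block(torch.randn(2, 3, 32, 32))  # all gates start near 0.5
```

In a decoupled schedule of the kind the abstract describes, one would alternate between updating the convolution weights and updating `alpha`, while gradually increasing `beta` so the gates converge to near-binary channel selections that define the learned architecture.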
Related papers
- Simultaneous Weight and Architecture Optimization for Neural Networks [6.2241272327831485]
We introduce a novel neural network training framework that transforms the process by learning architecture and parameters simultaneously with gradient descent.
Central to our approach is a multi-scale encoder-decoder, in which the encoder embeds pairs of neural networks with similar functionalities close to each other.
Experiments demonstrate that our framework can discover sparse and compact neural networks that maintain high performance.
arXiv Detail & Related papers (2024-10-10T19:57:36Z) - Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally [2.645067871482715]
In machine learning tasks, one searches for an optimal function within a certain functional space.
Fixing the architecture in advance forces the evolution of the function during training to lie within what is expressible with the chosen architecture.
We show that the information about desirable architectural changes, due to expressivity bottlenecks, can be extracted from backpropagation.
arXiv Detail & Related papers (2024-05-30T08:23:56Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Neural Architecture Search using Particle Swarm and Ant Colony
Optimization [0.0]
A system integrating open source tools for Neural Architecture Search (OpenNAS) has been developed for image classification.
This paper focuses on training and optimizing CNNs using the Swarm Intelligence (SI) components of OpenNAS.
arXiv Detail & Related papers (2024-03-06T15:23:26Z) - Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can be easily changed simply by training the networks better.
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - Set-based Neural Network Encoding Without Weight Tying [91.37161634310819]
We propose a neural network weight encoding method for network property prediction.
Our approach is capable of encoding neural networks in a model zoo of mixed architectures.
We introduce two new tasks for neural network property prediction: cross-dataset and cross-architecture.
arXiv Detail & Related papers (2023-05-26T04:34:28Z) - NAR-Former: Neural Architecture Representation Learning towards Holistic
Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z) - Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose the use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z)