Multi-fidelity Neural Architecture Search with Knowledge Distillation
- URL: http://arxiv.org/abs/2006.08341v2
- Date: Wed, 19 May 2021 09:17:16 GMT
- Title: Multi-fidelity Neural Architecture Search with Knowledge Distillation
- Authors: Ilya Trofimov, Nikita Klyuchnikov, Mikhail Salnikov, Alexander
Filippov, Evgeny Burnaev
- Abstract summary: We propose a Bayesian multi-fidelity method for neural architecture search: MF-KD.
Knowledge distillation adds a term to the loss function that forces a network to mimic a teacher network.
We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a plain logistic loss.
- Score: 69.09782590880367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural architecture search (NAS) aims to find the optimal architecture
of a neural network for a problem or a family of problems. Evaluating
neural architectures is very time-consuming. One possible way to
mitigate this issue is to use low-fidelity evaluations, namely training on a
part of the dataset, for fewer epochs, with fewer channels, etc. In this paper, we
propose a Bayesian multi-fidelity method for neural architecture search: MF-KD.
The method relies on a new approach to low-fidelity evaluations of neural
architectures: training for a few epochs using knowledge distillation.
Knowledge distillation adds a term to the loss function that forces a network to
mimic a teacher network. We carry out experiments on CIFAR-10, CIFAR-100,
and ImageNet-16-120. We show that training for a few epochs with such a
modified loss function leads to a better selection of neural architectures than
training for a few epochs with a plain logistic loss. The proposed method outperforms
several state-of-the-art baselines.
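A minimal sketch of the core idea, assuming a PyTorch-style setup: the helper names (kd_loss, low_fidelity_score), the hyperparameters (temperature T, mixing weight alpha, epochs, lr), and the teacher/data-loader objects are illustrative assumptions, not the authors' released code. It trains a candidate architecture for a few epochs with the standard distillation loss (cross-entropy on hard labels plus a temperature-scaled KL term that forces the student to mimic the teacher's softened logits) and returns validation accuracy as a cheap low-fidelity score.

```python
# Hedged sketch of low-fidelity evaluation with a knowledge-distillation loss.
# All names and hyperparameter values below are illustrative assumptions.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Cross-entropy on hard labels plus a distillation term (Hinton-style KD)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so the KD gradient has the same order as CE
    return (1.0 - alpha) * ce + alpha * kd


def low_fidelity_score(model, teacher, train_loader, val_loader,
                       epochs=3, lr=0.01, device="cuda"):
    """Train a candidate architecture for a few epochs with the KD loss and
    return its validation accuracy, used as a cheap proxy for full training."""
    model, teacher = model.to(device), teacher.to(device).eval()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                t_logits = teacher(x)          # frozen teacher predictions
            loss = kd_loss(model(x), t_logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Validation accuracy of the briefly trained candidate
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total
```

In a search loop, candidate architectures would be ranked by this score; how MF-KD combines such low-fidelity estimates with higher-fidelity ones in its Bayesian model is detailed in the paper rather than the abstract.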
Related papers
- Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis to the real batch setting, the resulting method, Neural Initialization Optimization (NIO), is able to automatically find a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z)
- BayesFT: Bayesian Optimization for Fault Tolerant Neural Network Architecture [8.005491953251541]
We propose a novel Bayesian optimization method for fault-tolerant neural network architectures (BayesFT).
Our framework has outperformed the state-of-the-art methods by up to 10 times on various tasks, such as image classification and object detection.
arXiv Detail & Related papers (2022-09-30T20:13:05Z)
- Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training? [37.29036906991086]
In this work, we revisit several at-initialization metrics that can be derived from the Neural Tangent Kernel (NTK).
We deduce that modern neural architectures exhibit highly non-linear characteristics, making the NTK-based metrics incapable of reliably estimating the performance of an architecture without some amount of training.
We introduce Label-Gradient Alignment (LGA), a novel NTK-based metric whose formulation allows it to capture the substantial non-linear advantage present in modern neural architectures.
arXiv Detail & Related papers (2022-03-28T08:43:04Z)
- Self-Denoising Neural Networks for Few Shot Learning [66.38505903102373]
We present a new training scheme that adds noise at multiple stages of an existing neural architecture while simultaneously learning to be robust to this added noise.
This architecture, which we call a Self-Denoising Neural Network (SDNN), can be applied easily to most modern convolutional neural architectures.
arXiv Detail & Related papers (2021-10-26T03:28:36Z)
- D-DARTS: Distributed Differentiable Architecture Search [75.12821786565318]
Differentiable ARchiTecture Search (DARTS) is one of the most popular Neural Architecture Search (NAS) methods.
We propose D-DARTS, a novel solution that addresses this problem by nesting several neural networks at the cell level.
arXiv Detail & Related papers (2021-08-20T09:07:01Z)
- A Novel Framework for Neural Architecture Search in the Hill Climbing Domain [2.729898906885749]
We propose a new framework for neural architecture search based on a hill-climbing procedure.
We achieve a 4.96% error rate on the CIFAR-10 dataset with 19.4 hours of training on a single GPU.
arXiv Detail & Related papers (2021-02-22T04:34:29Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- VINNAS: Variational Inference-based Neural Network Architecture Search [2.685668802278155]
We present a differentiable variational inference-based NAS method for searching sparse convolutional neural networks.
Our method finds diverse network cells while achieving state-of-the-art accuracy with almost 2 times fewer non-zero parameters.
arXiv Detail & Related papers (2020-07-12T21:47:35Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.