Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural
Architecture Search
- URL: http://arxiv.org/abs/2010.15821v3
- Date: Mon, 12 Apr 2021 06:30:36 GMT
- Title: Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural
Architecture Search
- Authors: Houwen Peng, Hao Du, Hongyuan Yu, Qi Li, Jing Liao, Jianlong Fu
- Abstract summary: One-shot weight sharing methods have recently drawn great attention in neural architecture search due to high efficiency and competitive performance.
To alleviate the insufficient training of subnetworks caused by weight sharing, we present a simple yet effective architecture distillation method.
We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training.
Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop.
- Score: 60.965024145243596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-shot weight sharing methods have recently drawn great attention in neural
architecture search due to high efficiency and competitive performance.
However, weight sharing across models has an inherent deficiency, i.e.,
insufficient training of subnetworks in hypernetworks. To alleviate this
problem, we present a simple yet effective architecture distillation method.
The central idea is that subnetworks can learn collaboratively and teach each
other throughout the training process, aiming to boost the convergence of
individual models. We introduce the concept of prioritized path, which refers
to the architecture candidates exhibiting superior performance during training.
Distilling knowledge from the prioritized paths is able to boost the training
of subnetworks. Since the prioritized paths are changed on the fly depending on
their performance and complexity, the final obtained paths are the cream of the
crop. We directly select the most promising one from the prioritized paths as
the final architecture, without using other complex search methods, such as
reinforcement learning or evolutionary algorithms. The experiments on ImageNet
verify that such path distillation improves the convergence rate and
performance of the hypernetwork, as well as boosting the training of
subnetworks. The discovered architectures achieve superior performance compared
to the recent MobileNetV3 and EfficientNet families under aligned settings.
Moreover, the experiments on object detection and a more challenging search space
show the generality and robustness of the proposed method. Code and models are
available at https://github.com/microsoft/cream.git.
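To make the prioritized-path idea above concrete, here is a minimal sketch of how a supernet training step could combine the usual classification loss with distillation from the current best path while updating a small pool of prioritized paths on the fly. The supernet interface (`supernet(images, path)`), the `sample_path` and `score_path` helpers, the pool size, and the choice of distilling from the single best path are simplifying assumptions for illustration; the paper additionally selects teachers with a matching mechanism and applies complexity constraints, which this sketch omits.

```python
# Minimal sketch of prioritized-path distillation (simplified; the supernet
# interface, pool size, and scoring are assumptions, not the released code).
import torch
import torch.nn.functional as F

POOL_SIZE = 10   # number of prioritized paths kept on the fly (assumed)
pool = []        # list of (score, path) tuples; higher score = better


def kd_loss(student_logits, teacher_logits, T=1.0):
    """Soft-label distillation loss between a sampled path and a prioritized path."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)


def train_step(supernet, optimizer, images, labels, sample_path, score_path):
    """One hypernetwork update with distillation from the current best path.

    `supernet(images, path)` -> logits of the subnetwork defined by `path`,
    `sample_path()`          -> a random architecture candidate,
    `score_path(path)`       -> validation-style quality of a path
    are assumed helpers for this sketch.
    """
    path = sample_path()
    logits = supernet(images, path)
    loss = F.cross_entropy(logits, labels)

    if pool:  # distill from the best prioritized path seen so far
        _, teacher_path = max(pool, key=lambda item: item[0])
        with torch.no_grad():
            teacher_logits = supernet(images, teacher_path)
        loss = loss + kd_loss(logits, teacher_logits)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Update the prioritized-path pool on the fly: keep only the top paths.
    pool.append((score_path(path), path))
    pool.sort(key=lambda item: item[0], reverse=True)
    del pool[POOL_SIZE:]
    return loss.item()
```

After training, the top-ranked entry of the pool is taken directly as the final architecture, which is why no extra reinforcement-learning or evolutionary search step is needed.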
Related papers
- FlowNAS: Neural Architecture Search for Optical Flow Estimation [65.44079917247369]
We propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task.
Experimental results show that the discovered architecture with the weights inherited from the super-network achieves 4.67% F1-all error on KITTI.
arXiv Detail & Related papers (2022-07-04T09:05:25Z)
- Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization [50.50023451369742]
Pruning-as-Search (PaS) is an end-to-end channel pruning method that searches out the desired sub-network automatically and efficiently.
Our proposed architecture outperforms prior art by around 1.0% top-1 accuracy on the ImageNet-1000 classification task.
arXiv Detail & Related papers (2022-06-02T17:58:54Z)
- Automatic Block-wise Pruning with Auxiliary Gating Structures for Deep Convolutional Neural Networks [9.293334856614628]
This paper presents a novel structured network pruning method with auxiliary gating structures.
Our experiments demonstrate that our method can achieve state-of-the-art compression performance on classification tasks.
arXiv Detail & Related papers (2022-05-07T09:03:32Z)
- DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [99.88539409432916]
We study the unsupervised domain adaptation (UDA) process.
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
arXiv Detail & Related papers (2021-11-29T19:00:46Z)
- Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core gives a low-rank model with superior performance compared to training the low-rank model alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z)
- Distilling Knowledge via Knowledge Review [69.15050871776552]
We study the connection paths across levels between teacher and student networks, and reveal their great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our final nested and compact framework requires negligible overhead, and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z)
- Task-Adaptive Neural Network Retrieval with Meta-Contrastive Learning [34.27089256930098]
We propose a novel neural network retrieval method, which retrieves the optimal pre-trained network for a given task.
We train this framework by meta-learning a cross-modal latent space with contrastive loss, to maximize the similarity between a dataset and a network.
We validate the efficacy of our method on ten real-world datasets, against existing NAS baselines.
arXiv Detail & Related papers (2021-03-02T06:30:51Z)
- Transfer Learning Between Different Architectures Via Weights Injection [0.0]
This work presents a naive algorithm for parameter transfer between different architectures with a computationally cheap injection technique.
The primary objective is to speed up the training of neural networks from scratch.
arXiv Detail & Related papers (2021-01-07T20:42:35Z)
- MixPath: A Unified Approach for One-shot Neural Architecture Search [13.223963114415552]
We propose a novel mechanism called Shadow Batch Normalization (SBN) to regularize the disparate feature statistics.
We call our unified multi-path one-shot approach MixPath, which generates a series of models that achieve state-of-the-art results on ImageNet; a rough sketch of the SBN idea follows this list.
arXiv Detail & Related papers (2020-01-16T15:24:26Z)
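As a rough illustration of the Shadow Batch Normalization idea named in the MixPath entry above, the sketch below keeps one BatchNorm layer per possible number of simultaneously activated paths and selects the matching one at run time, so each activation pattern is normalized with statistics that fit it. The class name, constructor arguments, and selection rule are assumptions for illustration rather than the MixPath implementation.

```python
# Sketch of "shadow" batch normalization: one BN per count of active paths.
import torch
import torch.nn as nn


class ShadowBatchNorm2d(nn.Module):
    def __init__(self, channels, max_active_paths):
        super().__init__()
        # One shadow BN per possible number of active paths (1..max).
        self.bns = nn.ModuleList(
            [nn.BatchNorm2d(channels) for _ in range(max_active_paths)]
        )

    def forward(self, x, num_active_paths):
        # Pick the BN whose running statistics correspond to how many
        # parallel branch outputs were summed to produce `x`.
        return self.bns[num_active_paths - 1](x)


# Usage: the sum of m active branch outputs is normalized with the m-th BN.
sbn = ShadowBatchNorm2d(channels=16, max_active_paths=3)
x = torch.randn(2, 16, 8, 8)   # e.g. the sum of two branch outputs
y = sbn(x, num_active_paths=2)
```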
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.