Pi-NAS: Improving Neural Architecture Search by Reducing Supernet
Training Consistency Shift
- URL: http://arxiv.org/abs/2108.09671v1
- Date: Sun, 22 Aug 2021 09:08:48 GMT
- Title: Pi-NAS: Improving Neural Architecture Search by Reducing Supernet
Training Consistency Shift
- Authors: Jiefeng Peng, Jiqi Zhang, Changlin Li, Guangrun Wang, Xiaodan Liang,
Liang Lin
- Abstract summary: Recently proposed neural architecture search (NAS) methods co-train billions of architectures in a supernet and estimate their potential accuracy.
The ranking correlation between the architectures' predicted accuracy and their actual capability is poor, which creates a dilemma for existing NAS methods.
We attribute this ranking correlation problem to the supernet training consistency shift, including feature shift and parameter shift.
We address these two shifts simultaneously using a nontrivial supernet-Pi model, called Pi-NAS.
- Score: 128.32670289503025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently proposed neural architecture search (NAS) methods co-train billions
of architectures in a supernet and estimate their potential accuracy using the
network weights detached from the supernet. However, the ranking correlation
between the architectures' predicted accuracy and their actual capability is
poor, which creates a dilemma for existing NAS methods. We attribute this
ranking correlation problem to the supernet training consistency shift,
comprising feature shift and parameter shift. Feature shift refers to the
dynamic input distributions of a hidden layer caused by random path sampling;
these shifting distributions affect the loss descent and ultimately the
architecture ranking. Parameter shift refers to contradictory parameter
updates applied to a layer shared by different paths at different training
steps; such rapidly changing parameters cannot preserve the architecture
ranking. We
address these two shifts simultaneously using a nontrivial supernet-Pi model,
called Pi-NAS. Specifically, we employ a supernet-Pi model that contains
cross-path learning to reduce the feature consistency shift between different
paths. Meanwhile, we adopt a novel nontrivial mean teacher containing negative
samples to overcome parameter shift and model collision. Furthermore, our
Pi-NAS runs in an unsupervised manner and can therefore search for more transferable
architectures. Extensive experiments on ImageNet and a wide range of downstream
tasks (e.g., COCO 2017, ADE20K, and Cityscapes) demonstrate the effectiveness
and universality of our Pi-NAS compared to supervised NAS. Code:
https://github.com/Ernie1/Pi-NAS.
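The abstract names two ingredients: cross-path learning to reduce feature shift between sampled paths, and a nontrivial mean teacher with negative samples to counter parameter shift. The sketch below is a simplified, hypothetical illustration of how such a training loop could be wired together (a toy supernet, random path sampling, an InfoNCE-style loss with in-batch negatives, and an EMA teacher); it is not the authors' implementation, which is available from the repository linked above.

```python
# Hypothetical sketch, not the authors' code: cross-path learning with an
# EMA "mean teacher" and in-batch negative samples, in the spirit of the
# abstract above. The supernet, path sampling, and loss are toy stand-ins.
import copy
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySupernet(nn.Module):
    """Each layer offers two candidate ops; a 'path' picks one per layer."""
    def __init__(self, dim=32, depth=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.ModuleList([nn.Linear(dim, dim), nn.Linear(dim, dim)])
            for _ in range(depth))

    def forward(self, x, path):
        for choices, op_idx in zip(self.layers, path):
            x = F.relu(choices[op_idx](x))
        return F.normalize(x, dim=1)           # unit-length embeddings

def sample_path(depth=4):
    return [random.randrange(2) for _ in range(depth)]

student = TinySupernet()
teacher = copy.deepcopy(student)               # EMA copy, never backpropagated
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.SGD(student.parameters(), lr=0.1)

for step in range(10):
    x = torch.randn(64, 32)
    q = student(x, sample_path())              # path A through the student
    with torch.no_grad():
        k = teacher(x, sample_path())          # path B through the teacher
    # InfoNCE-style loss: the other samples in the batch act as negatives.
    logits = q @ k.t() / 0.2
    loss = F.cross_entropy(logits, torch.arange(x.size(0)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Mean-teacher update: exponential moving average of the student weights.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(0.99).add_(ps, alpha=0.01)
```

In this reading, the slowly moving EMA teacher is one simple way to damp the contradictory updates the abstract calls parameter shift, while matching features across two different paths targets the feature shift.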
Related papers
- The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol [2.4300749758571905]
Gradient-based methods suffer from discretization error, which can severely damage the process of obtaining the final architecture.
We introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture.
Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset.
arXiv Detail & Related papers (2024-05-26T15:44:53Z)
- ShiftNAS: Improving One-shot NAS via Probability Shift [1.3537414663819973]
We propose ShiftNAS, a method that can adjust the sampling probability based on the complexity of networks.
We evaluate our approach on multiple visual network models, including convolutional neural networks (CNNs) and vision transformers (ViTs).
Experimental results on ImageNet show that ShiftNAS can improve the performance of one-shot NAS without additional computational cost (a toy sketch of the probability shift follows this entry).
arXiv Detail & Related papers (2023-07-17T07:53:23Z)
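The ShiftNAS entry above adjusts subnet sampling probabilities according to network complexity. The snippet below is a minimal, hypothetical sketch of that general idea only, using parameter count as a stand-in complexity measure and a temperature-controlled softmax as the shifting rule; the paper's actual probability-shift mechanism may differ.

```python
# Illustrative only: complexity-aware subnet sampling. The complexity
# measure, the direction of the shift, and the softmax rule are assumptions,
# not the method described in the ShiftNAS paper.
import numpy as np

rng = np.random.default_rng(0)
param_counts = np.array([1.2e6, 2.5e6, 4.0e6, 7.8e6])   # toy subnet sizes

def shifted_probs(complexities, temperature=1.0):
    # Normalise the complexities, then turn them into sampling probabilities
    # so that subnets far from the average complexity are drawn more often.
    z = (complexities - complexities.mean()) / complexities.std()
    logits = np.abs(z) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = shifted_probs(param_counts)
subnet = rng.choice(len(param_counts), p=probs)   # index of the sampled subnet
print(probs, subnet)
```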
- Neural Architecture Search via Two Constant Shared Weights Initialisations [0.0]
We present a zero-cost metric that is highly correlated with the training set accuracy across the NAS-Bench-101, NAS-Bench-201 and NAS-Bench-NLP benchmark datasets.
Our method is easy to integrate within existing NAS algorithms and takes a fraction of a second to evaluate a single network.
arXiv Detail & Related papers (2023-02-09T02:25:38Z)
- NASiam: Efficient Representation Learning using Neural Architecture Search for Siamese Networks [76.8112416450677]
Siamese networks are one of the most popular approaches to self-supervised visual representation learning (SSL).
NASiam is a novel approach that, for the first time, uses differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair).
NASiam reaches competitive performance on both small-scale (i.e., CIFAR-10/CIFAR-100) and large-scale (i.e., ImageNet) image classification datasets while costing only a few GPU hours.
arXiv Detail & Related papers (2023-01-31T19:48:37Z)
- DropNAS: Grouped Operation Dropout for Differentiable Architecture Search [78.06809383150437]
DARTS relaxes the search process with a differentiable formulation that leverages weight sharing and SGD.
This causes two problems: first, the operations with more parameters may never have the chance to express the desired function.
We propose a novel grouped operation dropout algorithm named DropNAS to fix the problems with DARTS.
arXiv Detail & Related papers (2022-01-27T17:28:23Z)
- Across-Task Neural Architecture Search via Meta Learning [1.225795556154044]
Adequate labeled data and expensive compute resources are prerequisites for the success of neural architecture search (NAS).
It is challenging to apply NAS in meta-learning scenarios with limited compute resources and data.
In this paper, an across-task neural architecture search (AT-NAS) is proposed to address the problem by combining gradient-based meta-learning with EA-based NAS.
arXiv Detail & Related papers (2021-10-12T09:07:33Z)
- L$^{2}$NAS: Learning to Optimize Neural Architectures via Continuous-Action Reinforcement Learning [23.25155249879658]
Differentiable neural architecture search (NAS) has achieved remarkable results in deep neural network design.
We show that L$^{2}$NAS achieves state-of-the-art results on the NAS-Bench-201 benchmark as well as the DARTS and Once-for-All search spaces.
arXiv Detail & Related papers (2021-09-25T19:26:30Z)
- Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective [88.39981851247727]
We propose a novel framework called training-free neural architecture search (TE-NAS).
TE-NAS ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK) and the number of linear regions in the input space.
We show that: (1) these two measurements imply the trainability and expressivity of a neural network; (2) they strongly correlate with the network's test accuracy (a toy illustration of both proxies follows this entry).
arXiv Detail & Related papers (2021-02-23T07:50:44Z)
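The TE-NAS entry above ranks architectures by the spectrum of the neural tangent kernel (NTK) and the number of linear regions. The toy code below illustrates both proxies on a small MLP: an empirical NTK built from per-sample gradients (its condition number as a rough trainability signal) and a count of distinct ReLU activation patterns as a crude surrogate for linear regions. This is not the paper's exact measurement protocol.

```python
# Toy illustration of the two training-free proxies; the network, data, and
# region-counting heuristic are placeholders, not the TE-NAS procedure.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
params = list(net.parameters())
x = torch.randn(8, 16)

# Empirical NTK: Theta[i, j] = <grad_w f(x_i), grad_w f(x_j)>.
per_sample_grads = []
for i in range(x.size(0)):
    out = net(x[i:i + 1]).sum()
    g = torch.autograd.grad(out, params)
    per_sample_grads.append(torch.cat([t.flatten() for t in g]))
G = torch.stack(per_sample_grads)              # (N, num_params)
ntk = G @ G.t()
eig = torch.linalg.eigvalsh(ntk)               # ascending eigenvalues
condition_number = (eig[-1] / eig[0]).item()   # large => poor trainability

# Linear-region proxy: count distinct ReLU activation patterns over inputs.
with torch.no_grad():
    pre_act = net[0](torch.randn(1000, 16))    # pre-ReLU activations
    patterns = (pre_act > 0)
    num_regions = len({tuple(p.tolist()) for p in patterns})

print(condition_number, num_regions)
```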
- DSNAS: Direct Neural Architecture Search without Parameter Retraining [112.02966105995641]
We propose a new problem definition for NAS, task-specific end-to-end, based on this observation.
We propose DSNAS, an efficient differentiable NAS framework that simultaneously optimizes architecture and parameters with a low-biased Monte Carlo estimate.
DSNAS successfully discovers networks with comparable accuracy (74.4%) on ImageNet in 420 GPU hours, reducing the total time by more than 34%.
arXiv Detail & Related papers (2020-02-21T04:41:47Z)
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures under given constraints (a toy sketch of the distribution-pruning loop follows this entry).
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
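The DDPNAS entry above samples architectures from a joint categorical distribution and dynamically prunes the search space every few epochs. Below is a hypothetical toy loop in that spirit: per-layer categorical distributions are updated from a synthetic reward and low-probability operations are pruned periodically. The reward and update rule are illustrative assumptions, not the paper's algorithm.

```python
# Toy dynamic-distribution-pruning loop; reward, update rule, and schedule
# are placeholders rather than the DDPNAS method itself.
import numpy as np

rng = np.random.default_rng(0)
num_layers, num_ops = 4, 5
probs = np.full((num_layers, num_ops), 1.0 / num_ops)   # per-layer categoricals
alive = np.ones((num_layers, num_ops), dtype=bool)

def fake_reward(arch):
    # Stand-in for the validation accuracy of the sampled architecture.
    return -np.abs(np.array(arch) - 2).sum() + rng.normal(0, 0.1)

for epoch in range(30):
    arch = [rng.choice(num_ops, p=probs[l]) for l in range(num_layers)]
    r = fake_reward(arch)
    for l, op in enumerate(arch):              # reward-weighted update
        probs[l, op] *= np.exp(0.1 * r)
    probs = probs * alive
    probs /= probs.sum(axis=1, keepdims=True)
    if epoch % 10 == 9:                        # prune every few epochs
        for l in range(num_layers):
            if alive[l].sum() > 1:
                worst = np.where(alive[l], probs[l], np.inf).argmin()
                alive[l, worst] = False
        probs = probs * alive
        probs /= probs.sum(axis=1, keepdims=True)

best_arch = probs.argmax(axis=1)               # one op per layer at the end
print(best_arch)
```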