Pi-NAS: Improving Neural Architecture Search by Reducing Supernet
Training Consistency Shift
- URL: http://arxiv.org/abs/2108.09671v1
- Date: Sun, 22 Aug 2021 09:08:48 GMT
- Title: Pi-NAS: Improving Neural Architecture Search by Reducing Supernet
Training Consistency Shift
- Authors: Jiefeng Peng, Jiqi Zhang, Changlin Li, Guangrun Wang, Xiaodan Liang,
Liang Lin
- Abstract summary: Recently proposed neural architecture search (NAS) methods co-train billions of architectures in a supernet and estimate their potential accuracy.
The ranking correlation between the architectures' predicted accuracy and their actual capability is poor, which creates a dilemma for existing NAS methods.
We attribute this ranking correlation problem to the supernet training consistency shift, including feature shift and parameter shift.
We address these two shifts simultaneously using a nontrivial supernet-Pi model, called Pi-NAS.
- Score: 128.32670289503025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently proposed neural architecture search (NAS) methods co-train billions
of architectures in a supernet and estimate their potential accuracy using the
network weights detached from the supernet. However, the ranking correlation
between the architectures' predicted accuracy and their actual capability is
poor, which creates a dilemma for existing NAS methods. We attribute this
ranking correlation problem to the supernet training consistency shift,
comprising feature shift and parameter shift. Feature shift refers to the
dynamic input distributions of a hidden layer caused by random path sampling;
these shifting distributions affect the loss descent and ultimately the
architecture ranking. Parameter shift refers to contradictory parameter
updates applied to a layer shared by different paths at different training
steps; such rapidly changing parameters cannot preserve the architecture
ranking. We
address these two shifts simultaneously using a nontrivial supernet-Pi model,
called Pi-NAS. Specifically, we employ a supernet-Pi model that contains
cross-path learning to reduce the feature consistency shift between different
paths. Meanwhile, we adopt a novel nontrivial mean teacher containing negative
samples to overcome parameter shift and model collision. Furthermore, our
Pi-NAS runs in an unsupervised manner and can therefore search for more transferable
architectures. Extensive experiments on ImageNet and a wide range of downstream
tasks (e.g., COCO 2017, ADE20K, and Cityscapes) demonstrate the effectiveness
and universality of our Pi-NAS compared to supervised NAS. Code:
https://github.com/Ernie1/Pi-NAS.
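The abstract names two ingredients: cross-path learning to reduce feature shift between sampled paths, and a nontrivial mean teacher with negative samples to counter parameter shift. The sketch below is a simplified, hypothetical illustration of how such a training loop could be wired together (a toy supernet, random path sampling, an InfoNCE-style loss with in-batch negatives, and an EMA teacher); it is not the authors' implementation, which is available from the repository linked above.

```python
# Hypothetical sketch, not the authors' code: cross-path learning with an
# EMA "mean teacher" and in-batch negative samples, in the spirit of the
# abstract above. The supernet, path sampling, and loss are toy stand-ins.
import copy
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySupernet(nn.Module):
    """Each layer offers two candidate ops; a 'path' picks one per layer."""
    def __init__(self, dim=32, depth=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.ModuleList([nn.Linear(dim, dim), nn.Linear(dim, dim)])
            for _ in range(depth))

    def forward(self, x, path):
        for choices, op_idx in zip(self.layers, path):
            x = F.relu(choices[op_idx](x))
        return F.normalize(x, dim=1)           # unit-length embeddings

def sample_path(depth=4):
    return [random.randrange(2) for _ in range(depth)]

student = TinySupernet()
teacher = copy.deepcopy(student)               # EMA copy, never backpropagated
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.SGD(student.parameters(), lr=0.1)

for step in range(10):
    x = torch.randn(64, 32)
    q = student(x, sample_path())              # path A through the student
    with torch.no_grad():
        k = teacher(x, sample_path())          # path B through the teacher
    # InfoNCE-style loss: the other samples in the batch act as negatives.
    logits = q @ k.t() / 0.2
    loss = F.cross_entropy(logits, torch.arange(x.size(0)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Mean-teacher update: exponential moving average of the student weights.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(0.99).add_(ps, alpha=0.01)
```

In this reading, the slowly moving EMA teacher is one simple way to damp the contradictory updates the abstract calls parameter shift, while matching features across two different paths targets the feature shift.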
Related papers
- The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol [2.4300749758571905]
Gradient-based methods suffer from discretization error, which can severely damage the process of obtaining the final architecture.
We introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture.
Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset.
arXiv Detail & Related papers (2024-05-26T15:44:53Z)
- ShiftNAS: Improving One-shot NAS via Probability Shift [1.3537414663819973]
We propose ShiftNAS, a method that can adjust the sampling probability based on the complexity of networks.
We evaluate our approach on multiple visual network models, including convolutional neural networks (CNNs) and vision transformers (ViTs).
Experimental results on ImageNet show that ShiftNAS can improve the performance of one-shot NAS without additional computational cost (a toy sketch of the probability shift follows this entry).
arXiv Detail & Related papers (2023-07-17T07:53:23Z)
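The ShiftNAS entry above adjusts subnet sampling probabilities according to network complexity. The snippet below is a minimal, hypothetical sketch of that general idea only, using parameter count as a stand-in complexity measure and a temperature-controlled softmax as the shifting rule; the paper's actual probability-shift mechanism may differ.

```python
# Illustrative only: complexity-aware subnet sampling. The complexity
# measure, the direction of the shift, and the softmax rule are assumptions,
# not the method described in the ShiftNAS paper.
import numpy as np

rng = np.random.default_rng(0)
param_counts = np.array([1.2e6, 2.5e6, 4.0e6, 7.8e6])   # toy subnet sizes

def shifted_probs(complexities, temperature=1.0):
    # Normalise the complexities, then turn them into sampling probabilities
    # so that subnets far from the average complexity are drawn more often.
    z = (complexities - complexities.mean()) / complexities.std()
    logits = np.abs(z) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = shifted_probs(param_counts)
subnet = rng.choice(len(param_counts), p=probs)   # index of the sampled subnet
print(probs, subnet)
```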
- Neural Architecture Search via Two Constant Shared Weights Initialisations [0.0]
We present a zero-cost metric that is highly correlated with the training set accuracy across the NAS-Bench-101, NAS-Bench-201 and NAS-Bench-NLP benchmark datasets.
Our method is easy to integrate within existing NAS algorithms and takes a fraction of a second to evaluate a single network.
arXiv Detail & Related papers (2023-02-09T02:25:38Z)
- NASiam: Efficient Representation Learning using Neural Architecture Search for Siamese Networks [76.8112416450677]
Siamese networks are one of the most popular approaches to self-supervised visual representation learning (SSL).
NASiam is a novel approach that, for the first time, uses differentiable NAS to improve the multilayer perceptron projector and predictor (encoder/predictor pair).
NASiam reaches competitive performance on both small-scale (i.e., CIFAR-10/CIFAR-100) and large-scale (i.e., ImageNet) image classification datasets while costing only a few GPU hours.
arXiv Detail & Related papers (2023-01-31T19:48:37Z)
- DropNAS: Grouped Operation Dropout for Differentiable Architecture Search [78.06809383150437]
DARTS relaxes the search process with a differentiable formulation that leverages weight sharing and SGD.
This causes two problems: first, the operations with more parameters may never have the chance to express the desired function.
We propose a novel grouped operation dropout algorithm named DropNAS to fix the problems with DARTS.
arXiv Detail & Related papers (2022-01-27T17:28:23Z)
- Across-Task Neural Architecture Search via Meta Learning [1.225795556154044]
Adequate labeled data and expensive compute resources are prerequisites for the success of neural architecture search (NAS).
It is challenging to apply NAS in meta-learning scenarios with limited compute resources and data.
In this paper, an across-task neural architecture search (AT-NAS) is proposed to address the problem by combining gradient-based meta-learning with EA-based NAS.
arXiv Detail & Related papers (2021-10-12T09:07:33Z)
- L$^{2}$NAS: Learning to Optimize Neural Architectures via Continuous-Action Reinforcement Learning [23.25155249879658]
Differentiable neural architecture search (NAS) has achieved remarkable results in deep neural network design.
We show that L$^{2}$NAS achieves state-of-the-art results on the NAS-Bench-201 benchmark as well as the DARTS and Once-for-All search spaces.
arXiv Detail & Related papers (2021-09-25T19:26:30Z)
- Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective [88.39981851247727]
We propose a novel framework called training-free neural architecture search (TE-NAS).
TE-NAS ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK) and the number of linear regions in the input space.
We show that: (1) these two measurements imply the trainability and expressivity of a neural network; (2) they strongly correlate with the network's test accuracy (a toy illustration of both proxies follows this entry).
arXiv Detail & Related papers (2021-02-23T07:50:44Z)
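The TE-NAS entry above ranks architectures by the spectrum of the neural tangent kernel (NTK) and the number of linear regions. The toy code below illustrates both proxies on a small MLP: an empirical NTK built from per-sample gradients (its condition number as a rough trainability signal) and a count of distinct ReLU activation patterns as a crude surrogate for linear regions. This is not the paper's exact measurement protocol.

```python
# Toy illustration of the two training-free proxies; the network, data, and
# region-counting heuristic are placeholders, not the TE-NAS procedure.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
params = list(net.parameters())
x = torch.randn(8, 16)

# Empirical NTK: Theta[i, j] = <grad_w f(x_i), grad_w f(x_j)>.
per_sample_grads = []
for i in range(x.size(0)):
    out = net(x[i:i + 1]).sum()
    g = torch.autograd.grad(out, params)
    per_sample_grads.append(torch.cat([t.flatten() for t in g]))
G = torch.stack(per_sample_grads)              # (N, num_params)
ntk = G @ G.t()
eig = torch.linalg.eigvalsh(ntk)               # ascending eigenvalues
condition_number = (eig[-1] / eig[0]).item()   # large => poor trainability

# Linear-region proxy: count distinct ReLU activation patterns over inputs.
with torch.no_grad():
    pre_act = net[0](torch.randn(1000, 16))    # pre-ReLU activations
    patterns = (pre_act > 0)
    num_regions = len({tuple(p.tolist()) for p in patterns})

print(condition_number, num_regions)
```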
- DSNAS: Direct Neural Architecture Search without Parameter Retraining [112.02966105995641]
We propose a new problem definition for NAS, task-specific end-to-end, based on this observation.
We propose DSNAS, an efficient differentiable NAS framework that simultaneously optimizes architecture and parameters with a low-biased Monte Carlo estimate.
DSNAS successfully discovers networks with comparable accuracy (74.4%) on ImageNet in 420 GPU hours, reducing the total time by more than 34%.
arXiv Detail & Related papers (2020-02-21T04:41:47Z)
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures under given constraints (a toy sketch of the distribution-pruning loop follows this entry).
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
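The DDPNAS entry above samples architectures from a joint categorical distribution and dynamically prunes the search space every few epochs. Below is a hypothetical toy loop in that spirit: per-layer categorical distributions are updated from a synthetic reward and low-probability operations are pruned periodically. The reward and update rule are illustrative assumptions, not the paper's algorithm.

```python
# Toy dynamic-distribution-pruning loop; reward, update rule, and schedule
# are placeholders rather than the DDPNAS method itself.
import numpy as np

rng = np.random.default_rng(0)
num_layers, num_ops = 4, 5
probs = np.full((num_layers, num_ops), 1.0 / num_ops)   # per-layer categoricals
alive = np.ones((num_layers, num_ops), dtype=bool)

def fake_reward(arch):
    # Stand-in for the validation accuracy of the sampled architecture.
    return -np.abs(np.array(arch) - 2).sum() + rng.normal(0, 0.1)

for epoch in range(30):
    arch = [rng.choice(num_ops, p=probs[l]) for l in range(num_layers)]
    r = fake_reward(arch)
    for l, op in enumerate(arch):              # reward-weighted update
        probs[l, op] *= np.exp(0.1 * r)
    probs = probs * alive
    probs /= probs.sum(axis=1, keepdims=True)
    if epoch % 10 == 9:                        # prune every few epochs
        for l in range(num_layers):
            if alive[l].sum() > 1:
                worst = np.where(alive[l], probs[l], np.inf).argmin()
                alive[l, worst] = False
        probs = probs * alive
        probs /= probs.sum(axis=1, keepdims=True)

best_arch = probs.argmax(axis=1)               # one op per layer at the end
print(best_arch)
```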