Disturbance-immune Weight Sharing for Neural Architecture Search
- URL: http://arxiv.org/abs/2003.13089v1
- Date: Sun, 29 Mar 2020 17:54:49 GMT
- Title: Disturbance-immune Weight Sharing for Neural Architecture Search
- Authors: Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Yong Guo, Peilin Zhao,
Junzhou Huang, Mingkui Tan
- Abstract summary: We propose a disturbance-immune update strategy for model updating.
We theoretically analyze the effectiveness of our strategy in alleviating the performance disturbance risk.
- Score: 96.93812980299428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural architecture search (NAS) has gained increasing attention in the
community of architecture design. One of the key factors behind the success
lies in the training efficiency created by the weight sharing (WS) technique.
However, WS-based NAS methods often suffer from a performance disturbance (PD)
issue. That is, the training of subsequent architectures inevitably disturbs
the performance of previously trained architectures due to the partially shared
weights. This leads to inaccurate performance estimation for the previous
architectures, which makes it hard to learn a good search strategy. To
alleviate the performance disturbance issue, we propose a new
disturbance-immune update strategy for model updating. Specifically, to
preserve the knowledge learned by previous architectures, we constrain the
training of subsequent architectures in an orthogonal space via orthogonal
gradient descent. Equipped with this strategy, we propose a novel
disturbance-immune training scheme for NAS. We theoretically analyze the
effectiveness of our strategy in alleviating the PD risk. Extensive experiments
on CIFAR-10 and ImageNet verify the superiority of our method.
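The disturbance-immune update constrains each new architecture's training step via orthogonal gradient descent, so that updates to the shared weights do not overwrite what earlier architectures have learned. A minimal PyTorch sketch of that projection idea follows; the choice to store past gradient directions as the protected subspace, the basis size, and the learning rate are illustrative assumptions rather than the paper's exact DI-NAS procedure.

```python
import torch

class OrthogonalProjector:
    """Keeps an orthonormal basis of protected gradient directions and projects
    new gradients onto the orthogonal complement of that subspace."""

    def __init__(self, max_directions=50):
        self.basis = []                      # flat, unit-norm tensors
        self.max_directions = max_directions

    def project(self, grad_flat):
        # Remove components that would disturb previously trained architectures.
        for u in self.basis:
            grad_flat = grad_flat - torch.dot(grad_flat, u) * u
        return grad_flat

    def store(self, grad_flat):
        # Gram-Schmidt: keep only the part not already spanned by the basis.
        residual = self.project(grad_flat)
        norm = residual.norm()
        if norm > 1e-8 and len(self.basis) < self.max_directions:
            self.basis.append(residual / norm)

def disturbance_immune_step(model, loss, projector, lr=0.01):
    """One weight-sharing update with the gradient projected before it is applied."""
    model.zero_grad()
    loss.backward()
    flat = torch.cat([p.grad.reshape(-1) for p in model.parameters()])
    projected = projector.project(flat.clone())

    with torch.no_grad():
        offset = 0
        for p in model.parameters():
            n = p.numel()
            p -= lr * projected[offset:offset + n].view_as(p)
            offset += n

    # Remember the current architecture's (unprojected) gradient direction so
    # that later architectures are steered away from disturbing it.
    projector.store(flat)
```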
Related papers
- The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol [2.4300749758571905]
Gradient-based methods suffer from discretization error, which can severely damage the process of obtaining the final architecture.
We introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture.
Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset.
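For context, the discretization discrepancy targeted here comes from standard differentiable NAS, where the search phase trains a softmax-weighted mixture of candidate operations but the final architecture keeps only the argmax operation. A minimal DARTS-style sketch, with illustrative candidate operations and shapes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Candidate operations on a single edge (illustrative choices and sizes).
candidate_ops = nn.ModuleList([
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.Conv2d(16, 16, kernel_size=5, padding=2),
    nn.Identity(),
])
alpha = torch.zeros(len(candidate_ops), requires_grad=True)  # architecture parameters

def mixed_op(x):
    # Continuous relaxation optimized during the search phase.
    weights = F.softmax(alpha, dim=0)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops))

def discretized_op(x):
    # The final architecture keeps only the strongest operation; the gap between
    # this output and mixed_op(x) is the discretization discrepancy.
    return candidate_ops[int(alpha.argmax())](x)

x = torch.randn(2, 16, 8, 8)
print((mixed_op(x) - discretized_op(x)).abs().mean().item())
```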
arXiv Detail & Related papers (2024-05-26T15:44:53Z)
- Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search [55.41583104734349]
We propose Diffusion Distillation-based Block-wise Neural Architecture Search (DiffNAS) to automatically remove structural redundancy in diffusion models.
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
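Block-wise search of this kind typically scores candidate student blocks by how well they reproduce the teacher's block outputs while still optimizing the task loss. The sketch below uses a simple MSE feature-matching term with a fixed weight as a stand-in; DiffNAS's actual joint dynamic loss is not specified here, so treat the form as an assumption.

```python
import torch.nn.functional as F

def blockwise_distillation_loss(student_feats, teacher_feats, task_loss, lam=1.0):
    """Task loss plus a per-block feature-matching penalty.
    student_feats / teacher_feats: lists of corresponding block outputs."""
    distill = sum(
        F.mse_loss(s, t.detach())        # never backpropagate into the teacher
        for s, t in zip(student_feats, teacher_feats)
    ) / len(student_feats)
    return task_loss + lam * distill
```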
arXiv Detail & Related papers (2023-11-08T12:56:59Z)
- Proxyless Neural Architecture Adaptation for Supervised Learning and Self-Supervised Learning [3.766702945560518]
We propose proxyless neural architecture adaptation that is reproducible and efficient.
Our method can be applied to both supervised learning and self-supervised learning.
arXiv Detail & Related papers (2022-05-15T02:49:48Z)
- Neural Architecture Search for Speech Emotion Recognition [72.1966266171951]
We propose to apply neural architecture search (NAS) techniques to automatically configure the SER models.
We show that NAS can improve SER performance (54.89% to 56.28%) while maintaining model parameter sizes.
arXiv Detail & Related papers (2022-03-31T10:16:10Z)
- RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving [74.61723678821049]
We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget.
We formulate predictor-based architecture search as learning to rank with pairwise comparisons.
The resulting method, RANK-NOSH, reduces the search budget by 5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various spaces and datasets.
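Casting predictor-based search as learning to rank means the predictor only needs to order architectures correctly rather than regress their accuracies. A minimal pairwise-comparison training step is sketched below; the fixed-length architecture encodings and the logistic pairwise loss are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RankPredictor(nn.Module):
    """Scores an architecture encoding; only the relative order of scores matters."""
    def __init__(self, enc_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(enc_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, enc):
        return self.net(enc).squeeze(-1)

def pairwise_rank_loss(predictor, enc_a, enc_b, a_better):
    # a_better is 1.0 where architecture A outperformed B, else 0.0.
    diff = predictor(enc_a) - predictor(enc_b)
    return F.binary_cross_entropy_with_logits(diff, a_better)

predictor = RankPredictor()
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
enc_a, enc_b = torch.randn(8, 32), torch.randn(8, 32)   # dummy encodings
labels = torch.randint(0, 2, (8,)).float()              # pairwise comparison labels
optimizer.zero_grad()
loss = pairwise_rank_loss(predictor, enc_a, enc_b, labels)
loss.backward()
optimizer.step()
```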
arXiv Detail & Related papers (2021-08-18T07:45:21Z)
- The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture Design [3.04585143845864]
We develop methods that can predict, without any training, whether an architecture will achieve a relatively high test or training error on a task after training.
We then go on to explain the error in terms of the architecture definition itself and develop tools for modifying the architecture.
Our first major contribution is to show that the 'degree of nonlinearity' of a neural architecture is a key causal driver behind its performance.
arXiv Detail & Related papers (2021-05-25T20:47:43Z)
- Contrastive Neural Architecture Search with Neural Architecture Comparators [46.45102111497492]
One of the key steps in Neural Architecture Search (NAS) is to estimate the performance of candidate architectures.
Existing methods either directly use the validation performance or learn a predictor to estimate the performance.
We propose a novel Contrastive Neural Architecture Search (CTNAS) method which performs architecture search by taking the comparison results between architectures as the reward.
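The comparator replaces absolute accuracy estimation with a learned judgment of which of two architectures is better, and that judgment serves as the search reward. A minimal sketch under assumed generic architecture encodings (the paper's own encoder and training pipeline are not reproduced here):

```python
import torch
import torch.nn as nn

class ArchComparator(nn.Module):
    """Predicts the probability that architecture A outperforms architecture B."""
    def __init__(self, enc_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * enc_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid()
        )

    def forward(self, enc_a, enc_b):
        return self.net(torch.cat([enc_a, enc_b], dim=-1)).squeeze(-1)

comparator = ArchComparator()
baseline_enc = torch.randn(1, 32)    # encoding of the current best architecture
candidate_enc = torch.randn(1, 32)   # encoding of a newly sampled architecture

# The comparison result, rather than raw validation accuracy, acts as the reward.
reward = comparator(candidate_enc, baseline_enc).item()
```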
arXiv Detail & Related papers (2021-03-08T11:24:07Z)
- On Adversarial Robustness: A Neural Architecture Search perspective [20.478741635006113]
This work is the first large-scale study to understand adversarial robustness purely from an architectural perspective.
We show that random sampling in the search space of DARTS with simple ensembling can improve robustness to PGD attacks by nearly 12%.
We show that NAS, which is popular for achieving SoTA accuracy, can provide adversarial accuracy as a free add-on without any form of adversarial training.
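Robustness in such studies is usually measured by attacking the (ensembled) model with projected gradient descent and counting surviving predictions. A standard L-infinity PGD evaluation loop is sketched below with common default hyperparameters, which are not taken from the paper:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD: iteratively ascend the loss inside an eps-ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def robust_accuracy(models, loader, device="cpu"):
    """Ensemble prediction (mean logits) evaluated under the PGD attack."""
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        ensemble = lambda inp: torch.stack([m(inp) for m in models]).mean(0)
        x_adv = pgd_attack(ensemble, x, y)
        correct += (ensemble(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total
```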
arXiv Detail & Related papers (2020-07-16T16:07:10Z)
- Multi-fidelity Neural Architecture Search with Knowledge Distillation [69.09782590880367]
We propose a Bayesian multi-fidelity method for neural architecture search: MF-KD.
Knowledge distillation adds to a loss function a term forcing a network to mimic some teacher network.
We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss.
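The added distillation term is typically the temperature-softened KL divergence between student and teacher logits on top of the usual classification loss. A sketch under that common formulation (MF-KD's exact weighting and fidelity schedule may differ):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Cross-entropy plus a term forcing the student to mimic the teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1.0 - alpha) * ce + alpha * kl
```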
arXiv Detail & Related papers (2020-06-15T12:32:38Z)
- Stabilizing Differentiable Architecture Search via Perturbation-based Regularization [99.81980366552408]
We find that the precipitous validation loss landscape, which leads to a dramatic performance drop when discretizing the final architecture, is an essential factor that causes instability.
We propose a perturbation-based regularization, SmoothDARTS (SDARTS), to smooth the loss landscape and improve the generalizability of DARTS-based methods.
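The random-perturbation variant of this idea trains the shared weights against architecture parameters jittered with small noise, so the loss stays flat around the learned architecture. A minimal sketch; the supernet(x, alpha) calling convention, noise radius, and single-sample perturbation are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def perturbed_weight_step(supernet, alpha, x, y, w_optimizer, epsilon=0.03):
    """Update the shared weights against a randomly perturbed copy of the
    architecture parameters, so the loss stays smooth around alpha."""
    noise = torch.empty_like(alpha).uniform_(-epsilon, epsilon)
    # Assumed calling convention: the supernet takes (input, architecture params).
    logits = supernet(x, alpha + noise)
    loss = F.cross_entropy(logits, y)
    w_optimizer.zero_grad()
    loss.backward()
    w_optimizer.step()
    return loss.item()
```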
arXiv Detail & Related papers (2020-02-12T23:46:58Z)