iDARTS: Improving DARTS by Node Normalization and Decorrelation Discretization
- URL: http://arxiv.org/abs/2108.11014v1
- Date: Wed, 25 Aug 2021 02:23:30 GMT
- Title: iDARTS: Improving DARTS by Node Normalization and Decorrelation Discretization
- Authors: Huiqun Wang, Ruijie Yang, Di Huang and Yunhong Wang
- Abstract summary: Differentiable ARchiTecture Search (DARTS) uses a continuous relaxation of the network representation and dramatically accelerates Neural Architecture Search (NAS), reducing its cost by up to thousands of times in terms of GPU-days.
However, the search process of DARTS is unstable and degrades severely as the number of training epochs grows.
We propose an improved version of DARTS, namely iDARTS, to address the two problems.
- Score: 51.489024258966886
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Differentiable ARchiTecture Search (DARTS) uses a continuous relaxation of
the network representation and dramatically accelerates Neural Architecture Search
(NAS), reducing its cost by up to thousands of times in terms of GPU-days. However,
the search process of DARTS is unstable and degrades severely as the number of
training epochs grows, which limits its application. In this paper, we claim that
this degradation is caused by the imbalanced norms between different nodes and the
highly correlated outputs of the candidate operations. We then propose an improved
version of DARTS, namely iDARTS, to deal with these two problems. In the training
phase, it introduces node normalization to maintain the norm balance. In the
discretization phase, the continuous architecture is approximated based on the
similarity between the output of a node and the decorrelated outputs of its
operations, rather than on the values of the architecture parameters. Extensive
evaluation is conducted on CIFAR-10 and ImageNet, where iDARTS reaches error rates
of 2.25% and 24.7% with search costs of 0.2 and 1.9 GPU-days respectively, which
shows its effectiveness. Additional analysis also reveals that iDARTS offers better
robustness and generalization than other DARTS-based counterparts.
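The two components in the abstract lend themselves to a short illustration. The PyTorch sketch below is a minimal, hedged reading of them rather than the authors' implementation: normalize_node rescales a node's output so that norms stay balanced across nodes, and select_op discretizes an edge by cosine similarity between the node output and crudely decorrelated operation outputs. The unit-norm scheme, the mean-subtraction decorrelation, and all names are assumptions.

```python
import torch
import torch.nn.functional as F


def normalize_node(x, eps=1e-5):
    # Rescale each sample's node output to unit L2 norm; one plausible reading
    # of "node normalization" that keeps the norms of different nodes balanced.
    flat = x.flatten(1)                                    # (B, C*H*W)
    norm = flat.norm(dim=1).clamp_min(eps)                 # (B,)
    return x / norm.view(-1, *([1] * (x.dim() - 1)))


def select_op(op_outputs, node_output):
    # Pick the operation whose (roughly decorrelated) output is most similar to
    # the node output.  Subtracting the mean across operations is only a crude
    # stand-in for the decorrelation step described in the abstract.
    stack = torch.stack([o.flatten(1) for o in op_outputs])        # (K, B, D)
    decorrelated = stack - stack.mean(dim=0, keepdim=True)
    target = node_output.flatten(1).unsqueeze(0)                   # (1, B, D)
    sims = F.cosine_similarity(decorrelated, target, dim=-1)       # (K, B)
    return int(sims.mean(dim=1).argmax())


if __name__ == "__main__":
    B, C, H, W, K = 4, 16, 8, 8, 7
    ops = [torch.randn(B, C, H, W) for _ in range(K)]              # candidate op outputs
    alphas = torch.softmax(torch.randn(K), dim=0)                  # architecture weights
    mixed = normalize_node(sum(a * o for a, o in zip(alphas, ops)))  # continuous node output
    print("selected operation index:", select_op(ops, mixed))
```

The point of the discretization sketch is that the chosen operation depends on what the operations actually compute, not on the magnitudes of the architecture parameters alone.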
Related papers
- Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets [0.0]
We propose a novel adaptive learning rate scheduling strategy tailored for the architecture parameters of DARTS.
Our approach dynamically adjusts the learning rate of the architecture parameters based on the training epoch, preventing the disruption of well-trained representations.
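The summary above only states that the architecture-parameter learning rate is adjusted by training epoch; the schedule below is purely a hypothetical illustration of that idea. The warm-up/cosine rule, the epoch counts, and all names are assumptions, not the paper's actual strategy.

```python
import math

import torch


def arch_lr(epoch, base_lr=3e-4, warmup_epochs=15, total_epochs=50):
    # Hypothetical epoch-dependent learning rate for the architecture parameters:
    # hold it at zero while the network weights warm up, then decay it cosine-style.
    if epoch < warmup_epochs:
        return 0.0
    t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))


# A separate optimizer for the architecture parameters, with its LR set once per epoch.
alphas = [torch.zeros(14, 8, requires_grad=True)]   # dummy DARTS architecture parameters
arch_optimizer = torch.optim.Adam(alphas, lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3)
for epoch in range(50):
    for group in arch_optimizer.param_groups:
        group["lr"] = arch_lr(epoch)
    # ... the usual alternating weight / architecture updates would go here ...
```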
arXiv Detail & Related papers (2024-06-11T07:32:25Z)
- $\Lambda$-DARTS: Mitigating Performance Collapse by Harmonizing Operation Selection among Cells [11.777101481512423]
Differentiable neural architecture search (DARTS) is a popular method for neural architecture search (NAS).
We show that DARTS suffers from a specific structural flaw due to its weight-sharing framework that limits the convergence of DARTS to saturation points of the softmax function.
We propose two new regularization terms that aim to prevent performance collapse by harmonizing operation selection via aligning gradients of layers.
arXiv Detail & Related papers (2022-10-14T17:54:01Z)
- ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the gradient approximation adopted by DARTS.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS in settings where the performance of DARTS collapses due to its known instability issue.
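Zero-order optimization updates the architecture parameters from loss evaluations alone rather than from analytic gradients. The sketch below uses a generic two-point random-direction estimator; it is only a stand-in for the sampling scheme ZARTS actually uses, and every name in it is an assumption.

```python
import torch


def zo_grad_estimate(val_loss, alpha, num_samples=8, mu=1e-2):
    # Two-point (antithetic) zeroth-order estimate of d val_loss / d alpha,
    # built from loss evaluations only -- no back-propagation through the
    # hypergradient approximation used by standard DARTS.
    grad = torch.zeros_like(alpha)
    with torch.no_grad():
        for _ in range(num_samples):
            u = torch.randn_like(alpha)
            delta = val_loss(alpha + mu * u) - val_loss(alpha - mu * u)
            grad += (delta / (2.0 * mu)) * u
    return grad / num_samples


# Usage sketch with a placeholder standing in for the validation loss.
alpha = torch.randn(14, 8)
placeholder_loss = lambda a: (a.softmax(dim=-1) ** 2).sum()
alpha = alpha - 0.1 * zo_grad_estimate(placeholder_loss, alpha)
```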
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
- MS-DARTS: Mean-Shift Based Differentiable Architecture Search [11.115656548869199]
We propose a Mean-Shift based DARTS (MS-DARTS) to improve stability based on sampling and perturbation.
MS-DARTS achieves higher performance than other state-of-the-art NAS methods with reduced search cost.
arXiv Detail & Related papers (2021-08-23T08:06:45Z)
- D-DARTS: Distributed Differentiable Architecture Search [75.12821786565318]
Differentiable ARchiTecture Search (DARTS) is one of the most popular Neural Architecture Search (NAS) methods.
We propose D-DARTS, a novel solution that addresses this problem by nesting several neural networks at the cell level.
arXiv Detail & Related papers (2021-08-20T09:07:01Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream approach to neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- RARTS: An Efficient First-Order Relaxed Architecture Search Method [5.491655566898372]
Differentiable architecture search (DARTS) is an effective method for data-driven neural network design based on solving a bilevel optimization problem.
We formulate a single-level alternative and a relaxed architecture search (RARTS) method that utilizes the whole dataset in architecture learning via both data and network splitting.
For the task of searching the topological architecture, i.e., the edges and the operations, RARTS obtains higher accuracy and a 60% reduction in computational cost compared with second-order DARTS on CIFAR-10.
arXiv Detail & Related papers (2020-08-10T04:55:51Z)
- Stabilizing Differentiable Architecture Search via Perturbation-based Regularization [99.81980366552408]
We find that the precipitous validation loss landscape, which leads to a dramatic performance drop when discretizing the final architecture, is an essential factor that causes instability.
We propose a perturbation-based regularization - SmoothDARTS (SDARTS) - to smooth the loss landscape and improve the generalizability of DARTS-based methods.
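One way to read "perturbation-based regularization" is to update the network weights under randomly perturbed architecture parameters, so that the loss landscape around the current architecture is smoothed. The sketch below illustrates that random-smoothing idea; the assumed model(input, alpha) interface, the uniform-noise choice, the eps value, and all names are assumptions rather than the SDARTS implementation.

```python
import torch


def weight_step_with_perturbed_alpha(model, alpha, w_optimizer, batch, loss_fn, eps=0.03):
    # Train the network weights under a randomly perturbed copy of the architecture
    # parameters, so the loss becomes smooth around alpha -- the random-smoothing
    # flavour of perturbation-based regularization.
    x, y = batch
    noise = torch.empty_like(alpha).uniform_(-eps, eps)
    perturbed_alpha = (alpha + noise).detach()      # perturbation is not back-propagated
    w_optimizer.zero_grad()
    loss = loss_fn(model(x, perturbed_alpha), y)    # model is assumed to take (input, alpha)
    loss.backward()
    w_optimizer.step()
    return loss.item()
```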
arXiv Detail & Related papers (2020-02-12T23:46:58Z)
- Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD.
The first regularizer is based on injecting random noise via the standard cross-entropy loss.
The second regularizer penalizes the minibatch variance when it becomes too small.
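The second regularizer in this summary is concrete enough to sketch: penalize the model when the minibatch variance of the embeddings becomes too small. The hinge form, the threshold, the weighting, and all names below are assumptions; only the idea of a minimum-variance penalty comes from the summary.

```python
import torch


def variance_regularizer(embeddings, min_var=0.1):
    # Hinge-style penalty that turns on when the minibatch variance of the
    # embeddings drops below a threshold, discouraging hypersphere collapse.
    # The exact form used in the paper may differ.
    var = embeddings.var(dim=0).mean()          # average per-dimension variance
    return torch.clamp(min_var - var, min=0.0)


# Usage: add the penalty to a deep SVDD-style one-class objective.
z = torch.randn(128, 32, requires_grad=True)    # embeddings of a minibatch
center = torch.zeros(32)
svdd_loss = ((z - center) ** 2).sum(dim=1).mean()
total = svdd_loss + 10.0 * variance_regularizer(z)
total.backward()
```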
arXiv Detail & Related papers (2020-01-24T03:44:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.