$\Lambda$-DARTS: Mitigating Performance Collapse by Harmonizing
Operation Selection among Cells
- URL: http://arxiv.org/abs/2210.07998v1
- Date: Fri, 14 Oct 2022 17:54:01 GMT
- Title: $\Lambda$-DARTS: Mitigating Performance Collapse by Harmonizing
Operation Selection among Cells
- Authors: Sajad Movahedi, Melika Adabinejad, Ayyoob Imani, Arezou Keshavarz,
Mostafa Dehghani, Azadeh Shakery, Babak N. Araabi
- Abstract summary: Differentiable neural architecture search (DARTS) is a popular method for neural architecture search (NAS).
We show that DARTS suffers from a specific structural flaw due to its weight-sharing framework that limits the convergence of DARTS to saturation points of the softmax function.
We propose two new regularization terms that aim to prevent performance collapse by harmonizing operation selection via aligning gradients of layers.
- Score: 11.777101481512423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable neural architecture search (DARTS) is a popular method for
neural architecture search (NAS), which performs cell-search and utilizes
continuous relaxation to improve the search efficiency via gradient-based
optimization. The main shortcoming of DARTS is performance collapse, where the
discovered architecture suffers from a pattern of declining quality during
search. Performance collapse has become an important topic of research, with
many methods trying to solve the issue through either regularization or
fundamental changes to DARTS. However, the weight-sharing framework used for
cell-search in DARTS and the convergence of its architecture parameters have not
been analyzed yet. In this paper, we provide a thorough and novel theoretical
and empirical analysis of DARTS and its point of convergence. We show that
DARTS suffers from a specific structural flaw due to its weight-sharing
framework that limits the convergence of DARTS to saturation points of the
softmax function. This point of convergence gives an unfair advantage to layers
closer to the output in choosing the optimal architecture, causing performance
collapse. We then propose two new regularization terms that aim to prevent
performance collapse by harmonizing operation selection via aligning gradients
of layers. Experimental results on six different search spaces and three
different datasets show that our method ($\Lambda$-DARTS) does indeed prevent
performance collapse, providing justification for our theoretical analysis and
the proposed remedy.
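To make the setting concrete, below is a minimal, illustrative PyTorch sketch of a DARTS-style search in which two cells share one set of architecture parameters, with a toy gradient-alignment penalty between the per-cell gradients of those parameters. This is not the authors' implementation: the exact $\Lambda$-DARTS regularization terms are defined in the paper, and the candidate-operation set, the `MixedOp` class, and the cosine-similarity penalty here are assumptions made only to illustrate the idea of harmonizing operation selection across cells.
```python
# Minimal sketch, not the authors' implementation: a toy DARTS-style setup with
# two cells sharing one set of architecture parameters, plus an illustrative
# gradient-alignment penalty. The exact Lambda-DARTS regularization terms are
# defined in the paper; the candidate operations and the cosine-similarity
# penalty below are assumptions made only for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedOp(nn.Module):
    """Continuous relaxation: softmax-weighted sum of candidate operations."""

    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                # skip connection
            nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 convolution
            nn.AvgPool2d(3, stride=1, padding=1),         # 3x3 average pooling
        ])

    def forward(self, x, alpha):
        weights = F.softmax(alpha, dim=-1)  # relaxed operation choice
        return sum(w * op(x) for w, op in zip(weights, self.ops))


# One shared alpha vector is reused by every cell (the weight-sharing setup
# whose convergence the paper analyzes).
alpha = torch.zeros(3, requires_grad=True)
cell1, cell2 = MixedOp(8), MixedOp(8)

x = torch.randn(2, 8, 16, 16)
a1, a2 = alpha + 0.0, alpha + 0.0   # per-cell views so autograd can split the gradient
out = cell2(cell1(x, a1), a2)
task_loss = out.pow(2).mean()       # placeholder search/validation loss

# Per-cell contributions to d(task_loss)/d(alpha).
g1, g2 = torch.autograd.grad(task_loss, [a1, a2], create_graph=True)

# Illustrative harmonization term: push the two cells' gradients with respect
# to the shared architecture parameters to point in the same direction.
alignment_penalty = 1.0 - F.cosine_similarity(g1, g2, dim=0)

total_loss = task_loss + 0.1 * alignment_penalty
total_loss.backward()
print(alpha.grad)
```
Because every cell reads the same architecture parameters, the per-cell gradient contributions can disagree; rewarding agreement between them is one way to picture the harmonization of operation selection that the abstract refers to.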
Related papers
- OStr-DARTS: Differentiable Neural Architecture Search based on Operation Strength [70.76342136866413]
Differentiable architecture search (DARTS) has emerged as a promising technique for effective neural architecture search.
DARTS suffers from the well-known degeneration issue which can lead to deteriorating architectures.
We propose a novel criterion based on operation strength that estimates the importance of an operation by its effect on the final loss.
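As a rough illustration of this kind of criterion (not the OStr-DARTS definition, whose exact form is given in that paper), one simple way to score an operation by its effect on the final loss is to compare the loss with and without that operation's contribution:
```python
# Rough sketch, not the OStr-DARTS criterion: score each candidate operation on
# an edge by how much the final loss changes when that operation's contribution
# is removed from the softmax mixture.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
alpha = torch.randn(3)          # architecture parameters for one edge
op_outputs = torch.randn(3, 8)  # toy outputs of the three candidate operations
target = torch.randn(8)


def edge_loss(mask):
    """Loss of the mixed edge output with some operations masked out."""
    weights = F.softmax(alpha, dim=-1) * mask
    mixed = (weights.unsqueeze(1) * op_outputs).sum(dim=0)
    return F.mse_loss(mixed, target)


full_loss = edge_loss(torch.ones(3))
for i in range(3):
    mask = torch.ones(3)
    mask[i] = 0.0                               # drop operation i
    strength = edge_loss(mask) - full_loss      # larger increase => more important
    print(f"op {i}: strength = {strength.item():.4f}")
```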
arXiv Detail & Related papers (2024-09-22T13:16:07Z)
- $\beta$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search [85.84110365657455]
We propose a simple but efficient regularization method, termed Beta-Decay, to regularize the DARTS-based NAS searching process.
Experimental results on NAS-Bench-201 show that our proposed method can help stabilize the searching process and make the searched networks more transferable across different datasets.
arXiv Detail & Related papers (2022-03-03T11:47:14Z)
- ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the gradient approximation used in DARTS.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
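For illustration, a standard two-point zeroth-order gradient estimator applied to a toy search objective is sketched below; this estimator is assumed here for exposition and is not the actual ZARTS update rule.
```python
# Rough sketch, not the ZARTS algorithm: a standard two-point zeroth-order
# estimator of the gradient of the search objective with respect to the
# architecture parameters, so no analytic gradient approximation is needed.
import torch

alpha = torch.zeros(3)       # architecture parameters (no autograd required)
mu, num_dirs = 0.05, 8       # smoothing radius and number of random directions


def search_objective(a):
    """Placeholder for the supernet validation loss at architecture a."""
    weights = torch.softmax(a, dim=-1)
    target = torch.tensor([0.1, 0.8, 0.1])
    return ((weights - target) ** 2).sum()


f0 = search_objective(alpha)
grad_est = torch.zeros_like(alpha)
for _ in range(num_dirs):
    u = torch.randn_like(alpha)                                # random probe direction
    grad_est += (search_objective(alpha + mu * u) - f0) / mu * u
grad_est /= num_dirs

alpha -= 0.1 * grad_est      # one zeroth-order descent step on alpha
print(alpha)
```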
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
- DARTS for Inverse Problems: a Study on Hyperparameter Sensitivity [21.263326724329698]
Differentiable architecture search (DARTS) is a widely researched tool for neural architecture search.
We report the results of DARTS-based methods over several runs along with their underlying performance statistics.
arXiv Detail & Related papers (2021-08-12T10:28:02Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream approach to neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- Making Differentiable Architecture Search less local [9.869449181400466]
Differentiable neural architecture search (DARTS) is a promising NAS approach that dramatically increases search efficiency.
It has been shown to suffer from performance collapse, where the search often leads to detrimental architectures.
We develop a more global optimisation scheme that is able to better explore the space without changing the DARTS problem formulation.
arXiv Detail & Related papers (2021-04-21T10:36:43Z)
- RARTS: An Efficient First-Order Relaxed Architecture Search Method [5.491655566898372]
Differentiable architecture search (DARTS) is an effective method for data-driven neural network design based on solving a bilevel optimization problem.
We formulate a single-level alternative, the relaxed architecture search (RARTS) method, which utilizes the whole dataset for architecture learning via both data and network splitting.
For the task of searching the topological architecture, i.e., the edges and the operations, RARTS obtains higher accuracy and a 60% reduction in computational cost compared with second-order DARTS on CIFAR-10.
arXiv Detail & Related papers (2020-08-10T04:55:51Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
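A rough sketch of the sampling step described above, assuming a PyTorch-style setup; this is not the DrNAS implementation, and the softplus parameterization of the concentration and the toy loss are illustrative choices.
```python
# Rough sketch, not the DrNAS implementation: sample architecture mixing weights
# from a Dirichlet distribution and backpropagate through the sample via
# PyTorch's pathwise (reparameterized) derivative.
import torch
import torch.nn.functional as F
from torch.distributions import Dirichlet

num_ops = 3
beta = torch.zeros(num_ops, requires_grad=True)   # unconstrained parameters
concentration = F.softplus(beta) + 1e-4           # keep the concentration positive

dist = Dirichlet(concentration)
weights = dist.rsample()        # pathwise-differentiable sample on the simplex

# Placeholder loss standing in for the supernet validation loss under the
# sampled mixing weights.
op_scores = torch.tensor([0.2, 1.0, 0.5])
loss = -(weights * op_scores).sum()
loss.backward()                 # gradients flow back to beta through the sample
print(beta.grad)
```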
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
- Stabilizing Differentiable Architecture Search via Perturbation-based Regularization [99.81980366552408]
We find that the precipitous validation loss landscape, which leads to a dramatic performance drop when discretizing the final architecture, is an essential factor that causes instability.
We propose a perturbation-based regularization - SmoothDARTS (SDARTS) - to smooth the loss landscape and improve the generalizability of DARTS-based methods.
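A rough sketch of the random-smoothing variant of this idea is shown below; it is not the SDARTS implementation, and the uniform noise scale and toy objective are illustrative assumptions.
```python
# Rough sketch, not the SDARTS implementation: evaluate the search loss under
# randomly perturbed architecture parameters so that the optimized alpha sits
# in a flatter region of the validation-loss landscape.
import torch

alpha = torch.zeros(3, requires_grad=True)


def search_loss(a):
    """Placeholder for the supernet validation loss as a function of alpha."""
    weights = torch.softmax(a, dim=-1)
    target = torch.tensor([0.7, 0.2, 0.1])
    return ((weights - target) ** 2).sum()


epsilon, num_samples = 0.3, 4
smoothed = torch.stack([
    search_loss(alpha + epsilon * (2 * torch.rand_like(alpha) - 1))  # noise in [-eps, eps]
    for _ in range(num_samples)
]).mean()

smoothed.backward()   # gradient of the smoothed objective with respect to alpha
print(alpha.grad)
```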
arXiv Detail & Related papers (2020-02-12T23:46:58Z)