Stabilizing Differentiable Architecture Search via Perturbation-based Regularization
- URL: http://arxiv.org/abs/2002.05283v3
- Date: Tue, 12 Jan 2021 19:17:24 GMT
- Title: Stabilizing Differentiable Architecture Search via Perturbation-based Regularization
- Authors: Xiangning Chen, Cho-Jui Hsieh
- Abstract summary: We find that the precipitous validation loss landscape, which leads to a dramatic performance drop when discretizing the final architecture, is an essential factor that causes instability.
We propose a perturbation-based regularization - SmoothDARTS (SDARTS) - to smooth the loss landscape and improve the generalizability of DARTS-based methods.
- Score: 99.81980366552408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable architecture search (DARTS) is a prevailing NAS solution to
identify architectures. Based on the continuous relaxation of the architecture
space, DARTS learns a differentiable architecture weight and largely reduces
the search cost. However, its stability has been challenged for yielding
deteriorating architectures as the search proceeds. We find that the
precipitous validation loss landscape, which leads to a dramatic performance
drop when discretizing the final architecture, is an essential factor that causes
instability. Based on this observation, we propose a perturbation-based
regularization, SmoothDARTS (SDARTS), to smooth the loss landscape and improve
the generalizability of DARTS-based methods. In particular, our new
formulations stabilize DARTS-based methods by either random smoothing or
adversarial attack. The search trajectory on NAS-Bench-1Shot1 demonstrates the
effectiveness of our approach and, due to the improved stability, we achieve
performance gains across various search spaces on 4 datasets. Furthermore, we
mathematically show that SDARTS implicitly regularizes the Hessian norm of the
validation loss, which accounts for a smoother loss landscape and improved
performance.
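The sketch below illustrates how the two SDARTS formulations described in the abstract could fit into a DARTS-style search loop; it is a minimal sketch, not the authors' released code. Random smoothing draws a uniform perturbation of the architecture weights, while the adversarial variant takes a few projected gradient ascent steps on the training loss. The helpers `train_loss`, `val_loss`, and the `model.alpha` attribute are illustrative placeholders, not names from the paper or any library.

```python
# Hedged sketch of SDARTS-style perturbation-based regularization.
# `model.alpha`, `train_loss`, and `val_loss` are assumed placeholders.
import torch

def perturb_alpha(model, batch, mode="rs", epsilon=1e-3, adv_steps=7, adv_lr=1e-2):
    """Return a perturbed copy of the architecture weights alpha."""
    alpha = model.alpha.detach()
    if mode == "rs":
        # Random smoothing: uniform noise inside the epsilon ball.
        return alpha + torch.empty_like(alpha).uniform_(-epsilon, epsilon)
    # Adversarial attack: a few PGD-style ascent steps on the training loss.
    delta = torch.zeros_like(alpha, requires_grad=True)
    for _ in range(adv_steps):
        loss = train_loss(model, batch, alpha=alpha + delta)  # placeholder helper
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += adv_lr * grad.sign()       # ascend the loss
            delta.clamp_(-epsilon, epsilon)     # project back to the epsilon ball
    return alpha + delta.detach()

def search_step(model, w_optimizer, a_optimizer, train_batch, val_batch, mode="rs"):
    # 1) Architecture step: update alpha on the validation loss, as in DARTS.
    a_optimizer.zero_grad()
    val_loss(model, val_batch).backward()       # placeholder helper
    a_optimizer.step()

    # 2) Weight step: train the network weights under a perturbed alpha, so the
    #    loss stays low in a neighborhood of the current architecture weights.
    perturbed = perturb_alpha(model, train_batch, mode=mode)
    w_optimizer.zero_grad()
    train_loss(model, train_batch, alpha=perturbed).backward()
    w_optimizer.step()
```

Intuitively, training the weights under perturbed architecture parameters keeps the loss flat within the epsilon ball around alpha; the paper argues this implicitly penalizes the Hessian norm of the validation loss, which is what produces the smoother landscape.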
Related papers
- OStr-DARTS: Differentiable Neural Architecture Search based on Operation Strength [70.76342136866413]
Differentiable architecture search (DARTS) has emerged as a promising technique for effective neural architecture search.
DARTS suffers from the well-known degeneration issue, which can lead to deteriorating architectures.
We propose a novel criterion based on operation strength that estimates the importance of an operation by its effect on the final loss.
arXiv Detail & Related papers (2024-09-22T13:16:07Z)
- $\Lambda$-DARTS: Mitigating Performance Collapse by Harmonizing Operation Selection among Cells [11.777101481512423]
Differentiable neural architecture search (DARTS) is a popular method for neural architecture search (NAS).
We show that DARTS suffers from a specific structural flaw due to its weight-sharing framework that limits the convergence of DARTS to saturation points of the softmax function.
We propose two new regularization terms that aim to prevent performance collapse by harmonizing operation selection via aligning gradients of layers.
arXiv Detail & Related papers (2022-10-14T17:54:01Z)
- Enhancing the Robustness, Efficiency, and Diversity of Differentiable Architecture Search [25.112048502327738]
Differentiable architecture search (DARTS) has attracted much attention due to its simplicity and significant improvement in efficiency.
Many works attempt to restrict the accumulation of skip connections by indicators or manual design.
We suggest a more subtle and direct approach that removes skip connections from the operation space.
arXiv Detail & Related papers (2022-04-10T13:25:36Z)
- $\beta$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search [85.84110365657455]
We propose a simple-but-efficient regularization method, termed Beta-Decay, to regularize the DARTS-based NAS searching process.
Experimental results on NAS-Bench-201 show that our proposed method can help to stabilize the searching process and make the searched network more transferable across different datasets.
arXiv Detail & Related papers (2022-03-03T11:47:14Z)
- iDARTS: Improving DARTS by Node Normalization and Decorrelation Discretization [51.489024258966886]
Differentiable ARchiTecture Search (DARTS) uses a continuous relaxation of network representation and dramatically accelerates Neural Architecture Search (NAS) by almost thousands of times in GPU-days.
However, the search process of DARTS is unstable and suffers severe degradation when the number of training epochs becomes large.
We propose an improved version of DARTS, namely iDARTS, to deal with these two problems.
arXiv Detail & Related papers (2021-08-25T02:23:30Z)
- MS-DARTS: Mean-Shift Based Differentiable Architecture Search [11.115656548869199]
We propose a Mean-Shift based DARTS (MS-DARTS) to improve stability based on sampling and perturbation.
MS-DARTS achieves higher performance than other state-of-the-art NAS methods with reduced search cost.
arXiv Detail & Related papers (2021-08-23T08:06:45Z)
- $\mu$DARTS: Model Uncertainty-Aware Differentiable Architecture Search [8.024434062411943]
We introduce concrete dropout within DARTS cells and include a Monte-Carlo regularizer within the training loss to optimize the concrete dropout probabilities.
Experiments on CIFAR10, CIFAR100, SVHN, and ImageNet verify the effectiveness of $\mu$DARTS in improving accuracy and reducing uncertainty.
arXiv Detail & Related papers (2021-07-24T01:09:20Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weight as random variables modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers (a minimal sketch of this idea follows the list).
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.