DARTS without a Validation Set: Optimizing the Marginal Likelihood
- URL: http://arxiv.org/abs/2112.13023v1
- Date: Fri, 24 Dec 2021 10:16:38 GMT
- Title: DARTS without a Validation Set: Optimizing the Marginal Likelihood
- Authors: Miroslav Fil, Binxin Ru, Clare Lyle, Yarin Gal
- Abstract summary: Training-Speed-Estimate (TSE) has previously been used in place of the validation loss for gradient-based optimization in DARTS.
We extend those results by applying various DARTS diagnostics and show several unusual behaviors arising from not using a validation set.
- Score: 33.26229536690996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of neural architecture search (NAS) has historically been limited
by excessive compute requirements. While modern weight-sharing NAS methods such
as DARTS are able to finish the search in single-digit GPU days, extracting the
final best architecture from the shared weights is notoriously unreliable.
Training-Speed-Estimate (TSE), a recently developed generalization estimator
with a Bayesian marginal likelihood interpretation, has previously been used in
place of the validation loss for gradient-based optimization in DARTS. This
prevents the DARTS skip connection collapse, which significantly improves
performance on NASBench-201 and the original DARTS search space. We extend
those results by applying various DARTS diagnostics and show several unusual
behaviors arising from not using a validation set. Furthermore, our experiments
yield concrete examples of the depth gap and topology selection in DARTS having
a strongly negative impact on the search performance despite generally
receiving limited attention in the literature compared to the operations
selection.
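For readers unfamiliar with TSE, here is a minimal sketch of the idea the abstract describes: the Training-Speed-Estimate, i.e. the sum of training losses accumulated over recent weight updates, replaces the validation loss as the objective for the DARTS architecture parameters, so no validation split is needed. This is not the authors' code; the toy one-edge supernet, the one-epoch accumulation window (a TSE-E-style choice with E=1), and all hyper-parameters are illustrative assumptions.

```python
# Hedged sketch: TSE (summed training loss) as the DARTS architecture objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """A single DARTS-style mixed edge: softmax-weighted sum of candidate operations."""
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                   # skip connection
            nn.Linear(dim, dim),                             # parametric op
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),   # another parametric op
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture parameters

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

torch.manual_seed(0)
dim, steps_per_epoch = 8, 32
model = MixedOp(dim)
w_params = [p for name, p in model.named_parameters() if name != "alpha"]
w_opt = torch.optim.SGD(w_params, lr=0.05, momentum=0.9)   # updates supernet weights
a_opt = torch.optim.Adam([model.alpha], lr=3e-3)           # updates architecture parameters

for epoch in range(5):
    tse = 0.0               # Training-Speed-Estimate: summed training loss this epoch
    a_opt.zero_grad()
    for _ in range(steps_per_epoch):
        x = torch.randn(16, dim)
        y = torch.sin(x)                         # toy regression target
        loss = F.mse_loss(model(x), y)           # training loss only; no validation split

        # backward() accumulates gradients into both the weights and alpha;
        # alpha's gradient keeps accumulating across the epoch, giving the
        # gradient of the summed training losses (the TSE objective).
        loss.backward()
        tse += loss.item()

        w_opt.step()         # weight step from the same training loss
        w_opt.zero_grad()    # clear only the weight gradients

    a_opt.step()             # architecture step from the accumulated TSE gradient
    print(f"epoch {epoch}: TSE={tse:.3f}  alpha={[round(a, 3) for a in model.alpha.tolist()]}")
```

In the full bi-level DARTS setup the architecture gradient would also account for how the weight updates themselves depend on alpha; the paper's TSE-based variant and its exact accumulation window may differ from this single-level, first-order simplification. A second sketch after the related-papers list below makes the operation-selection versus topology-selection step concrete.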
Related papers
- OStr-DARTS: Differentiable Neural Architecture Search based on Operation Strength [70.76342136866413]
Differentiable architecture search (DARTS) has emerged as a promising technique for effective neural architecture search.
DARTS suffers from the well-known degeneration issue which can lead to deteriorating architectures.
We propose a novel criterion based on operation strength that estimates the importance of an operation by its effect on the final loss.
arXiv Detail & Related papers (2024-09-22T13:16:07Z)
- Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets [0.0]
We propose a novel adaptive learning rate scheduling strategy tailored for the architecture parameters of DARTS.
Our approach dynamically adjusts the learning rate of the architecture parameters based on the training epoch, preventing the disruption of well-trained representations.
arXiv Detail & Related papers (2024-06-11T07:32:25Z)
- Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z)
- Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization.
We introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric.
Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50%.
arXiv Detail & Related papers (2023-12-21T02:48:44Z)
- IS-DARTS: Stabilizing DARTS through Precise Measurement on Candidate Importance [41.23462863659102]
DARTS is known for its efficiency and simplicity.
However, performance collapse in DARTS results in deteriorating architectures filled with parameter-free operations.
We propose IS-DARTS to comprehensively improve DARTS and resolve the aforementioned problems.
arXiv Detail & Related papers (2023-12-19T22:45:57Z)
- $\Lambda$-DARTS: Mitigating Performance Collapse by Harmonizing Operation Selection among Cells [11.777101481512423]
Differentiable neural architecture search (DARTS) is a popular method for neural architecture search (NAS).
We show that DARTS suffers from a specific structural flaw due to its weight-sharing framework that limits the convergence of DARTS to saturation points of the softmax function.
We propose two new regularization terms that aim to prevent performance collapse by harmonizing operation selection via aligning gradients of layers.
arXiv Detail & Related papers (2022-10-14T17:54:01Z)
- $\beta$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search [85.84110365657455]
We propose a simple but efficient regularization method, termed Beta-Decay, to regularize the DARTS-based NAS searching process.
Experimental results on NAS-Bench-201 show that our proposed method can help stabilize the searching process and make the searched network more transferable across different datasets.
arXiv Detail & Related papers (2022-03-03T11:47:14Z)
- ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the gradient approximation that DARTS relies on.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
- iDARTS: Improving DARTS by Node Normalization and Decorrelation Discretization [51.489024258966886]
Differentiable ARchiTecture Search (DARTS) uses a continuous relaxation of the network representation and dramatically accelerates Neural Architecture Search (NAS), cutting its cost by roughly a thousandfold in GPU-days.
However, the search process of DARTS is unstable and degrades severely as the number of training epochs grows.
We propose an improved version of DARTS, namely iDARTS, to address these issues.
arXiv Detail & Related papers (2021-08-25T02:23:30Z)
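The abstract's point about topology selection receiving less attention than operation selection, echoed by several related papers above ($\Lambda$-DARTS on operation selection, iDARTS on discretization), concerns the step that turns the relaxed supernet into a discrete cell. Below is a minimal sketch of the standard DARTS discretization rule, not the paper's code; the operation set, cell size, and edge layout are illustrative assumptions rather than the paper's search space.

```python
# Hedged sketch: standard DARTS discretization, separating operation selection
# (per-edge argmax over softmaxed architecture weights, excluding 'none') from
# topology selection (each intermediate node keeps only its two strongest edges).
import torch

OPS = ["none", "skip_connect", "sep_conv_3x3", "max_pool_3x3"]  # assumed candidate op set

def discretize(alpha, num_nodes=4):
    """alpha maps an edge (src, dst) to a logit vector over OPS; returns (op, src, dst) triples."""
    genotype = []
    for dst in range(num_nodes):
        candidates = []
        for (src, d), logits in alpha.items():
            if d != dst:
                continue
            w = torch.softmax(logits, dim=0)
            # Operation selection: keep the strongest candidate op on this edge,
            # excluding the 'none' (zero) operation.
            op_idx = int(torch.argmax(w[1:])) + 1
            candidates.append((float(w[op_idx]), src, OPS[op_idx]))
        # Topology selection: keep only the two strongest incoming edges per node,
        # ranked by the weight of each edge's chosen op.
        for strength, src, op in sorted(candidates, reverse=True)[:2]:
            genotype.append((op, src, dst))
    return genotype

# Toy cell: nodes 2 and 3 are intermediate nodes; every earlier node feeds each of them.
torch.manual_seed(0)
alpha = {(src, dst): torch.randn(len(OPS)) for dst in (2, 3) for src in range(dst)}
for op, src, dst in discretize(alpha):
    print(f"node {dst} keeps edge from node {src} with op '{op}'")
```

Operation selection is the per-edge argmax; topology selection is the per-node choice of which two incoming edges survive, and it is this latter step, together with the depth gap, that the abstract reports as strongly hurting search performance.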
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.