Robustifying DARTS by Eliminating Information Bypass Leakage via
Explicit Sparse Regularization
- URL: http://arxiv.org/abs/2306.06858v1
- Date: Mon, 12 Jun 2023 04:11:37 GMT
- Title: Robustifying DARTS by Eliminating Information Bypass Leakage via
Explicit Sparse Regularization
- Authors: Jiuling Zhang, Zhiming Ding
- Abstract summary: Differentiable architecture search (DARTS) is a promising end-to-end NAS method.
Recent studies cast doubt on the basic underlying hypotheses of DARTS.
We propose a novel sparse-regularized approximation and an efficient mixed-sparsity training scheme to robustify DARTS.
- Score: 8.93957397187611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable architecture search (DARTS) is a promising
end-to-end NAS method which directly optimizes the architecture parameters
through general gradient descent. However, DARTS is brittle to the
catastrophic failure incurred by the skip connection in the search space.
Recent studies also cast doubt on the basic underlying hypotheses of DARTS,
which are argued to be inherently prone to the performance discrepancy
between the continuous-relaxed supernet in the training phase and the
discretized finalnet in the evaluation phase. We find that both the
robustness problem and this skepticism can be explained by information
bypass leakage during the training of the supernet. This naturally
highlights the vital role of the sparsity of architecture parameters in the
training phase, which has not been well developed in the past. We thus
propose a novel sparse-regularized approximation and an efficient
mixed-sparsity training scheme to robustify DARTS by eliminating the
information bypass leakage. We subsequently conduct extensive experiments
on multiple search spaces to demonstrate the effectiveness of our method.
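The abstract does not spell out the exact form of the sparse-regularized approximation or the mixed-sparsity training schedule, so the snippet below is only a minimal sketch of the underlying idea: penalize non-sparse mixtures over candidate operations so that the supernet cannot leak information through many low-weight bypass paths. The entropy penalty, `MixedOp`, and the `sparsity_weight` coefficient are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """A DARTS-style mixed operation: a weighted sum of candidate ops."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)

    def forward(self, x, weights):
        # weights: (num_ops,) softmax-normalized architecture weights
        return sum(w * op(x) for w, op in zip(weights, self.ops))

def sparsity_penalty(alpha):
    """Entropy of the per-edge operation distribution.

    Low entropy = near one-hot = sparse mixture, so minimizing this term
    discourages information from leaking through many low-weight paths.
    (Illustrative choice; the paper's regularizer may differ.)
    """
    probs = F.softmax(alpha, dim=-1)
    return -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()

# Toy example: one edge with three candidate operations.
ops = [nn.Conv2d(8, 8, 1), nn.Conv2d(8, 8, 1), nn.Identity()]
edge = MixedOp(ops)
alpha = nn.Parameter(torch.zeros(len(ops)))           # architecture logits
x = torch.randn(2, 8, 16, 16)
target = torch.randn(2, 8, 16, 16)

sparsity_weight = 0.1                                  # assumed coefficient
out = edge(x, F.softmax(alpha, dim=-1))
loss = F.mse_loss(out, target) + sparsity_weight * sparsity_penalty(alpha)
loss.backward()            # gradients flow to both the op weights and alpha
```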
Related papers
- The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol [2.4300749758571905]
Gradient-based methods suffer from discretization error, which can severely damage the process of obtaining the final architecture.
We introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture.
Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset.
arXiv Detail & Related papers (2024-05-26T15:44:53Z)
- IS-DARTS: Stabilizing DARTS through Precise Measurement on Candidate Importance [41.23462863659102]
DARTS is known for its efficiency and simplicity.
However, performance collapse in DARTS results in deteriorating architectures filled with parameter-free operations.
We propose IS-DARTS to comprehensively improve DARTS and resolve the aforementioned problems.
arXiv Detail & Related papers (2023-12-19T22:45:57Z)
- Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with a Diffusion Distillation-based Block-wise Neural Architecture Search (DiffNAS).
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
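The summary only names the ingredients (block-wise local search plus a distillation-style objective against a larger teacher), so the following is a hedged sketch of a per-block feature-distillation loss that such a search could use to score a candidate student block against the corresponding teacher block. The block definitions, shapes, and loss are assumptions; DiffNAS's joint dynamic loss is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def blockwise_distill_loss(student_block, teacher_block, x):
    """Feature-level distillation for one block.

    The candidate student block is trained to reproduce the teacher
    block's output on the same input, giving a local, per-block search
    signal without running the full model end to end.
    """
    with torch.no_grad():
        teacher_feat = teacher_block(x)
    student_feat = student_block(x)
    return F.mse_loss(student_feat, teacher_feat)

# Toy blocks standing in for diffusion-model sub-networks (assumed shapes).
teacher = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.SiLU(),
                        nn.Conv2d(16, 16, 3, padding=1))
candidate = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1))  # smaller candidate

x = torch.randn(4, 16, 32, 32)
loss = blockwise_distill_loss(candidate, teacher, x)
loss.backward()            # updates only the candidate block's parameters
```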
arXiv Detail & Related papers (2023-11-08T12:56:59Z)
- Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
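The summary does not give the exact rule, so this is a hedged sketch of the general idea of magnitude-aware sparsified sign compression: keep only the largest-magnitude coordinates, transmit their signs with a single magnitude-derived scale, and treat everything else as zero. The top-k fraction and the scaling choice are illustrative assumptions, not the paper's scheme.

```python
import torch

def magnitude_aware_sign_compress(grad: torch.Tensor, k_frac: float = 0.01):
    """Compress a gradient tensor for communication-efficient distributed SGD.

    1. Select the k largest-magnitude coordinates.
    2. Send their indices, their signs, and one shared scale derived from
       their mean magnitude (the 'magnitude-aware' part).
    """
    flat = grad.flatten()
    k = max(1, int(k_frac * flat.numel()))
    _, idx = torch.topk(flat.abs(), k)
    scale = flat[idx].abs().mean()       # magnitude-aware scaling factor
    signs = torch.sign(flat[idx])
    return idx, signs, scale

def decompress(idx, signs, scale, shape):
    flat = torch.zeros(shape).flatten()
    flat[idx] = scale * signs
    return flat.view(shape)

# Example round trip on a fake gradient.
g = torch.randn(64, 128)
idx, signs, scale = magnitude_aware_sign_compress(g, k_frac=0.05)
g_hat = decompress(idx, signs, scale, g.shape)
```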
arXiv Detail & Related papers (2023-02-19T17:42:35Z)
- $\beta$-DARTS++: Bi-level Regularization for Proxy-robust Differentiable Architecture Search [96.99525100285084]
A regularization method, Beta-Decay, is proposed to regularize the DARTS-based NAS searching process (i.e., $\beta$-DARTS).
In-depth theoretical analyses of how and why it works are provided.
arXiv Detail & Related papers (2023-01-16T12:30:32Z)
- Enhancing the Robustness, Efficiency, and Diversity of Differentiable Architecture Search [25.112048502327738]
Differentiable architecture search (DARTS) has attracted much attention due to its simplicity and significant improvement in efficiency.
Many works attempt to restrict the accumulation of skip connections by indicators or manual design.
We suggest a more subtle and direct approach that removes skip connections from the operation space.
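As a concrete illustration of that operation-space change, the snippet below removes the skip connection from the standard DARTS primitive list; whether the paper modifies exactly this list is an assumption based on the common DARTS CNN search space.

```python
# Standard DARTS candidate operations (CNN search space).
DARTS_PRIMITIVES = [
    "none",
    "max_pool_3x3",
    "avg_pool_3x3",
    "skip_connect",
    "sep_conv_3x3",
    "sep_conv_5x5",
    "dil_conv_3x3",
    "dil_conv_5x5",
]

# Search space with the skip connection removed, so the architecture
# parameters can no longer collapse onto parameter-free identity paths.
NO_SKIP_PRIMITIVES = [op for op in DARTS_PRIMITIVES if op != "skip_connect"]
```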
arXiv Detail & Related papers (2022-04-10T13:25:36Z)
- $\beta$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search [85.84110365657455]
We propose a simple-but-efficient regularization method, termed Beta-Decay, to regularize the DARTS-based NAS searching process.
Experimental results on NAS-Bench-201 show that the proposed method can help stabilize the searching process and make the searched network more transferable across different datasets.
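The summary states the idea (regularize the search through Beta-Decay) without the formula; the sketch below shows one plausible realization, penalizing the softmax-normalized architecture weights beta via a log-sum-exp term on the logits instead of applying plain weight decay to the raw alpha. Treat the penalty and its coefficient as assumptions rather than the paper's precise definition.

```python
import torch

def beta_decay_penalty(alpha: torch.Tensor) -> torch.Tensor:
    """Regularize the softmax-normalized weights beta = softmax(alpha).

    A log-sum-exp over each edge's logits is a smooth surrogate that
    shrinks the spread of beta (one plausible realization of Beta-Decay;
    the published formulation and schedule may differ).
    """
    # alpha: (num_edges, num_ops) architecture logits
    return torch.logsumexp(alpha, dim=-1).mean()

# Example: 14 edges x 8 candidate operations, as in the DARTS CNN cell.
alpha = torch.zeros(14, 8, requires_grad=True)
reg_weight = 0.5                     # assumed; often scheduled over epochs
reg_loss = reg_weight * beta_decay_penalty(alpha)
reg_loss.backward()                  # added to the usual search loss in practice
```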
arXiv Detail & Related papers (2022-03-03T11:47:14Z)
- Connection Sensitivity Matters for Training-free DARTS: From Architecture-Level Scoring to Operation-Level Sensitivity Analysis [32.94768616851585]
Recently proposed training-free NAS methods abandon the training phase and design various zero-cost proxies as scores to identify excellent architectures.
In this paper, we raise an interesting problem: can we properly measure the operation importance in DARTS in a training-free way, while avoiding the parameter-intensive bias?
By devising an iterative and data-agnostic manner of utilizing ZEROS for NAS, our novel trial leads to a framework called training-free differentiable architecture search (FreeDARTS).
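The summary names ZEROS but not its exact definition, so the following is only a hedged sketch of the generic recipe such training-free scores follow: one forward-backward pass at initialization on data-agnostic (here random) input, then score each candidate operation on each edge by a saliency of its architecture parameter. The SNIP-style saliency |alpha * dL/dalpha| and the toy loss are illustrative assumptions, not the paper's score.

```python
import torch
import torch.nn.functional as F

def zero_cost_operation_scores(alpha, supernet_loss_fn):
    """Score candidate operations without any training.

    alpha: (num_edges, num_ops) architecture logits at initialization.
    supernet_loss_fn: maps alpha -> scalar loss for one forward pass.
    Returns a (num_edges, num_ops) saliency matrix; higher = keep.
    """
    alpha = alpha.clone().requires_grad_(True)
    loss = supernet_loss_fn(alpha)
    (grad,) = torch.autograd.grad(loss, alpha)
    return (alpha * grad).abs()      # SNIP-style sensitivity (assumed form)

# Toy stand-in for one supernet forward pass on random, data-agnostic input.
def toy_loss(alpha):
    weights = F.softmax(alpha, dim=-1)           # (edges, ops)
    op_outputs = torch.randn(alpha.shape[1], 4)  # fake per-op feature vectors
    mixed = weights @ op_outputs                 # (edges, 4) mixture per edge
    return mixed.pow(2).mean()

scores = zero_cost_operation_scores(torch.randn(14, 8), toy_loss)
best_op_per_edge = scores.argmax(dim=-1)         # pick one operation per edge
```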
arXiv Detail & Related papers (2021-06-22T04:40:34Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables, modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
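A minimal sketch of the sampling step described above: per-edge mixing weights are drawn from a Dirichlet whose concentration parameters are learned, and PyTorch's reparameterized `rsample` provides the pathwise derivative that lets gradients flow back into those concentrations. The toy loss and shapes are assumptions; the full DrNAS objective and its progressive learning scheme are not shown.

```python
import torch
from torch.distributions import Dirichlet

# Learnable concentration parameters: 14 edges x 8 candidate operations.
log_conc = torch.zeros(14, 8, requires_grad=True)

concentration = torch.nn.functional.softplus(log_conc) + 1e-3  # keep > 0
dist = Dirichlet(concentration)
weights = dist.rsample()     # pathwise-differentiable simplex samples, (14, 8)

# Toy surrogate loss standing in for the supernet's validation loss.
op_outputs = torch.randn(8, 4)           # fake per-operation features
loss = (weights @ op_outputs).pow(2).mean()
loss.backward()                          # gradients reach log_conc via rsample
```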
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
- Stabilizing Differentiable Architecture Search via Perturbation-based Regularization [99.81980366552408]
We find that the precipitous validation loss landscape, which leads to a dramatic performance drop when discretizing the final architecture, is an essential factor causing instability.
We propose a perturbation-based regularization, SmoothDARTS (SDARTS), to smooth the loss landscape and improve the generalizability of DARTS-based methods.
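One simple way to realize perturbation-based regularization is random smoothing of the architecture parameters: perturb them with small noise before computing the weight-update loss so the landscape around alpha is flattened. The sketch below shows only that flavor; the perturbation radius and the toy loss are illustrative assumptions, and an adversarial variant of the perturbation is not shown.

```python
import torch
import torch.nn.functional as F

def perturbed_arch_weights(alpha: torch.Tensor, radius: float = 0.03) -> torch.Tensor:
    """Random smoothing: add uniform noise in [-radius, radius] to the
    architecture logits before the softmax, so the network weights are
    trained to be robust to small changes of alpha."""
    noise = torch.empty_like(alpha).uniform_(-radius, radius)
    return F.softmax(alpha + noise, dim=-1)

# Toy usage inside one weight-update step (shapes are assumptions).
alpha = torch.zeros(14, 8)              # architecture logits, frozen this step
weights = perturbed_arch_weights(alpha) # (14, 8) smoothed mixing weights
op_outputs = torch.randn(8, 4)
train_loss = (weights @ op_outputs).pow(2).mean()
# In a real supernet, train_loss.backward() would update the network weights w.
```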
arXiv Detail & Related papers (2020-02-12T23:46:58Z)