Robustifying DARTS by Eliminating Information Bypass Leakage via
Explicit Sparse Regularization
- URL: http://arxiv.org/abs/2306.06858v1
- Date: Mon, 12 Jun 2023 04:11:37 GMT
- Title: Robustifying DARTS by Eliminating Information Bypass Leakage via
Explicit Sparse Regularization
- Authors: Jiuling Zhang, Zhiming Ding
- Abstract summary: Differentiable architecture search (DARTS) is a promising end-to-end NAS method.
Recent studies cast doubt on the basic underlying hypotheses of DARTS.
We propose a novel sparse-regularized approximation and an efficient mixed-sparsity training scheme to robustify DARTS.
- Score: 8.93957397187611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable architecture search (DARTS) is a promising
end-to-end NAS method which directly optimizes the architecture parameters
through general gradient descent. However, DARTS is brittle to the
catastrophic failure incurred by the skip connection in the search space.
Recent studies also cast doubt on the basic underlying hypotheses of DARTS,
which are argued to be inherently prone to the performance discrepancy
between the continuous-relaxed supernet in the training phase and the
discretized finalnet in the evaluation phase. We find that both the
robustness problem and this skepticism can be explained by information
bypass leakage during the training of the supernet. This naturally
highlights the vital role of the sparsity of architecture parameters in the
training phase, which has not been well developed in the past. We thus
propose a novel sparse-regularized approximation and an efficient
mixed-sparsity training scheme to robustify DARTS by eliminating the
information bypass leakage. We subsequently conduct extensive experiments
on multiple search spaces to demonstrate the effectiveness of our method.
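The abstract does not spell out the exact form of the sparse-regularized approximation or the mixed-sparsity training schedule, so the snippet below is only a minimal sketch of the underlying idea: penalize non-sparse mixtures over candidate operations so that the supernet cannot leak information through many low-weight bypass paths. The entropy penalty, `MixedOp`, and the `sparsity_weight` coefficient are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """A DARTS-style mixed operation: a weighted sum of candidate ops."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)

    def forward(self, x, weights):
        # weights: (num_ops,) softmax-normalized architecture weights
        return sum(w * op(x) for w, op in zip(weights, self.ops))

def sparsity_penalty(alpha):
    """Entropy of the per-edge operation distribution.

    Low entropy = near one-hot = sparse mixture, so minimizing this term
    discourages information from leaking through many low-weight paths.
    (Illustrative choice; the paper's regularizer may differ.)
    """
    probs = F.softmax(alpha, dim=-1)
    return -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()

# Toy example: one edge with three candidate operations.
ops = [nn.Conv2d(8, 8, 1), nn.Conv2d(8, 8, 1), nn.Identity()]
edge = MixedOp(ops)
alpha = nn.Parameter(torch.zeros(len(ops)))           # architecture logits
x = torch.randn(2, 8, 16, 16)
target = torch.randn(2, 8, 16, 16)

sparsity_weight = 0.1                                  # assumed coefficient
out = edge(x, F.softmax(alpha, dim=-1))
loss = F.mse_loss(out, target) + sparsity_weight * sparsity_penalty(alpha)
loss.backward()            # gradients flow to both the op weights and alpha
```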
Related papers
- The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol [2.4300749758571905]
Gradient-based methods suffer from discretization error, which can severely damage the process of obtaining the final architecture.
We introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture.
Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset.
arXiv Detail & Related papers (2024-05-26T15:44:53Z)
- IS-DARTS: Stabilizing DARTS through Precise Measurement on Candidate Importance [41.23462863659102]
DARTS is known for its efficiency and simplicity.
However, performance collapse in DARTS results in deteriorating architectures filled with parameter-free operations.
We propose IS-DARTS to comprehensively improve DARTS and resolve the aforementioned problems.
arXiv Detail & Related papers (2023-12-19T22:45:57Z)
- Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with a Diffusion Distillation-based Block-wise Neural Architecture Search (DiffNAS).
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
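The summary only names the ingredients (block-wise local search plus a distillation-style objective against a larger teacher), so the following is a hedged sketch of a per-block feature-distillation loss that such a search could use to score a candidate student block against the corresponding teacher block. The block definitions, shapes, and loss are assumptions; DiffNAS's joint dynamic loss is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def blockwise_distill_loss(student_block, teacher_block, x):
    """Feature-level distillation for one block.

    The candidate student block is trained to reproduce the teacher
    block's output on the same input, giving a local, per-block search
    signal without running the full model end to end.
    """
    with torch.no_grad():
        teacher_feat = teacher_block(x)
    student_feat = student_block(x)
    return F.mse_loss(student_feat, teacher_feat)

# Toy blocks standing in for diffusion-model sub-networks (assumed shapes).
teacher = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.SiLU(),
                        nn.Conv2d(16, 16, 3, padding=1))
candidate = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1))  # smaller candidate

x = torch.randn(4, 16, 32, 32)
loss = blockwise_distill_loss(candidate, teacher, x)
loss.backward()            # updates only the candidate block's parameters
```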
arXiv Detail & Related papers (2023-11-08T12:56:59Z)
- Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity [60.791736094073]
Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks.
We propose a magnitude-driven sparsification scheme, which addresses the non-convergence issue of SIGNSGD.
The proposed scheme is validated through experiments on Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
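The summary does not give the exact rule, so this is a hedged sketch of the general idea of magnitude-aware sparsified sign compression: keep only the largest-magnitude coordinates, transmit their signs with a single magnitude-derived scale, and treat everything else as zero. The top-k fraction and the scaling choice are illustrative assumptions, not the paper's scheme.

```python
import torch

def magnitude_aware_sign_compress(grad: torch.Tensor, k_frac: float = 0.01):
    """Compress a gradient tensor for communication-efficient distributed SGD.

    1. Select the k largest-magnitude coordinates.
    2. Send their indices, their signs, and one shared scale derived from
       their mean magnitude (the 'magnitude-aware' part).
    """
    flat = grad.flatten()
    k = max(1, int(k_frac * flat.numel()))
    _, idx = torch.topk(flat.abs(), k)
    scale = flat[idx].abs().mean()       # magnitude-aware scaling factor
    signs = torch.sign(flat[idx])
    return idx, signs, scale

def decompress(idx, signs, scale, shape):
    flat = torch.zeros(shape).flatten()
    flat[idx] = scale * signs
    return flat.view(shape)

# Example round trip on a fake gradient.
g = torch.randn(64, 128)
idx, signs, scale = magnitude_aware_sign_compress(g, k_frac=0.05)
g_hat = decompress(idx, signs, scale, g.shape)
```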
arXiv Detail & Related papers (2023-02-19T17:42:35Z)
- $\beta$-DARTS++: Bi-level Regularization for Proxy-robust Differentiable Architecture Search [96.99525100285084]
A regularization method, Beta-Decay, is proposed to regularize the DARTS-based NAS searching process (i.e., $\beta$-DARTS).
In-depth theoretical analyses of how and why it works are provided.
arXiv Detail & Related papers (2023-01-16T12:30:32Z)
- Enhancing the Robustness, Efficiency, and Diversity of Differentiable Architecture Search [25.112048502327738]
Differentiable architecture search (DARTS) has attracted much attention due to its simplicity and significant improvement in efficiency.
Many works attempt to restrict the accumulation of skip connections by indicators or manual design.
We suggest a more subtle and direct approach that removes skip connections from the operation space.
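As a concrete illustration of that operation-space change, the snippet below removes the skip connection from the standard DARTS primitive list; whether the paper modifies exactly this list is an assumption based on the common DARTS CNN search space.

```python
# Standard DARTS candidate operations (CNN search space).
DARTS_PRIMITIVES = [
    "none",
    "max_pool_3x3",
    "avg_pool_3x3",
    "skip_connect",
    "sep_conv_3x3",
    "sep_conv_5x5",
    "dil_conv_3x3",
    "dil_conv_5x5",
]

# Search space with the skip connection removed, so the architecture
# parameters can no longer collapse onto parameter-free identity paths.
NO_SKIP_PRIMITIVES = [op for op in DARTS_PRIMITIVES if op != "skip_connect"]
```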
arXiv Detail & Related papers (2022-04-10T13:25:36Z)
- $\beta$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search [85.84110365657455]
We propose a simple-but-efficient regularization method, termed Beta-Decay, to regularize the DARTS-based NAS searching process.
Experimental results on NAS-Bench-201 show that the proposed method can help stabilize the searching process and make the searched network more transferable across different datasets.
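The summary states the idea (regularize the search through Beta-Decay) without the formula; the sketch below shows one plausible realization, penalizing the softmax-normalized architecture weights beta via a log-sum-exp term on the logits instead of applying plain weight decay to the raw alpha. Treat the penalty and its coefficient as assumptions rather than the paper's precise definition.

```python
import torch

def beta_decay_penalty(alpha: torch.Tensor) -> torch.Tensor:
    """Regularize the softmax-normalized weights beta = softmax(alpha).

    A log-sum-exp over each edge's logits is a smooth surrogate that
    shrinks the spread of beta (one plausible realization of Beta-Decay;
    the published formulation and schedule may differ).
    """
    # alpha: (num_edges, num_ops) architecture logits
    return torch.logsumexp(alpha, dim=-1).mean()

# Example: 14 edges x 8 candidate operations, as in the DARTS CNN cell.
alpha = torch.zeros(14, 8, requires_grad=True)
reg_weight = 0.5                     # assumed; often scheduled over epochs
reg_loss = reg_weight * beta_decay_penalty(alpha)
reg_loss.backward()                  # added to the usual search loss in practice
```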
arXiv Detail & Related papers (2022-03-03T11:47:14Z)
- Connection Sensitivity Matters for Training-free DARTS: From Architecture-Level Scoring to Operation-Level Sensitivity Analysis [32.94768616851585]
Recently proposed training-free NAS methods abandon the training phase and design various zero-cost proxies as scores to identify excellent architectures.
In this paper, we raise an interesting problem: can we properly measure the operation importance in DARTS in a training-free way, while avoiding the parameter-intensive bias?
By devising an iterative and data-agnostic manner of utilizing ZEROS for NAS, our novel trial leads to a framework called training-free differentiable architecture search (FreeDARTS).
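The summary names ZEROS but not its exact definition, so the following is only a hedged sketch of the generic recipe such training-free scores follow: one forward-backward pass at initialization on data-agnostic (here random) input, then score each candidate operation on each edge by a saliency of its architecture parameter. The SNIP-style saliency |alpha * dL/dalpha| and the toy loss are illustrative assumptions, not the paper's score.

```python
import torch
import torch.nn.functional as F

def zero_cost_operation_scores(alpha, supernet_loss_fn):
    """Score candidate operations without any training.

    alpha: (num_edges, num_ops) architecture logits at initialization.
    supernet_loss_fn: maps alpha -> scalar loss for one forward pass.
    Returns a (num_edges, num_ops) saliency matrix; higher = keep.
    """
    alpha = alpha.clone().requires_grad_(True)
    loss = supernet_loss_fn(alpha)
    (grad,) = torch.autograd.grad(loss, alpha)
    return (alpha * grad).abs()      # SNIP-style sensitivity (assumed form)

# Toy stand-in for one supernet forward pass on random, data-agnostic input.
def toy_loss(alpha):
    weights = F.softmax(alpha, dim=-1)           # (edges, ops)
    op_outputs = torch.randn(alpha.shape[1], 4)  # fake per-op feature vectors
    mixed = weights @ op_outputs                 # (edges, 4) mixture per edge
    return mixed.pow(2).mean()

scores = zero_cost_operation_scores(torch.randn(14, 8), toy_loss)
best_op_per_edge = scores.argmax(dim=-1)         # pick one operation per edge
```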
arXiv Detail & Related papers (2021-06-22T04:40:34Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables, modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
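A minimal sketch of the sampling step described above: per-edge mixing weights are drawn from a Dirichlet whose concentration parameters are learned, and PyTorch's reparameterized `rsample` provides the pathwise derivative that lets gradients flow back into those concentrations. The toy loss and shapes are assumptions; the full DrNAS objective and its progressive learning scheme are not shown.

```python
import torch
from torch.distributions import Dirichlet

# Learnable concentration parameters: 14 edges x 8 candidate operations.
log_conc = torch.zeros(14, 8, requires_grad=True)

concentration = torch.nn.functional.softplus(log_conc) + 1e-3  # keep > 0
dist = Dirichlet(concentration)
weights = dist.rsample()     # pathwise-differentiable simplex samples, (14, 8)

# Toy surrogate loss standing in for the supernet's validation loss.
op_outputs = torch.randn(8, 4)           # fake per-operation features
loss = (weights @ op_outputs).pow(2).mean()
loss.backward()                          # gradients reach log_conc via rsample
```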
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
- Stabilizing Differentiable Architecture Search via Perturbation-based Regularization [99.81980366552408]
We find that the precipitous validation loss landscape, which leads to a dramatic performance drop when discretizing the final architecture, is an essential factor causing instability.
We propose a perturbation-based regularization, SmoothDARTS (SDARTS), to smooth the loss landscape and improve the generalizability of DARTS-based methods.
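One simple way to realize perturbation-based regularization is random smoothing of the architecture parameters: perturb them with small noise before computing the weight-update loss so the landscape around alpha is flattened. The sketch below shows only that flavor; the perturbation radius and the toy loss are illustrative assumptions, and an adversarial variant of the perturbation is not shown.

```python
import torch
import torch.nn.functional as F

def perturbed_arch_weights(alpha: torch.Tensor, radius: float = 0.03) -> torch.Tensor:
    """Random smoothing: add uniform noise in [-radius, radius] to the
    architecture logits before the softmax, so the network weights are
    trained to be robust to small changes of alpha."""
    noise = torch.empty_like(alpha).uniform_(-radius, radius)
    return F.softmax(alpha + noise, dim=-1)

# Toy usage inside one weight-update step (shapes are assumptions).
alpha = torch.zeros(14, 8)              # architecture logits, frozen this step
weights = perturbed_arch_weights(alpha) # (14, 8) smoothed mixing weights
op_outputs = torch.randn(8, 4)
train_loss = (weights @ op_outputs).pow(2).mean()
# In a real supernet, train_loss.backward() would update the network weights w.
```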
arXiv Detail & Related papers (2020-02-12T23:46:58Z)