Improving Differentiable Architecture Search via Self-Distillation
- URL: http://arxiv.org/abs/2302.05629v2
- Date: Fri, 1 Sep 2023 07:09:55 GMT
- Title: Improving Differentiable Architecture Search via Self-Distillation
- Authors: Xunyu Zhu, Jian Li, Yong Liu, Weiping Wang
- Abstract summary: Differentiable Architecture Search (DARTS) is a simple yet efficient Neural Architecture Search (NAS) method.
We propose Self-Distillation Differentiable Neural Architecture Search (SD-DARTS) to alleviate the discretization gap.
- Score: 20.596850268316565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable Architecture Search (DARTS) is a simple yet efficient Neural
Architecture Search (NAS) method. During the search stage, DARTS trains a
supernet by jointly optimizing architecture parameters and network parameters.
During the evaluation stage, DARTS discretizes the supernet to derive the
optimal architecture based on architecture parameters. However, recent research
has shown that during the training process, the supernet tends to converge
towards sharp minima rather than flat minima. This is evidenced by the higher
sharpness of the loss landscape of the supernet, which ultimately leads to a
performance gap between the supernet and the optimal architecture. In this
paper, we propose Self-Distillation Differentiable Neural Architecture Search
(SD-DARTS) to alleviate the discretization gap. We utilize self-distillation to
distill knowledge from previous steps of the supernet to guide its training in
the current step, effectively reducing the sharpness of the supernet's loss and
bridging the performance gap between the supernet and the optimal architecture.
Furthermore, we introduce the concept of voting teachers, where multiple
previous supernets are selected as teachers, and their output probabilities are
aggregated through voting to obtain the final teacher prediction. Experimental
results on real datasets demonstrate the advantages of our novel
self-distillation-based NAS method compared to state-of-the-art alternatives.
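The search/evaluation split described in the abstract can be made concrete with a small sketch. The following is a minimal, illustrative PyTorch snippet, not the DARTS reference implementation: a MixedOp relaxes the choice among candidate operations with a softmax over architecture parameters alpha, and discretize() keeps only the argmax operation, which is where the discretization gap discussed above arises. The operation set, names, and shapes are assumptions for illustration, and the bilevel optimization of alpha versus network weights is omitted for brevity.

```python
# Minimal sketch (not the authors' code) of the DARTS mechanics the abstract
# describes: a supernet edge mixes candidate operations with a softmax over
# architecture parameters `alpha`; the evaluation stage discretizes by keeping
# only the argmax operation. All names here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

CANDIDATE_OPS = {
    "skip": lambda C: nn.Identity(),
    "conv3x3": lambda C: nn.Conv2d(C, C, 3, padding=1, bias=False),
    "conv5x5": lambda C: nn.Conv2d(C, C, 5, padding=2, bias=False),
}

class MixedOp(nn.Module):
    """Weighted sum of all candidate operations on one edge of the supernet."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([build(channels) for build in CANDIDATE_OPS.values()])
        # Architecture parameters: one logit per candidate operation.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=-1)  # continuous relaxation (search stage)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def discretize(self) -> nn.Module:
        # Evaluation stage: keep only the strongest operation (argmax over alpha).
        # The mismatch between this hard choice and the soft mixture trained
        # above is the discretization gap the paper targets.
        return self.ops[int(self.alpha.argmax())]

if __name__ == "__main__":
    edge = MixedOp(channels=16)
    x = torch.randn(2, 16, 8, 8)
    print(edge(x).shape)               # supernet (search-stage) forward
    print(edge.discretize()(x).shape)  # derived (evaluation-stage) operation
```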
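The self-distillation and voting-teacher idea can likewise be sketched. The snippet below is an assumed, simplified rendering rather than the released SD-DARTS code: snapshots of the supernet from earlier search steps serve as teachers, their softened output probabilities are aggregated (here by simple averaging, as a stand-in for the paper's voting), and a KL term pulls the current supernet toward that aggregate on top of the usual task loss. The temperature, distill_weight, and snapshot schedule are illustrative choices.

```python
# Minimal sketch (assumptions, not the released SD-DARTS code) of the idea in
# the abstract: previous-step supernet snapshots act as teachers, their output
# probabilities are aggregated, and a KL term distills that aggregate into the
# current supernet alongside the usual task loss.
import copy
from collections import deque
import torch
import torch.nn as nn
import torch.nn.functional as F

def voting_teacher_probs(teachers, x, temperature=4.0):
    """Aggregate the softened predictions of the stored supernet snapshots."""
    with torch.no_grad():
        probs = [F.softmax(t(x) / temperature, dim=-1) for t in teachers]
    return torch.stack(probs).mean(dim=0)

def sd_darts_loss(student_logits, teacher_probs, targets,
                  temperature=4.0, distill_weight=0.5):
    """Task loss plus a self-distillation KL toward the voted teacher prediction."""
    task = F.cross_entropy(student_logits, targets)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    distill = F.kl_div(log_student, teacher_probs, reduction="batchmean") * temperature ** 2
    return task + distill_weight * distill

# Toy loop showing where the pieces fit (a tiny classifier stands in for the supernet).
supernet = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
optimizer = torch.optim.SGD(supernet.parameters(), lr=0.025, momentum=0.9)
teachers = deque(maxlen=3)  # a few previous-step snapshots vote

for step in range(5):
    x, y = torch.randn(4, 3, 8, 8), torch.randint(0, 10, (4,))
    logits = supernet(x)
    if teachers:
        loss = sd_darts_loss(logits, voting_teacher_probs(teachers, x), y)
    else:
        loss = F.cross_entropy(logits, y)  # no teacher yet at the first steps
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    teachers.append(copy.deepcopy(supernet).eval())  # current weights become a future teacher
```

In the actual method this distillation term would be applied while updating the supernet's network weights during the search stage; per the abstract, the smoother teacher target reduces the sharpness of the supernet's loss and narrows the gap to the derived architecture.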
Related papers
- OStr-DARTS: Differentiable Neural Architecture Search based on Operation Strength [70.76342136866413]
Differentiable architecture search (DARTS) has emerged as a promising technique for effective neural architecture search.
DARTS suffers from the well-known degeneration issue, which can lead to deteriorating architectures.
We propose a novel criterion based on operation strength that estimates the importance of an operation by its effect on the final loss.
arXiv Detail & Related papers (2024-09-22T13:16:07Z)
- The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol [2.4300749758571905]
Gradient-based methods suffer from the discretization error, which can severely damage the process of obtaining the final architecture.
We introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture.
Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset.
arXiv Detail & Related papers (2024-05-26T15:44:53Z)
- Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach [57.175488207316654]
We propose a novel concept of Supernet Shifting, a refined search strategy combining architecture searching with supernet fine-tuning.
We show that Supernet Shifting can transfer the supernet to a new dataset.
Comprehensive experiments show that our method has better order-preserving ability and can find a dominating architecture.
arXiv Detail & Related papers (2024-03-18T00:13:41Z)
- CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS [19.485514022334844]
One-shot Neural Architecture Search (NAS) has been widely used to discover architectures due to its efficiency.
Previous studies reveal that one-shot performance estimations of architectures might not be well correlated with their performances in stand-alone training.
We propose Curriculum Learning On Sharing Extent (CLOSE) to train the supernet both efficiently and effectively.
arXiv Detail & Related papers (2022-07-16T07:45:17Z)
- Generalizing Few-Shot NAS with Gradient Matching [165.5690495295074]
One-Shot methods train one supernet to approximate the performance of every architecture in the search space via weight-sharing.
Few-Shot NAS reduces the level of weight-sharing by splitting the One-Shot supernet into multiple separate sub-supernets.
The proposed gradient-matching approach significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures.
arXiv Detail & Related papers (2022-03-29T03:06:16Z)
- D-DARTS: Distributed Differentiable Architecture Search [75.12821786565318]
Differentiable ARchiTecture Search (DARTS) is one of the most trending Neural Architecture Search (NAS) methods.
We propose D-DARTS, a novel solution that nests several neural networks at the cell level.
arXiv Detail & Related papers (2021-08-20T09:07:01Z) - Rethinking Architecture Selection in Differentiable NAS [74.61723678821049]
Differentiable Neural Architecture Search is one of the most popular NAS methods for its search efficiency and simplicity.
We propose an alternative perturbation-based architecture selection that directly measures each operation's influence on the supernet.
We find that several failure modes of DARTS can be greatly alleviated with the proposed selection method.
arXiv Detail & Related papers (2021-08-10T00:53:39Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.