Small Temperature is All You Need for Differentiable Architecture Search
- URL: http://arxiv.org/abs/2306.06855v1
- Date: Mon, 12 Jun 2023 04:01:57 GMT
- Title: Small Temperature is All You Need for Differentiable Architecture Search
- Authors: Jiuling Zhang, Zhiming Ding
- Abstract summary: Differentiable architecture search (DARTS) yields highly efficient gradient-based neural architecture search (NAS)
We propose to close the gap between the relaxed supernet in training and the pruned finalnet in evaluation by using a small temperature.
- Score: 8.93957397187611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differentiable architecture search (DARTS) yields highly efficient
gradient-based neural architecture search (NAS) by relaxing the discrete
operation selection into the optimization of continuous architecture
parameters, which maps NAS from a discrete optimization problem to a continuous
one. DARTS then remaps the relaxed supernet back to the discrete space by a
one-off post-search pruning step to obtain the final architecture (finalnet).
Some emerging works argue that this remapping is inherently prone to a mismatch
between the network used in training and the one used in evaluation, which
leads to a performance discrepancy and even model collapse in extreme cases. We
propose to close the gap between the relaxed supernet in training and the
pruned finalnet in evaluation by using a small temperature to sparsify the
continuous distribution in the training phase. To this end, we first formulate
a sparse-noisy softmax to get around gradient saturation. We then propose an
exponential temperature schedule to better control the outbound distribution
and elaborate an entropy-based adaptive scheme to achieve the final
enhancement. We conduct extensive experiments to verify the efficiency and
efficacy of our method.
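As a rough, non-authoritative illustration of the core idea (not the authors' exact formulation, and omitting their sparse-noisy softmax and adaptive scheme), the sketch below applies a temperature-scaled softmax to the architecture parameters of a single edge, anneals the temperature with an exponential schedule, and monitors the entropy of the resulting operation distribution; all names and values are illustrative assumptions.

```python
import numpy as np

def softmax_with_temperature(alpha, tau):
    """Temperature-scaled softmax over architecture parameters alpha.

    A small tau sharpens the distribution, pushing the relaxed supernet
    closer to the discrete (pruned) finalnet.
    """
    z = (alpha - alpha.max()) / tau      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def exponential_temperature(step, total_steps, tau_start=1.0, tau_end=1e-3):
    """Exponential schedule: tau decays geometrically from tau_start to tau_end."""
    ratio = step / max(total_steps - 1, 1)
    return tau_start * (tau_end / tau_start) ** ratio

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    return float(-(p * np.log(p + eps)).sum())

# Toy run: watch the operation distribution on one edge sparsify as tau shrinks.
alpha = np.array([1.2, 0.8, 0.5, 0.1])   # illustrative architecture parameters
for step in range(0, 101, 25):
    tau = exponential_temperature(step, 101)
    p = softmax_with_temperature(alpha, tau)
    print(f"step={step:3d}  tau={tau:.4f}  entropy={entropy(p):.3f}  p={np.round(p, 3)}")
```

As the temperature decays, the printed entropy drops toward zero and the distribution concentrates on the strongest operation, which is the sense in which the trained supernet is brought closer to the pruned finalnet.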
Related papers
- Sparsest Models Elude Pruning: An Exposé of Pruning's Current Capabilities [4.842973374883628]
Pruning has emerged as a promising approach for compressing large-scale models, yet its effectiveness in recovering the sparsest of models has not yet been explored.
We conducted an extensive series of 485,838 experiments, applying a range of state-of-the-art pruning algorithms to a synthetic dataset we created, named the Cubist Spiral.
Our findings reveal a significant gap in performance compared to ideal sparse networks, which we identified through a novel search algorithm.
arXiv Detail & Related papers (2024-07-04T17:33:15Z) - The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol [2.4300749758571905]
Gradient-based methods suffer from the discretization error, which can severely damage the process of obtaining the final architecture.
We introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture.
Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset.
arXiv Detail & Related papers (2024-05-26T15:44:53Z) - Pruning Convolutional Filters via Reinforcement Learning with Entropy
Minimization [0.0]
We introduce a novel information-theoretic reward function which minimizes the spatial entropy of convolutional activations.
Our method shows that accuracy can be preserved without directly optimizing it in the agent's reward function.
arXiv Detail & Related papers (2023-12-08T09:34:57Z) - Shapley-NAS: Discovering Operation Contribution for Neural Architecture
Search [96.20505710087392]
We propose a Shapley value based method to evaluate operation contribution (Shapley-NAS) for neural architecture search.
We show that our method outperforms the state-of-the-art methods by a considerable margin with a light search cost.
arXiv Detail & Related papers (2022-06-20T14:41:49Z) - ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing DARTS' gradient approximation.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z) - iDARTS: Differentiable Architecture Search with Stochastic Implicit
Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS)
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z) - Non-Gradient Manifold Neural Network [79.44066256794187]
A deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z) - ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse
Coding [86.40042104698792]
We formulate neural architecture search as a sparse coding problem.
In experiments, our two-stage method on CIFAR-10 requires only 0.05 GPU-day for search.
Our one-stage method produces state-of-the-art performance on both CIFAR-10 and ImageNet at the cost of only evaluation time.
arXiv Detail & Related papers (2020-10-13T04:34:24Z) - DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables modeled by a Dirichlet distribution (a toy sketch of this pathwise relaxation follows this list).
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimizers.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
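The DrNAS entry above describes modeling the relaxed architecture mixing weights with a Dirichlet distribution and optimizing its concentration parameters through pathwise (reparameterised) gradients. The toy PyTorch sketch below illustrates that mechanism on a single made-up edge; the op outputs, loss, and hyperparameters are illustrative stand-ins, not the DrNAS implementation.

```python
import torch

# Hypothetical single DARTS edge with 4 candidate operations; their outputs are
# faked as fixed tensors here, whereas a real supernet would compute op(x).
torch.manual_seed(0)
op_outputs = torch.randn(4, 8)                 # 4 candidate ops, feature dim 8
target = torch.randn(8)                        # stand-in training signal

# Dirichlet concentration parameters play the role of architecture parameters.
log_conc = torch.zeros(4, requires_grad=True)  # log-parameterised to stay positive
optimizer = torch.optim.Adam([log_conc], lr=0.1)

for step in range(200):
    dist = torch.distributions.Dirichlet(log_conc.exp())
    w = dist.rsample()                          # pathwise (reparameterised) sample
    mixed = (w.unsqueeze(1) * op_outputs).sum(dim=0)
    loss = (mixed - target).pow(2).mean()       # stand-in for the supernet loss
    optimizer.zero_grad()
    loss.backward()                             # gradients flow through rsample
    optimizer.step()

print("mean mixing weights:", torch.distributions.Dirichlet(log_conc.exp()).mean)
```

Because `rsample` is reparameterised, gradients of the loss propagate back to the concentration parameters, which is the property the DrNAS summary refers to as optimizing the Dirichlet parameters with pathwise derivatives.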