Mutually-aware Sub-Graphs Differentiable Architecture Search
- URL: http://arxiv.org/abs/2107.04324v2
- Date: Mon, 12 Jul 2021 09:46:24 GMT
- Title: Mutually-aware Sub-Graphs Differentiable Architecture Search
- Authors: Haoxian Tan, Sheng Guo, Yujie Zhong, Weilin Huang
- Abstract summary: Mutually-aware Sub-Graphs Differentiable Architecture Search (MSG-DAS)
is built around a differentiable Gumbel-TopK sampler that produces multiple mutually exclusive single-path sub-graphs.
We demonstrate the effectiveness of our methods on ImageNet and CIFAR10, where the searched models perform comparably to the most recent approaches.
- Score: 29.217547815683748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentiable architecture search is prevalent in the field of NAS because
of its simplicity and efficiency, where two paradigms, multi-path algorithms
and single-path methods, are dominant. The multi-path framework (e.g., DARTS) is
intuitive but suffers from heavy memory usage and training collapse. Single-path
methods (e.g., GDAS and ProxylessNAS) mitigate the memory issue and shrink the
gap between searching and evaluation, but sacrifice performance. In this
paper, we propose a conceptually simple yet efficient method to bridge these
two paradigms, referred to as Mutually-aware Sub-Graphs Differentiable
Architecture Search (MSG-DAS). The core of our framework is a differentiable
Gumbel-TopK sampler that produces multiple mutually exclusive single-path
sub-graphs. To alleviate the more severe skip-connection issue introduced by the
multi-sub-graph setting, we propose a Dropblock-Identity module to stabilize the
optimization. To make the best use of the available models (super-net and
sub-graphs), we introduce a memory-efficient super-net guidance distillation to
improve training. The proposed framework strikes a balance between flexible
memory usage and searching quality. We demonstrate the effectiveness of our
methods on ImageNet and CIFAR10, where the searched models perform comparably
to the most recent approaches.
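The sampler described above builds on the Gumbel-Top-k trick: perturbing each operation's logit with i.i.d. Gumbel noise and keeping the k largest perturbed values draws k distinct operations without replacement, which is what makes the resulting single-path sub-graphs mutually exclusive on each edge. A minimal sketch of that trick follows; the straight-through relaxation used for backpropagation is omitted, and the function name and logits are illustrative assumptions, not taken from the paper.

```python
import math
import random

def gumbel_topk(logits, k, rng=random):
    """Sample k distinct indices via the Gumbel-Top-k trick.

    If E ~ Exp(1), then -log(E) ~ Gumbel(0, 1), so the perturbed
    logit is x + Gumbel noise. Taking the top-k perturbed values is
    equivalent to sampling k items without replacement from
    softmax(logits).
    """
    perturbed = [x - math.log(rng.expovariate(1.0)) for x in logits]
    # Indices sorted by descending perturbed logit; keep the top k.
    return sorted(range(len(logits)), key=lambda i: -perturbed[i])[:k]

# One architecture "edge" with 5 candidate ops; draw 3 distinct ops,
# each defining one mutually exclusive single-path sub-graph.
op_logits = [1.2, 0.3, -0.5, 2.0, 0.0]
chosen = gumbel_topk(op_logits, k=3)
assert len(set(chosen)) == 3  # the sub-graphs never share an op here
```

In a differentiable search, the hard top-k selection would be paired with a softmax over the perturbed logits (as in Gumbel-softmax) so gradients can flow to the logits; the sketch above shows only the forward sampling step.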
Related papers
- Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search [49.81353382211113]
We address the challenge of integrating multi-head self-attention into high resolution representation CNNs efficiently.
We develop a multi-target multi-branch supernet method, which fully utilizes the advantages of high-resolution features.
We present a series of models via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method, which searches for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers.
arXiv Detail & Related papers (2024-03-15T15:47:54Z)
- MGAS: Multi-Granularity Architecture Search for Trade-Off Between Model Effectiveness and Efficiency [10.641875933652647]
We introduce multi-granularity architecture search (MGAS) to discover both effective and efficient neural networks.
We learn discretization functions specific to each granularity level to adaptively determine the unit remaining ratio according to the evolving architecture.
Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate that MGAS outperforms other state-of-the-art methods in achieving a better trade-off between model performance and model size.
arXiv Detail & Related papers (2023-10-23T16:32:18Z)
- Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization [50.50023451369742]
Pruning-as-Search (PaS) is an end-to-end channel pruning method that searches out the desired sub-network automatically and efficiently.
Our proposed architecture outperforms prior art by around $1.0\%$ top-1 accuracy on the ImageNet-1000 classification task.
arXiv Detail & Related papers (2022-06-02T17:58:54Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation [106.04777600352743]
Differentiable architecture search (DARTS) is largely hindered by its substantial memory cost since the entire supernet resides in the memory.
Single-path DARTS addresses this by choosing only one single-path submodel at each step.
Besides being memory-friendly, it also incurs low computational cost.
We propose a new algorithm, RObustifying Memory-Efficient NAS (ROME), as a cure.
arXiv Detail & Related papers (2020-11-23T06:34:07Z) - Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural
Architecture Search [60.965024145243596]
One-shot weight sharing methods have recently drawn great attention in neural architecture search due to high efficiency and competitive performance.
To alleviate the interference induced by weight sharing, we present a simple yet effective architecture distillation method.
We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training.
Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop.
arXiv Detail & Related papers (2020-10-29T17:55:05Z) - MixPath: A Unified Approach for One-shot Neural Architecture Search [13.223963114415552]
We propose a novel mechanism called Shadow Batch Normalization (SBN) to regularize the disparate feature statistics.
We call our unified multi-path one-shot approach MixPath; it generates a series of models that achieve state-of-the-art results on ImageNet.
arXiv Detail & Related papers (2020-01-16T15:24:26Z)
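Both the super-net guidance distillation of MSG-DAS and the prioritized-path distillation of Cream of the Crop build on the standard soft-target distillation idea: the student is trained to match the teacher's temperature-softened output distribution. A minimal, framework-agnostic sketch of that loss follows; the function names, temperature, and logits are illustrative assumptions, not taken from either paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature softening."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions.

    Scaled by temperature**2, the usual correction that keeps the
    gradient magnitude comparable to a hard-label loss.
    """
    p = softmax(teacher_logits, temperature)  # teacher (e.g. super-net)
    q = softmax(student_logits, temperature)  # student (e.g. sub-graph)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl
```

In a sub-graph search setting, the super-net's output would serve as the teacher distribution guiding each sampled single-path sub-graph; the memory-efficient scheduling of that guidance is specific to the paper and not reproduced here.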
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.