BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage
Models
- URL: http://arxiv.org/abs/2003.11142v3
- Date: Fri, 17 Jul 2020 02:00:22 GMT
- Title: BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage
Models
- Authors: Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan
Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le
- Abstract summary: We propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies.
Our discovered model family, BigNASModels, achieves top-1 accuracies ranging from 76.5% to 80.9%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural architecture search (NAS) has shown promising results discovering
models that are both accurate and fast. For NAS, training a one-shot model has
become a popular strategy to rank the relative quality of different
architectures (child models) using a single set of shared weights. However,
while one-shot model weights can effectively rank different network
architectures, the absolute accuracies from these shared weights are typically
far below those obtained from stand-alone training. To compensate, existing
methods assume that the weights must be retrained, finetuned, or otherwise
post-processed after the search is completed. These steps significantly
increase the compute requirements and complexity of the architecture search and
model deployment. In this work, we propose BigNAS, an approach that challenges
the conventional wisdom that post-processing of the weights is necessary to get
good prediction accuracies. Without extra retraining or post-processing steps,
we are able to train a single set of shared weights on ImageNet and use these
weights to obtain child models whose sizes range from 200 to 1000 MFLOPs. Our
discovered model family, BigNASModels, achieves top-1 accuracies ranging from
76.5% to 80.9%, surpassing state-of-the-art models in this range including
EfficientNets and Once-for-All networks without extra retraining or
post-processing. We present an ablation study and analysis to further understand
the proposed BigNASModels.
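The core idea in the abstract is that one big single-stage model holds a single set of shared weights, and smaller child models are obtained by slicing those weights directly, with no retraining or fine-tuning afterward. The following is a hypothetical minimal sketch of that weight-sharing mechanism for one layer (names like `SharedLinear` are illustrative, not from the BigNAS codebase):

```python
import numpy as np

class SharedLinear:
    """A linear layer whose smaller child variants reuse slices of one
    big weight matrix. This is an illustrative sketch of the one-shot
    weight-sharing idea, not BigNAS's actual implementation."""

    def __init__(self, max_in, max_out, seed=0):
        rng = np.random.default_rng(seed)
        # One shared weight matrix at the largest size; every child
        # architecture reads from (a slice of) these same parameters.
        self.w = rng.standard_normal((max_out, max_in)) * 0.01
        self.b = np.zeros(max_out)

    def forward(self, x, out_features):
        # A child model with fewer output units uses the leading slice
        # of the shared weights -- no separate copy is trained.
        in_features = x.shape[-1]
        w = self.w[:out_features, :in_features]
        b = self.b[:out_features]
        return x @ w.T + b

layer = SharedLinear(max_in=8, max_out=16)
x = np.ones((1, 8))
big = layer.forward(x, out_features=16)   # largest child model
small = layer.forward(x, out_features=4)  # smaller child, same weights

# Both children read the same shared parameters, so the small child's
# outputs are exactly the first 4 outputs of the big child.
assert np.allclose(small, big[:, :4])
```

In this sketch, picking a child model of a given size (e.g. a 200 vs. 1000 MFLOPs architecture) amounts to choosing slice widths per layer; the paper's claim is that with suitable training of the big model, these sliced children are already deployment-ready.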
Related papers
- Representing Model Weights with Language using Tree Experts [39.90685550999956]
This paper learns to represent models within a joint space that embeds both model weights and language.
We introduce Probing Experts (ProbeX), a theoretically motivated, lightweight probing method.
Our results show that ProbeX can effectively map the weights of large models into a shared weight-language embedding space.
arXiv Detail & Related papers (2024-10-17T17:17:09Z) - DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z) - Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts [55.470959564665705]
Weight-sharing supernets are crucial for performance estimation in cutting-edge neural search frameworks.
The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models.
It excels in NAS for building memory-efficient task-agnostic BERT models.
arXiv Detail & Related papers (2023-06-08T00:35:36Z) - AceNAS: Learning to Rank Ace Neural Architectures with Weak Supervision
of Weight Sharing [6.171090327531059]
We introduce Learning to Rank methods to select the best (ace) architectures from a space.
We also propose to leverage weak supervision from weight sharing by pretraining architecture representation on weak labels obtained from the super-net.
Experiments on NAS benchmarks and large-scale search spaces demonstrate that our approach outperforms SOTA with a significantly reduced search cost.
arXiv Detail & Related papers (2021-08-06T08:31:42Z) - Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z) - BossNAS: Exploring Hybrid CNN-transformers with Block-wisely
Self-supervised Neural Architecture Search [100.28980854978768]
We present Block-wisely Self-supervised Neural Architecture Search (BossNAS)
We factorize the search space into blocks and utilize a novel self-supervised training scheme, named ensemble bootstrapping, to train each block separately.
We also present HyTra search space, a fabric-like hybrid CNN-transformer search space with searchable down-sampling positions.
arXiv Detail & Related papers (2021-03-23T10:05:58Z) - AttentiveNAS: Improving Neural Architecture Search via Attentive
Sampling [39.58754758581108]
Two-stage Neural Architecture Search (NAS) achieves remarkable accuracy and efficiency.
Two-stage NAS requires sampling from the search space during training, which directly impacts the accuracy of the final searched models.
We propose AttentiveNAS that focuses on improving the sampling strategy to achieve better performance Pareto.
Our discovered model family, AttentiveNAS models, achieves top-1 accuracy from 77.3% to 80.7% on ImageNet, and outperforms SOTA models, including BigNAS and Once-for-All networks.
arXiv Detail & Related papers (2020-11-18T00:15:23Z) - Powering One-shot Topological NAS with Stabilized Share-parameter Proxy [65.09967910722932]
One-shot NAS method has attracted much interest from the research community due to its remarkable training efficiency and capacity to discover high performance models.
In this work, we try to enhance the one-shot NAS by exploring high-performing network architectures in our large-scale Topology Augmented Search Space.
The proposed method achieves state-of-the-art performance under Multiply-Adds (MAdds) constraint on ImageNet.
arXiv Detail & Related papers (2020-05-21T08:18:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.