BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage
Models
- URL: http://arxiv.org/abs/2003.11142v3
- Date: Fri, 17 Jul 2020 02:00:22 GMT
- Title: BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage
Models
- Authors: Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan
Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le
- Abstract summary: We propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies.
Our discovered model family, BigNASModels, achieves top-1 accuracies ranging from 76.5% to 80.9%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural architecture search (NAS) has shown promising results discovering
models that are both accurate and fast. For NAS, training a one-shot model has
become a popular strategy to rank the relative quality of different
architectures (child models) using a single set of shared weights. However,
while one-shot model weights can effectively rank different network
architectures, the absolute accuracies from these shared weights are typically
far below those obtained from stand-alone training. To compensate, existing
methods assume that the weights must be retrained, finetuned, or otherwise
post-processed after the search is completed. These steps significantly
increase the compute requirements and complexity of the architecture search and
model deployment. In this work, we propose BigNAS, an approach that challenges
the conventional wisdom that post-processing of the weights is necessary to get
good prediction accuracies. Without extra retraining or post-processing steps,
we are able to train a single set of shared weights on ImageNet and use these
weights to obtain child models whose sizes range from 200 to 1000 MFLOPs. Our
discovered model family, BigNASModels, achieves top-1 accuracies ranging from
76.5% to 80.9%, surpassing state-of-the-art models in this range including
EfficientNets and Once-for-All networks without extra retraining or
post-processing. We present an ablation study and analysis to further understand
the proposed BigNASModels.
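The core idea in the abstract is that one big single-stage model holds a single set of shared weights, and smaller child models are obtained by slicing those weights directly, with no retraining or fine-tuning afterward. The following is a hypothetical minimal sketch of that weight-sharing mechanism for one layer (names like `SharedLinear` are illustrative, not from the BigNAS codebase):

```python
import numpy as np

class SharedLinear:
    """A linear layer whose smaller child variants reuse slices of one
    big weight matrix. This is an illustrative sketch of the one-shot
    weight-sharing idea, not BigNAS's actual implementation."""

    def __init__(self, max_in, max_out, seed=0):
        rng = np.random.default_rng(seed)
        # One shared weight matrix at the largest size; every child
        # architecture reads from (a slice of) these same parameters.
        self.w = rng.standard_normal((max_out, max_in)) * 0.01
        self.b = np.zeros(max_out)

    def forward(self, x, out_features):
        # A child model with fewer output units uses the leading slice
        # of the shared weights -- no separate copy is trained.
        in_features = x.shape[-1]
        w = self.w[:out_features, :in_features]
        b = self.b[:out_features]
        return x @ w.T + b

layer = SharedLinear(max_in=8, max_out=16)
x = np.ones((1, 8))
big = layer.forward(x, out_features=16)   # largest child model
small = layer.forward(x, out_features=4)  # smaller child, same weights

# Both children read the same shared parameters, so the small child's
# outputs are exactly the first 4 outputs of the big child.
assert np.allclose(small, big[:, :4])
```

In this sketch, picking a child model of a given size (e.g. a 200 vs. 1000 MFLOPs architecture) amounts to choosing slice widths per layer; the paper's claim is that with suitable training of the big model, these sliced children are already deployment-ready.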
Related papers
- Representing Model Weights with Language using Tree Experts [39.90685550999956]
This paper learns to represent models within a joint space that embeds both model weights and language.
We introduce Probing Experts (ProbeX), a theoretically motivated, lightweight probing method.
Our results show that ProbeX can effectively map the weights of large models into a shared weight-language embedding space.
arXiv Detail & Related papers (2024-10-17T17:17:09Z) - DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z) - Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts [55.470959564665705]
Weight-sharing supernets are crucial for performance estimation in cutting-edge neural search frameworks.
The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models.
It excels in NAS for building memory-efficient task-agnostic BERT models.
arXiv Detail & Related papers (2023-06-08T00:35:36Z) - AceNAS: Learning to Rank Ace Neural Architectures with Weak Supervision
of Weight Sharing [6.171090327531059]
We introduce Learning to Rank methods to select the best (ace) architectures from a space.
We also propose to leverage weak supervision from weight sharing by pretraining architecture representation on weak labels obtained from the super-net.
Experiments on NAS benchmarks and large-scale search spaces demonstrate that our approach outperforms SOTA with a significantly reduced search cost.
arXiv Detail & Related papers (2021-08-06T08:31:42Z) - Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z) - BossNAS: Exploring Hybrid CNN-transformers with Block-wisely
Self-supervised Neural Architecture Search [100.28980854978768]
We present Block-wisely Self-supervised Neural Architecture Search (BossNAS)
We factorize the search space into blocks and utilize a novel self-supervised training scheme, named ensemble bootstrapping, to train each block separately.
We also present HyTra search space, a fabric-like hybrid CNN-transformer search space with searchable down-sampling positions.
arXiv Detail & Related papers (2021-03-23T10:05:58Z) - AttentiveNAS: Improving Neural Architecture Search via Attentive
Sampling [39.58754758581108]
Two-stage Neural Architecture Search (NAS) achieves remarkable accuracy and efficiency.
Two-stage NAS requires sampling from the search space during training, which directly impacts the accuracy of the final searched models.
We propose AttentiveNAS that focuses on improving the sampling strategy to achieve better performance Pareto.
Our discovered model family, AttentiveNAS models, achieves top-1 accuracy from 77.3% to 80.7% on ImageNet, and outperforms SOTA models, including BigNAS and Once-for-All networks.
arXiv Detail & Related papers (2020-11-18T00:15:23Z) - Powering One-shot Topological NAS with Stabilized Share-parameter Proxy [65.09967910722932]
One-shot NAS method has attracted much interest from the research community due to its remarkable training efficiency and capacity to discover high performance models.
In this work, we try to enhance the one-shot NAS by exploring high-performing network architectures in our large-scale Topology Augmented Search Space.
The proposed method achieves state-of-the-art performance under Multiply-Adds (MAdds) constraint on ImageNet.
arXiv Detail & Related papers (2020-05-21T08:18:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.