Searching for Fast Model Families on Datacenter Accelerators
- URL: http://arxiv.org/abs/2102.05610v1
- Date: Wed, 10 Feb 2021 18:15:40 GMT
- Title: Searching for Fast Model Families on Datacenter Accelerators
- Authors: Sheng Li, Mingxing Tan, Ruoming Pang, Andrew Li, Liqun Cheng, Quoc Le,
Norman P. Jouppi
- Abstract summary: We search for fast and accurate CNN model families for efficient inference on DC accelerators.
We propose a latency-aware compound scaling (LACS) method optimizing both accuracy and latency.
Our LACS discovers that network depth should grow much faster than image size and network width.
- Score: 33.28421782921072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Architecture Search (NAS), together with model scaling, has shown
remarkable progress in designing high accuracy and fast convolutional
architecture families. However, as neither NAS nor model scaling considers
sufficient hardware architecture details, they do not take full advantage of
the emerging datacenter (DC) accelerators. In this paper, we search for fast
and accurate CNN model families for efficient inference on DC accelerators. We
first analyze DC accelerators and find that existing CNNs suffer from
insufficient operational intensity, parallelism, and execution efficiency.
These insights let us create a DC-accelerator-optimized search space, with
space-to-depth, space-to-batch, hybrid fused convolution structures with
vanilla and depthwise convolutions, and block-wise activation functions. On top
of our DC accelerator optimized neural architecture search space, we further
propose a latency-aware compound scaling (LACS), the first multi-objective
compound scaling method optimizing both accuracy and latency. Our LACS
discovers that network depth should grow much faster than image size and
network width, which is quite different from previous compound scaling results.
With the new search space and LACS, our search and scaling on datacenter
accelerators result in a new model series named EfficientNet-X. EfficientNet-X
is up to more than 2X faster than EfficientNet (a model series with a
state-of-the-art trade-off between FLOPs and accuracy) on TPUv3 and GPUv100, with
comparable accuracy. EfficientNet-X is also up to 7X faster than recent RegNet
and ResNeSt on TPUv3 and GPUv100.
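To make the compound-scaling idea concrete, here is a minimal sketch (not the paper's actual LACS procedure) of enumerating EfficientNet-style depth/width/resolution coefficients and selecting among them with measured latency as an explicit constraint; the FLOPs-style budget, the tolerance, and the eval_accuracy / measure_latency callbacks are illustrative placeholders.

```python
# Hypothetical sketch of latency-aware compound scaling: depth, width, and
# resolution grow as alpha^phi, beta^phi, gamma^phi (as in EfficientNet), but
# candidates are ranked with measured accelerator latency in the loop.
from dataclasses import dataclass
from itertools import product

@dataclass
class ScalingConfig:
    depth_mult: float   # multiplier on the number of layers
    width_mult: float   # multiplier on channel counts
    res_mult: float     # multiplier on input image size

def lacs_candidates(phi, alphas, betas, gammas, budget=2.0, tol=0.05):
    """Enumerate compound-scaling coefficients under a FLOPs-style budget."""
    for a, b, g in product(alphas, betas, gammas):
        # FLOPs scale roughly linearly in depth and quadratically in width
        # and resolution, hence the alpha * beta^2 * gamma^2 ~ budget rule.
        if abs(a * b**2 * g**2 - budget) <= tol:
            yield ScalingConfig(a**phi, b**phi, g**phi)

def select_best(candidates, eval_accuracy, measure_latency, latency_target_ms):
    """Return the most accurate config whose measured latency meets the target.

    eval_accuracy / measure_latency stand in for training (or proxy-scoring) a
    scaled model and timing it on the target accelerator (e.g. TPU or GPU).
    """
    feasible = [(cfg, eval_accuracy(cfg)) for cfg in candidates
                if measure_latency(cfg) <= latency_target_ms]
    return max(feasible, key=lambda t: t[1], default=(None, None))[0]
```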
Related papers
- HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator [47.66463010685586]
We propose a novel approach that exploits unstructured weight and activation sparsity on dataflow accelerators, using software and hardware co-optimization.
We achieve an efficiency improvement ranging from 1.3x to 4.2x compared to existing sparse designs.
arXiv Detail & Related papers (2024-06-05T09:25:18Z)
- DCP-NAS: Discrepant Child-Parent Neural Architecture Search for 1-bit CNNs [53.82853297675979]
1-bit convolutional neural networks (CNNs) with binary weights and activations show their potential for resource-limited embedded devices.
One natural approach is to use 1-bit CNNs to reduce the computation and memory cost of NAS.
We introduce Discrepant Child-Parent Neural Architecture Search (DCP-NAS) to efficiently search 1-bit CNNs.
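For readers unfamiliar with 1-bit CNNs, the sketch below shows a typical binary convolution with sign-binarized weights and activations and a straight-through estimator; the per-channel scaling and binarization choices are common conventions and are not claimed to match DCP-NAS exactly.

```python
# Minimal 1-bit convolution sketch: weights and activations are binarized to
# {-1, +1} (with a per-output-channel scale) and gradients flow through a
# straight-through estimator (STE) to the latent real-valued weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryConv2d(nn.Conv2d):
    def forward(self, x):
        w = self.weight
        scale = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # per-output-channel scale
        w_bin = torch.sign(w) * scale
        w_ste = w + (w_bin - w).detach()                    # STE for weights
        x_ste = x + (torch.sign(x) - x).detach()            # STE for activations
        return F.conv2d(x_ste, w_ste, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```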
arXiv Detail & Related papers (2023-06-27T11:28:29Z)
- KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow [0.0]
We build a generic, optimized, and fast dataflow solver, KAPLA, to explore the design space with effective validity check and efficiency estimation.
KAPLA's resulting dataflows incur only 2.2% and 7.7% energy overheads for training and inference, respectively.
It also outperforms random and machine-learning-based approaches, with more optimized results and orders of magnitude faster search speedup.
arXiv Detail & Related papers (2023-06-09T03:12:42Z)
- FLASH: Fast Neural Architecture Search with Hardware Optimization [7.263481020106725]
Neural architecture search (NAS) is a promising technique for designing efficient, high-performance deep neural networks (DNNs).
This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform.
arXiv Detail & Related papers (2021-08-01T23:46:48Z)
- HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT)
HANT replaces inefficient operations with more efficient alternatives using an approach similar to neural architecture search.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a 0.4% drop in top-1 accuracy on the ImageNet dataset.
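The greedy loop below is only a toy illustration of the layer-wise replacement idea in the summary; candidates_for, accuracy_with, and latency_of are hypothetical callbacks, and HANT's actual NAS-style selection is more involved.

```python
# Toy layer-wise operation replacement: for each layer, keep the fastest
# candidate whose accuracy drop stays within a tolerance (in accuracy points).
def replace_inefficient_ops(layers, candidates_for, accuracy_with, latency_of,
                            max_drop=0.4):
    baseline = accuracy_with(layers)
    new_layers = []
    for i, layer in enumerate(layers):
        best = layer
        for cand in candidates_for(layer):
            trial = layers[:i] + [cand] + layers[i + 1:]
            if (accuracy_with(trial) >= baseline - max_drop
                    and latency_of(cand) < latency_of(best)):
                best = cand
        new_layers.append(best)
    return new_layers
```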
arXiv Detail & Related papers (2021-07-12T18:46:34Z)
- FuSeConv: Fully Separable Convolutions for Fast Inference on Systolic Arrays [2.8583189395674653]
We propose FuSeConv as a drop-in replacement for depth-wise separable convolution.
FuSeConv generalizes the decomposition of convolutions fully to separable 1D convolutions along spatial and depth dimensions.
We achieve a significant speed-up of 3x-7x with the MobileNet family of networks on a systolic array of size 64x64, with comparable accuracy on the ImageNet dataset.
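As a rough illustration of "separable 1D convolutions along spatial and depth dimensions", the PyTorch sketch below factorizes a k x k depthwise convolution into k x 1 and 1 x k depthwise convolutions followed by a 1 x 1 pointwise convolution; FuSeConv's exact block structure may differ.

```python
# Rough sketch of a fully separable convolution block: 1D depthwise
# convolutions along height and width, then a pointwise (1x1) convolution
# that mixes channels.
import torch.nn as nn

def fully_separable_block(c_in, c_out, k=3, stride=1):
    pad = k // 2
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, (k, 1), stride=(stride, 1), padding=(pad, 0),
                  groups=c_in, bias=False),     # depthwise, height direction
        nn.Conv2d(c_in, c_in, (1, k), stride=(1, stride), padding=(0, pad),
                  groups=c_in, bias=False),     # depthwise, width direction
        nn.BatchNorm2d(c_in),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_in, c_out, 1, bias=False),  # pointwise channel mixing
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )
```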
arXiv Detail & Related papers (2021-05-27T20:19:39Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Trilevel Neural Architecture Search for Efficient Single Image Super-Resolution [127.92235484598811]
This paper proposes a trilevel neural architecture search (NAS) method for efficient single image super-resolution (SR).
To model the discrete search space, we apply a new continuous relaxation that builds a hierarchical mixture of network-path, cell-operation, and kernel-width choices.
An efficient search algorithm is proposed to perform optimization in a hierarchical supernet manner.
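The continuous relaxation mentioned above is in the spirit of differentiable NAS; a minimal sketch of one relaxed discrete choice (a softmax-weighted mixture over candidate operations) follows. The hierarchical coupling of network paths, cell operations, and kernel widths is not reproduced here.

```python
# Minimal continuous relaxation of a discrete choice: a softmax over learnable
# architecture logits weights the candidate operations; after search, the
# argmax operation is kept (discretization).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)
        self.alpha = nn.Parameter(torch.zeros(len(candidate_ops)))  # arch logits

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```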
arXiv Detail & Related papers (2021-01-17T12:19:49Z)
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 makes up a family of state-of-the-art compact neural networks that outperform both automatically and manually designed competitors.
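A toy sketch of predictor-guided evolutionary search over (architecture, recipe) pairs, in the spirit of the summary above; the encoding, predictor, mutation operator, and budget check are assumed placeholders rather than FBNetV3's actual components.

```python
# Toy predictor-guided evolutionary search: a learned predictor ranks
# (architecture, recipe) pairs, and the best pairs are mutated under a
# resource constraint.
def evolutionary_search(init_population, predictor, mutate, fits_budget,
                        generations=50, parents_per_gen=16, children_per_parent=4):
    population = [p for p in init_population if fits_budget(p)]
    for _ in range(generations):
        population.sort(key=predictor, reverse=True)    # rank by predicted accuracy
        parents = population[:parents_per_gen]
        children = []
        for parent in parents:
            for _ in range(children_per_parent):
                child = mutate(parent)                   # perturb arch and/or recipe
                if fits_budget(child):
                    children.append(child)
        population = parents + children
    return max(population, key=predictor)
```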
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints.
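A compact sketch of the sample-prune-update loop described above; the reward-weighted update and the pruning schedule are illustrative (assuming non-negative rewards such as proxy accuracy), not the paper's exact estimator.

```python
# Sample architectures from a joint categorical distribution, update the
# distribution from observed rewards, and prune the weakest choice of each
# decision every few epochs.
import numpy as np

def ddp_search(num_decisions, num_choices, reward_fn,
               epochs=100, samples_per_epoch=8, prune_every=5, lr=0.1):
    theta = np.full((num_decisions, num_choices), 1.0 / num_choices)  # probabilities
    alive = np.ones((num_decisions, num_choices), dtype=bool)
    for epoch in range(1, epochs + 1):
        samples = []
        for _ in range(samples_per_epoch):
            arch = [int(np.random.choice(num_choices, p=theta[d]))
                    for d in range(num_decisions)]
            samples.append((arch, reward_fn(arch)))  # e.g. proxy validation accuracy
        for arch, r in samples:                      # reward-weighted update
            for d, c in enumerate(arch):
                theta[d, c] += lr * r                # assumes r >= 0
        theta[~alive] = 0.0
        theta /= theta.sum(axis=1, keepdims=True)
        if epoch % prune_every == 0:                 # dynamic pruning step
            for d in range(num_decisions):
                live = np.flatnonzero(alive[d])
                if live.size > 1:
                    worst = live[np.argmin(theta[d, live])]
                    alive[d, worst] = False
                    theta[d, worst] = 0.0
            theta /= theta.sum(axis=1, keepdims=True)
    return [int(np.argmax(theta[d])) for d in range(num_decisions)]
```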
arXiv Detail & Related papers (2019-05-28T06:35:52Z)