PV-NAS: Practical Neural Architecture Search for Video Recognition
- URL: http://arxiv.org/abs/2011.00826v2
- Date: Tue, 3 Nov 2020 02:33:49 GMT
- Title: PV-NAS: Practical Neural Architecture Search for Video Recognition
- Authors: Zihao Wang, Chen Lin, Lu Sheng, Junjie Yan, Jing Shao
- Abstract summary: Deep neural networks for video tasks are highly customized, and designing such networks requires domain experts and costly trial-and-error tests.
Recent advances in network architecture search have boosted image recognition performance by a large margin.
In this study, we propose a practical solution, namely Practical Video Neural Architecture Search (PV-NAS).
- Score: 83.77236063613579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, deep learning has been utilized to solve the video
recognition problem due to its prominent representation ability. Deep neural
networks for video tasks are highly customized, and designing such networks
requires domain experts and costly trial-and-error tests. Recent advances in
network architecture search have boosted image recognition performance by a
large margin. However, the automatic design of video recognition networks is
less explored. In this study, we propose a practical solution, namely
Practical Video Neural Architecture Search (PV-NAS). Our PV-NAS can
efficiently search across a tremendously large space of architectures in a
novel spatial-temporal network search space using gradient-based search
methods. To avoid getting stuck in sub-optimal solutions, we propose a novel
learning rate scheduler that encourages sufficient network diversity among the
searched models. Extensive empirical evaluations show that the proposed PV-NAS
achieves state-of-the-art performance with much fewer computational resources:
1) among light-weight models, our PV-NAS-L achieves 78.7% and 62.5% Top-1
accuracy on Kinetics-400 and Something-Something V2, surpassing the previous
state-of-the-art method (i.e., TSM) by a large margin (4.6% and 3.4% on each
dataset, respectively), and 2) among medium-weight models, our PV-NAS-M
achieves the best performance (also a new record) on the Something-Something
V2 dataset.
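The search recipe the abstract describes (relax the choice among spatial-temporal operators into continuous weights, optimize those weights by gradient descent, and drive the architecture optimizer with a restart-style learning rate schedule so the search does not collapse onto a single operator too early) can be illustrated with a short sketch. The code below is a minimal DARTS-style stand-in: the operator set, cell wiring, and cosine warm-restart scheduler are assumptions chosen for illustration, not the actual PV-NAS search space or scheduler.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical candidate spatial-temporal operators; all preserve (N, C, T, H, W).
CANDIDATE_OPS = {
    "spatial_conv":  lambda c: nn.Conv3d(c, c, (1, 3, 3), padding=(0, 1, 1)),
    "st_conv":       lambda c: nn.Conv3d(c, c, (3, 3, 3), padding=1),
    "temporal_conv": lambda c: nn.Conv3d(c, c, (3, 1, 1), padding=(1, 0, 0)),
    "identity":      lambda c: nn.Identity(),
}

class MixedOp(nn.Module):
    """Continuous relaxation: a softmax-weighted sum of all candidate ops."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList(f(channels) for f in CANDIDATE_OPS.values())

    def forward(self, x, alpha):
        weights = F.softmax(alpha, dim=0)  # differentiable operator choice
        return sum(w * op(x) for w, op in zip(weights, self.ops))

class SearchCell(nn.Module):
    def __init__(self, channels, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(MixedOp(channels) for _ in range(num_layers))
        # Architecture parameters: one logit per (layer, candidate op).
        self.alphas = nn.Parameter(torch.zeros(num_layers, len(CANDIDATE_OPS)))

    def forward(self, x):
        for layer, alpha in zip(self.layers, self.alphas):
            x = layer(x, alpha)
        return x

model = SearchCell(channels=8)
arch_opt = torch.optim.Adam([model.alphas], lr=3e-3)
# Warm restarts periodically raise the LR on the architecture parameters,
# nudging them out of early local optima; one plausible way to encourage
# diversity among searched models (the paper's actual scheduler may differ).
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(arch_opt, T_0=10)

clip = torch.randn(2, 8, 16, 32, 32)    # a random (N, C, T, H, W) video batch
for step in range(30):
    loss = model(clip).pow(2).mean()    # stand-in for the validation loss
    arch_opt.zero_grad()
    loss.backward()
    arch_opt.step()
    scheduler.step()

# After the search, each layer keeps its highest-weighted operator.
print(F.softmax(model.alphas, dim=-1).argmax(dim=-1))
```

In a full two-stage setup, the network weights would be trained on one data split while the architecture parameters are updated on another; the loop above collapses both into a single toy objective to keep the sketch short.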
Related papers
- A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism [58.855741970337675]
Neural architecture search (NAS) enables researchers to automatically explore vast search spaces and find efficient neural networks.
NAS suffers from a key bottleneck, i.e., numerous architectures need to be evaluated during the search process.
We propose SMEM-NAS, a pairwise comparison relation-assisted multi-objective evolutionary algorithm based on a multi-population mechanism.
arXiv Detail & Related papers (2024-07-22T12:46:22Z)
- DONNAv2 -- Lightweight Neural Architecture Search for Vision tasks [6.628409795264665]
We present the next-generation neural architecture design for computationally efficient neural architecture distillation - DONNAv2.
DONNAv2 reduces the computational cost of DONNA by 10x for the larger datasets.
To improve the quality of NAS search space, DONNAv2 leverages a block knowledge distillation filter to remove blocks with high inference costs.
arXiv Detail & Related papers (2023-09-26T04:48:50Z)
- Lightweight Monocular Depth with a Novel Neural Architecture Search Method [46.97673710849343]
This paper presents a novel neural architecture search method, called LiDNAS, for generating lightweight monocular depth estimation models.
We construct the search space on a pre-defined backbone network to balance layer diversity and search space size.
The LiDNAS-optimized models achieve results superior to the compact depth estimation state of the art on NYU-Depth-v2, KITTI, and ScanNet.
arXiv Detail & Related papers (2021-08-25T08:06:28Z)
- ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search [94.90294600817215]
We propose a novel neural architecture search (NAS) method, termed ViPNAS, to search networks in both spatial and temporal levels for fast online video pose estimation.
In the spatial level, we carefully design the search space with five different dimensions including network depth, width, kernel size, group number, and attentions.
In the temporal level, we search from a series of temporal feature fusions to optimize the total accuracy and speed across multiple video frames.
arXiv Detail & Related papers (2021-05-21T06:36:40Z)
- Searching Efficient Model-guided Deep Network for Image Denoising [61.65776576769698]
We present a novel approach by connecting model-guided design with NAS (MoD-NAS).
MoD-NAS employs a highly reusable width search strategy and a densely connected search block to automatically select the operations of each layer.
Experimental results on several popular datasets show that our MoD-NAS has achieved even better PSNR performance than current state-of-the-art methods.
arXiv Detail & Related papers (2021-04-06T14:03:01Z)
- NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition [45.168746142597946]
We propose a new processing framework called Neural Architecture Search-Temporal Convolutional (NAS-TC).
In the first phase, the classical CNN network is used as the backbone network to complete the computationally intensive feature extraction task.
In the second stage, a simple cell-stitching search is used to complete the relatively lightweight extraction of long-range temporal-dependency information.
arXiv Detail & Related papers (2021-03-17T02:02:11Z)
- OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection [82.04372532783931]
Recently, neural architecture search (NAS) has been exploited to design feature pyramid networks (FPNs).
We propose a novel One-Shot Path Aggregation Network Architecture Search (OPANAS) algorithm, which significantly improves both searching efficiency and detection accuracy.
arXiv Detail & Related papers (2021-03-08T01:48:53Z)
- AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling [39.58754758581108]
Two-stage Neural Architecture Search (NAS) achieves remarkable accuracy and efficiency.
Two-stage NAS requires sampling from the search space during training, which directly impacts the accuracy of the final searched models.
We propose AttentiveNAS, which focuses on improving the sampling strategy to achieve a better performance Pareto front.
Our discovered model family, AttentiveNAS models, achieves top-1 accuracy from 77.3% to 80.7% on ImageNet, and outperforms SOTA models, including BigNAS and Once-for-All networks.
arXiv Detail & Related papers (2020-11-18T00:15:23Z)
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures under given constraints. A toy sketch of this sample-prune-update loop appears right after this list.
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
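The DDPNAS entry above outlines a concrete loop: sample architectures from a joint categorical distribution, update that distribution based on how the samples perform, and prune the search space every few epochs. A toy re-creation of that loop follows; the reward-weighted update and the proxy_reward function are simplified stand-ins for illustration, not the paper's actual estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, num_ops = 4, 5
# Joint categorical distribution: one categorical per layer.
probs = np.full((num_layers, num_ops), 1.0 / num_ops)
alive = np.ones((num_layers, num_ops), dtype=bool)      # pruned ops get False

def proxy_reward(arch):
    # Stand-in for the validation accuracy of a sampled architecture:
    # here, op index 2 is arbitrarily "best" at every layer.
    return -float(np.abs(np.asarray(arch) - 2).sum())

for epoch in range(20):
    for _ in range(8):                                  # sample a few archs
        arch = [rng.choice(num_ops, p=p) for p in probs]
        r = proxy_reward(arch)
        for layer, op in enumerate(arch):               # reward-weighted update
            probs[layer, op] += 0.05 * np.exp(r)
        probs /= probs.sum(axis=1, keepdims=True)
    if epoch % 5 == 4:                                  # prune every few epochs
        for layer in range(num_layers):
            if alive[layer].sum() > 1:
                worst = np.argmin(np.where(alive[layer], probs[layer], np.inf))
                alive[layer, worst] = False
                probs[layer, worst] = 0.0
        probs /= probs.sum(axis=1, keepdims=True)

# Once the distribution has concentrated, read off the architecture directly.
print("selected op per layer:", probs.argmax(axis=1))
```

Under these toy dynamics the distribution concentrates on the highest-reward op per layer, which mirrors the summary's claim that the final architecture is obtained directly from the pruned distribution rather than by retraining many candidates.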