Dynamic Network Quantization for Efficient Video Inference
- URL: http://arxiv.org/abs/2108.10394v1
- Date: Mon, 23 Aug 2021 20:23:57 GMT
- Title: Dynamic Network Quantization for Efficient Video Inference
- Authors: Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris,
Kate Saenko
- Abstract summary: We propose a dynamic network quantization framework that selects the optimal precision for each frame, conditioned on the input, for efficient video recognition.
We train both networks effectively using standard backpropagation with a loss that balances competitive performance and resource efficiency.
- Score: 60.109250720206425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep convolutional networks have recently achieved great success in video
recognition, yet their practical deployment remains a challenge due to the
large amount of computational resources required for robust recognition.
Motivated by the effectiveness of quantization for boosting efficiency, in this
paper we propose a dynamic network quantization framework that selects the
optimal precision for each frame, conditioned on the input, for efficient video
recognition. Specifically, given a video clip, we train a very lightweight
network in parallel with the recognition network to produce a dynamic policy
indicating which numerical precision to use for each frame when recognizing
videos. We train both networks effectively using standard backpropagation with
a loss that balances the competitive performance and resource efficiency required
for video recognition. Extensive experiments on four challenging, diverse
benchmark datasets demonstrate that our proposed approach provides significant
savings in computation and memory usage while outperforming existing
state-of-the-art methods.
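The per-frame precision selection described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scoring function, candidate bit-widths, and thresholds below are hypothetical stand-ins for the lightweight policy network, and the uniform quantizer is a generic textbook formulation.

```python
# Illustrative sketch (hypothetical, not the paper's code): a toy "policy"
# assigns each video frame a bit-width from a candidate set based on a cheap
# informativeness score; less informative frames get lower precision.

def uniform_quantize(x, bits, x_max=1.0):
    """Uniformly quantize x in [-x_max, x_max] to the given bit-width."""
    levels = 2 ** bits - 1
    step = 2 * x_max / levels
    return round((x + x_max) / step) * step - x_max

def choose_precision(frame_score, candidates=(2, 4, 8), thresholds=(0.3, 0.7)):
    """Toy policy: higher-scoring (more informative) frames get more bits."""
    if frame_score < thresholds[0]:
        return candidates[0]
    if frame_score < thresholds[1]:
        return candidates[1]
    return candidates[2]

# Scores that a lightweight policy network might produce, one per frame.
frame_scores = [0.1, 0.5, 0.9, 0.2]
precisions = [choose_precision(s) for s in frame_scores]
print(precisions)  # bit-width chosen per frame -> [2, 4, 8, 2]

# Rough cost model: compute scales with bit-width, so mixing precisions
# saves resources relative to running every frame at the highest precision.
relative_cost = sum(precisions) / (8 * len(precisions))
print(f"relative compute vs. all-8-bit: {relative_cost:.2f}")  # 0.50
```

In the paper, the policy is a small network trained jointly with the recognizer via standard backpropagation; the hard thresholding above merely stands in for that learned decision.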
Related papers
- NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition [89.84188594758588]
A novel Non-saliency Suppression Network (NSNet) is proposed to suppress the responses of non-salient frames.
NSNet achieves a state-of-the-art accuracy-efficiency trade-off and delivers significantly faster (2.4-4.3x) practical inference speed than state-of-the-art methods.
arXiv Detail & Related papers (2022-07-21T09:41:22Z)
- AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition [61.51188561808917]
We propose an adaptive multi-modal learning framework, called AdaMML, that selects on-the-fly the optimal modalities for each segment conditioned on the input for efficient video recognition.
We show that our proposed approach yields 35%-55% reduction in computation when compared to the traditional baseline.
arXiv Detail & Related papers (2021-05-11T16:19:07Z)
- Adaptive Focus for Efficient Video Recognition [29.615394426035074]
We propose a reinforcement-learning-based approach for efficient spatially adaptive video recognition (AdaFocus).
A lightweight ConvNet first quickly processes the full video sequence, and its features are used by a recurrent policy network to localize the most task-relevant regions.
During offline inference, once the informative patch sequence has been generated, the bulk of computation can be done in parallel, and is efficient on modern GPU devices.
arXiv Detail & Related papers (2021-05-07T13:24:47Z)
- A Reinforcement-Learning-Based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline [16.72264118199915]
Video analytics pipelines are energy-intensive due to high data rates and reliance on complex inference algorithms.
We propose an adaptive-resolution optimization framework to minimize the energy use of multi-task video analytics pipelines.
Our framework has significantly surpassed all baseline methods of similar accuracy on the YouTube-VIS dataset.
arXiv Detail & Related papers (2021-04-09T15:44:06Z)
- Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition [79.60708268515293]
This paper explores how to train small and efficient networks for action recognition.
We propose two distillation strategies in the frequency domain: feature-spectrum distillation and parameter-distribution distillation.
Our method can achieve higher performance than state-of-the-art methods with the same backbone.
arXiv Detail & Related papers (2020-09-15T07:29:57Z)
- AR-Net: Adaptive Frame Resolution for Efficient Action Recognition [70.62587948892633]
Action recognition is an open and challenging problem in computer vision.
We propose a novel approach, called AR-Net, that selects on-the-fly the optimal resolution for each frame conditioned on the input for efficient action recognition.
arXiv Detail & Related papers (2020-07-31T01:36:04Z)
- Dynamic Inference: A New Approach Toward Efficient Video Action Recognition [69.9658249941149]
Action recognition in videos has achieved great success recently, but it remains a challenging task due to the massive computational cost.
We propose a general dynamic inference idea to improve inference efficiency by leveraging the variation in the distinguishability of different videos.
arXiv Detail & Related papers (2020-02-09T11:09:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.