PatchNet -- Short-range Template Matching for Efficient Video Processing
- URL: http://arxiv.org/abs/2103.07371v1
- Date: Wed, 10 Mar 2021 20:56:07 GMT
- Title: PatchNet -- Short-range Template Matching for Efficient Video Processing
- Authors: Huizi Mao, Sibo Zhu, Song Han, William J. Dally
- Abstract summary: We propose PatchNet, an efficient convolutional neural network to match objects in adjacent video frames.
PatchNet is very compact, running at just 58 MFLOPs, $5\times$ simpler than MobileNetV2.
We demonstrate its application on two tasks, video object detection and visual object tracking.
- Score: 16.33718159978111
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Object recognition is a fundamental problem in many video processing
tasks; accurately locating previously seen objects at low computation cost paves
the way for on-device video recognition. We propose PatchNet, an efficient convolutional
neural network to match objects in adjacent video frames. It learns
patchwise correlation features instead of pixel features. PatchNet is very
compact, running at just 58 MFLOPs, $5\times$ simpler than MobileNetV2. We
demonstrate its application on two tasks, video object detection and visual
object tracking. On ImageNet VID, PatchNet reduces the FLOPs of R-FCN
ResNet-101 by 5x and EfficientDet-D0 by 3.4x with less than 1% mAP loss. On
OTB2015, PatchNet reduces SiamFC and SiamRPN by 2.5x with no accuracy loss.
Experiments on Jetson Nano further demonstrate 2.8x to 4.3x speed-ups
associated with the FLOPs reduction. Code is open-sourced at
https://github.com/RalphMao/PatchNet.
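The short-range template-matching idea can be illustrated with a plain normalized cross-correlation sketch. This is an assumption for illustration only: PatchNet itself learns patchwise correlation features with a small CNN rather than computing raw cross-correlation, and all names below are hypothetical.

```python
import numpy as np

def patch_correlation(template, search, stride=1):
    """Slide a template patch over a search region and return normalized
    cross-correlation scores. Hypothetical sketch of short-range template
    matching; PatchNet learns these correlations rather than computing them."""
    th, tw = template.shape
    sh, sw = search.shape
    t = template - template.mean()
    tn = np.linalg.norm(t) + 1e-8
    out_h = (sh - th) // stride + 1
    out_w = (sw - tw) // stride + 1
    scores = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = search[i*stride:i*stride+th, j*stride:j*stride+tw]
            p = patch - patch.mean()
            scores[i, j] = (t * p).sum() / (tn * (np.linalg.norm(p) + 1e-8))
    return scores

# Example: recover the location of a patch cut from a random "frame".
rng = np.random.default_rng(0)
frame = rng.standard_normal((16, 16))
template = frame[5:9, 7:11].copy()          # object seen in the previous frame
scores = patch_correlation(template, frame)
best = np.unravel_index(scores.argmax(), scores.shape)  # top-left of best match
```

Because the template is an exact crop, the score peaks at its original top-left corner (5, 7); in a real adjacent frame the peak would sit at the object's slightly shifted position, which is why only a short-range search window is needed.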
Related papers
- PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference [11.112356346406365]
PaPr uses lightweight ConvNets to substantially prune redundant patches with minimal accuracy loss.
It achieves significantly higher accuracy over state-of-the-art patch reduction methods with similar FLOP count reduction.
arXiv Detail & Related papers (2024-03-24T05:50:00Z)
- Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks [15.519170283930276]
We propose a novel partial convolution (PConv) that extracts spatial features more efficiently, by cutting down redundant computation and memory access simultaneously.
Building upon our PConv, we further propose FasterNet, a new family of neural networks, which attains substantially higher running speed than others on a wide range of devices.
Our large FasterNet-L achieves an impressive 83.5% top-1 accuracy, on par with the emerging Swin-B, while having 36% higher inference throughput on GPU.
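The partial-convolution idea can be sketched in a few lines (a minimal NumPy illustration under assumed shapes, not the paper's implementation): convolve only the first `n_conv` channels and pass the rest through untouched, so the convolution's FLOPs scale with `(n_conv / C)^2`.

```python
import numpy as np

def pconv(x, weight, n_conv):
    """Partial convolution sketch (assumptions: 3x3 kernel, stride 1,
    'same' zero padding): apply a regular convolution to the first
    n_conv channels only; the remaining channels are passed through."""
    c, h, w = x.shape
    k = weight.shape[-1]
    pad = k // 2
    xp = np.pad(x[:n_conv], ((0, 0), (pad, pad), (pad, pad)))
    out = np.empty_like(x)
    for o in range(n_conv):
        acc = np.zeros((h, w))
        for i in range(n_conv):
            for u in range(k):
                for v in range(k):
                    acc += weight[o, i, u, v] * xp[i, u:u+h, v:v+w]
        out[o] = acc
    out[n_conv:] = x[n_conv:]  # untouched channels; a real kernel skips even the copy
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 6, 6))    # 8 channels; convolve only the first 2
w = rng.standard_normal((2, 2, 3, 3))
y = pconv(x, w, n_conv=2)
```

With 2 of 8 channels convolved, the conv part costs (2/8)^2 = 1/16 of a full convolution's FLOPs, which is the redundancy-cutting effect the summary describes.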
arXiv Detail & Related papers (2023-03-07T06:05:30Z) - GhostNetV2: Enhance Cheap Operation with Long-Range Attention [59.65543143580889]
We propose a hardware-friendly attention mechanism (dubbed DFC attention) and then present a new GhostNetV2 architecture for mobile applications.
The proposed DFC attention is constructed from fully-connected layers, which not only execute quickly on common hardware but also capture long-range dependencies between pixels.
We further revisit the bottleneck in previous GhostNet and propose to enhance expanded features produced by cheap operations with DFC attention.
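A simplified sketch of fully-connected attention along the lines the summary describes (an assumption for illustration, not GhostNetV2's actual DFC module): mix features along the height axis with one FC layer, then along the width axis with another, and gate the input with a sigmoid of the result, so each position aggregates information from its full row and column.

```python
import numpy as np

def dfc_attention(x, w_h, w_w):
    """Decoupled fully-connected attention sketch (hypothetical
    simplification): two 1-D fully-connected mixings, one per spatial
    axis, followed by a sigmoid gate on the input features."""
    # x: (C, H, W); w_h: (H, H) mixes rows; w_w: (W, W) mixes columns
    a = np.einsum('hk,ckw->chw', w_h, x)   # aggregate along height
    a = np.einsum('wk,chk->chw', w_w, a)   # aggregate along width
    gate = 1.0 / (1.0 + np.exp(-a))        # sigmoid attention map in (0, 1)
    return x * gate

rng = np.random.default_rng(3)
x = rng.standard_normal((3, 4, 5))                    # (C, H, W)
y = dfc_attention(x, rng.standard_normal((4, 4)),
                  rng.standard_normal((5, 5)))
```

Decomposing the spatial mixing into two 1-D FC passes keeps the cost at O(H^2 + W^2) per position instead of O(H^2 W^2) for a full spatial attention, which is why it maps well to simple hardware.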
arXiv Detail & Related papers (2022-11-23T12:16:59Z) - RepGhost: A Hardware-Efficient Ghost Module via Re-parameterization [13.605461609002539]
Feature reuse has been a key technique in light-weight convolutional neural networks (CNNs) architecture design.
Current methods usually use a concatenation operator to cheaply maintain large channel numbers (and thus large network capacity) by reusing feature maps from other layers.
This paper provides a new perspective to realize feature reuse implicitly and more efficiently instead of concatenation.
arXiv Detail & Related papers (2022-11-11T09:44:23Z) - MogaNet: Multi-order Gated Aggregation Network [64.16774341908365]
We propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning.
MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module.
MogaNet exhibits great scalability, impressive efficiency of parameters, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet.
arXiv Detail & Related papers (2022-11-07T04:31:17Z) - MicroNet: Improving Image Recognition with Extremely Low FLOPs [82.54764264255505]
We find that two factors, sparse connectivity and dynamic activation functions, are effective in improving accuracy.
We present a new dynamic activation function, named Dynamic Shift Max, to improve the non-linearity.
We arrive at a family of networks, named MicroNet, that achieves significant performance gains over the state of the art in the low FLOP regime.
arXiv Detail & Related papers (2021-08-12T17:59:41Z) - MoViNets: Mobile Video Networks for Efficient Video Recognition [52.49314494202433]
3D convolutional neural networks (CNNs) are accurate at video recognition but require large computation and memory budgets.
We propose a three-step approach to improve computational efficiency while substantially reducing the peak memory usage of 3D CNNs.
arXiv Detail & Related papers (2021-03-21T23:06:38Z) - Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets [65.28292822614418]
The giant formula for simultaneously enlarging resolution, depth, and width gives us a Rubik's cube for neural networks.
This paper aims to explore the twisting rules for obtaining deep neural networks with minimum model sizes and computational costs.
arXiv Detail & Related papers (2020-10-28T08:49:45Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding floating-point networks, but have only 1/4 the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - DyNet: Dynamic Convolution for Accelerating Convolutional Neural
Networks [16.169176006544436]
We propose a novel dynamic convolution method to adaptively generate convolution kernels based on image contents.
Based on the MobileNetV3-Small/Large architectures, DyNet achieves 70.3/77.1% top-1 accuracy on ImageNet, an improvement of 2.9/1.9%.
arXiv Detail & Related papers (2020-04-22T16:58:05Z)
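The content-adaptive kernel idea behind dynamic convolution can be sketched as follows (a hypothetical 1x1 NumPy illustration, not DyNet's implementation): pool the input into a global context vector, compute softmax coefficients over K fixed kernel bases, mix the bases into one input-dependent kernel, and apply it.

```python
import numpy as np

def dynamic_conv1x1(x, bases, w_attn):
    """Dynamic convolution sketch (assumed 1x1 case): the effective
    kernel is a content-dependent convex combination of K kernel bases."""
    # x: (C_in, H, W); bases: (K, C_out, C_in); w_attn: (K, C_in)
    ctx = x.mean(axis=(1, 2))                      # (C_in,) global context
    logits = w_attn @ ctx                          # (K,) one score per basis
    coef = np.exp(logits - logits.max())
    coef /= coef.sum()                             # softmax over the K bases
    kernel = np.einsum('k,koi->oi', coef, bases)   # mixed (C_out, C_in) kernel
    return np.einsum('oi,ihw->ohw', kernel, x)     # apply as 1x1 convolution

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 5, 5))        # 4 input channels
bases = rng.standard_normal((3, 6, 4))    # K=3 bases, 6 output channels
w_attn = rng.standard_normal((3, 4))
y = dynamic_conv1x1(x, bases, w_attn)     # (6, 5, 5)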
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.