Adaptive Recursive Circle Framework for Fine-grained Action Recognition
- URL: http://arxiv.org/abs/2107.11813v1
- Date: Sun, 25 Jul 2021 14:24:29 GMT
- Title: Adaptive Recursive Circle Framework for Fine-grained Action Recognition
- Authors: Hanxi Lin, Xinxiao Wu, Jiebo Luo
- Abstract summary: How to model fine-grained spatial-temporal dynamics in videos has been a challenging problem for action recognition.
Most existing methods generate features of a layer in a pure feedforward manner.
We propose an Adaptive Recursive Circle framework, a fine-grained decorator for pure feedforward layers.
- Score: 95.51097674917851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to model fine-grained spatial-temporal dynamics in videos has been a
challenging problem for action recognition. It requires learning deep and rich
features with superior distinctiveness for the subtle and abstract motions.
Most existing methods generate features of a layer in a pure feedforward
manner, where the information moves in one direction from inputs to outputs.
They rely on stacking more layers to obtain more powerful features,
which adds non-negligible overhead. In this paper, we propose an Adaptive
Recursive Circle (ARC) framework, a fine-grained decorator for pure feedforward
layers. It inherits the operators and parameters of the original layer but is
slightly different in the use of those operators and parameters. Specifically,
the input of the layer is treated as an evolving state, and its update is
alternated with the feature generation. At each recursive step, the input state
is enriched by the previously generated features and the feature generation is
made with the newly updated input state. We hope the ARC framework can
facilitate fine-grained action recognition by introducing deeply refined
features and multi-scale receptive fields at a low cost. Significant
improvements over feedforward baselines are observed on several benchmarks. For
example, an ARC-equipped TSM-ResNet18 outperforms TSM-ResNet50 with 48% fewer
FLOPs and 52% model parameters on Something-Something V1 and Diving48.
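The recursive scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `conv1d_same` and `arc_forward` are hypothetical, the "layer" is a toy 1D convolution on a Python list, and the state update is assumed to be a plain residual addition, since the abstract does not specify the update rule. The sketch only shows how reusing one layer's parameters across recursive steps alternates state update with feature generation and widens the receptive field.

```python
from typing import Callable, List

Vector = List[float]


def conv1d_same(x: Vector, kernel: Vector) -> Vector:
    """Zero-padded 1D cross-correlation that keeps the input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def arc_forward(x: Vector, layer: Callable[[Vector], Vector], steps: int) -> Vector:
    """Hypothetical recursive wrapper around a feedforward layer.

    Assumption: the input state is enriched by adding the previously
    generated features to it; the paper's actual update may differ.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    state = list(x)
    features = layer(state)  # step 1 is the ordinary feedforward pass
    for _ in range(steps - 1):
        # Enrich the input state with the features from the previous step.
        state = [s + f for s, f in zip(state, features)]
        # Regenerate features with the same layer, so parameters are shared.
        features = layer(state)
    return features


if __name__ == "__main__":
    kernel = [0.25, 0.5, 0.25]
    impulse = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    for steps in (1, 2, 3):
        out = arc_forward(impulse, lambda v: conv1d_same(v, kernel), steps)
        width = sum(1 for value in out if value != 0.0)
        print(steps, width)
```

With a size-3 kernel and a single impulse as input, the number of non-zero outputs grows by two with each extra recursive step, which is a toy analogue of the multi-scale receptive fields the abstract mentions, obtained without adding parameters.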
Related papers
- RecursiveDet: End-to-End Region-based Recursive Object Detection [19.799892459080485]
Region-based object detectors like Sparse R-CNN usually have multiple cascade bounding box decoding stages.
In this paper, we find the general setting of decoding stages is actually redundant.
RecursiveDet achieves clear performance gains with even fewer model parameters.
arXiv Detail & Related papers (2023-07-25T16:22:58Z)
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation [102.25240608024063]
Referring image segmentation segments an image from a language expression.
We develop an algorithm that shifts from being localization-centric to segmentation-centric.
Compared to its counterparts, our method is more versatile yet effective.
arXiv Detail & Related papers (2023-03-11T08:42:40Z)
- A Faster, Lighter and Stronger Deep Learning-Based Approach for Place Recognition [7.9400442516053475]
We propose a faster, lighter and stronger approach that yields models with fewer parameters and shorter inference time.
We design RepVGG-lite as the backbone network in our architecture; it is more discriminative than other general networks in the place recognition task.
Our system has 14 times fewer parameters than Patch-NetVLAD and 6.8 times lower theoretical FLOPs, and runs 21 and 33 times faster in feature extraction and feature matching, respectively.
arXiv Detail & Related papers (2022-11-27T15:46:53Z)
- GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve strong performance but require huge computational costs.
We propose to use shift operation to generate the redundant features (i.e., Ghost features) of SISR models.
We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
- Fine-Grained Dynamic Head for Object Detection [68.70628757217939]
We propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance.
Experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks.
arXiv Detail & Related papers (2020-12-07T08:16:32Z)
- Lightweight Single-Image Super-Resolution Network with Attentive Auxiliary Feature Learning [73.75457731689858]
We develop a computationally efficient yet accurate network based on the proposed attentive auxiliary features (A$^2$F) for SISR.
Experimental results on large-scale datasets demonstrate the effectiveness of the proposed model against state-of-the-art (SOTA) SR methods.
arXiv Detail & Related papers (2020-11-13T06:01:46Z)