Related papers: Adaptive Recursive Circle Framework for Fine-grained Action Recognition

URL: http://arxiv.org/abs/2107.11813v1
Date: Sun, 25 Jul 2021 14:24:29 GMT
Title: Adaptive Recursive Circle Framework for Fine-grained Action Recognition
Authors: Hanxi Lin, Xinxiao Wu, Jiebo Luo
Abstract summary: How to model fine-grained spatial-temporal dynamics in videos has been a challenging problem for action recognition. Most existing methods generate features of a layer in a pure feedforward manner. We propose an Adaptive Recursive Circle framework, a fine-grained decorator for pure feedforward layers.
Score: 95.51097674917851
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: How to model fine-grained spatial-temporal dynamics in videos has been a challenging problem for action recognition. It requires learning deep and rich features with superior distinctiveness for the subtle and abstract motions. Most existing methods generate features of a layer in a pure feedforward manner, where the information moves in one direction from inputs to outputs. They rely on stacking more layers to obtain more powerful features, which brings non-negligible extra overhead. In this paper, we propose an Adaptive Recursive Circle (ARC) framework, a fine-grained decorator for pure feedforward layers. It inherits the operators and parameters of the original layer but uses those operators and parameters slightly differently. Specifically, the input of the layer is treated as an evolving state, and its update is alternated with the feature generation. At each recursive step, the input state is enriched by the previously generated features, and the features are then generated from the newly updated input state. We hope the ARC framework can facilitate fine-grained action recognition by introducing deeply refined features and multi-scale receptive fields at a low cost. Significant improvements over feedforward baselines are observed on several benchmarks. For example, an ARC-equipped TSM-ResNet18 outperforms TSM-ResNet50 with 48% fewer FLOPs and 52% model parameters on Something-Something V1 and Diving48.
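
The alternation described in the abstract (treat the layer input as a state, enrich it with the features just generated, then regenerate features with the same layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the additive update rule, the `mix` coefficient, the fixed step count, and the names `make_linear_relu` and `recursive_wrap` are all assumptions chosen for illustration, and the toy layer is a small dense layer rather than a video network.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_linear_relu(weights: List[Vector], bias: Vector) -> Layer:
    """Build a toy feedforward layer: ReLU(W x + b). Hypothetical helper."""
    def layer(x: Vector) -> Vector:
        return [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)
        ]
    return layer


def recursive_wrap(layer: Layer, steps: int, mix: float = 0.5) -> Layer:
    """Wrap a layer so that its input acts as an evolving state.

    Assumption: the state update is a simple additive one,
    state <- state + mix * features, which requires the layer's output
    size to equal its input size. The same layer (same parameters) is
    reused at every step, so no new weights are introduced.
    """
    def wrapped(x: Vector) -> Vector:
        state = list(x)
        features = layer(state)          # plain feedforward pass
        for _ in range(steps):
            # Enrich the input state with the previously generated features.
            state = [s + mix * f for s, f in zip(state, features)]
            # Regenerate features from the newly updated state.
            features = layer(state)
        return features
    return wrapped


if __name__ == "__main__":
    identity_layer = make_linear_relu([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    for steps in range(3):
        print(steps, recursive_wrap(identity_layer, steps)([1.0, -2.0]))
```

With `steps=0` the wrapper reduces to the original feedforward layer, which makes it easy to compare the two. Each extra step costs one more pass through the same layer but adds no parameters, matching the "inherits the operators and parameters" idea only in spirit.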

Related papers

ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
Features that Make a Difference: Leveraging Gradients for Improved Dictionary Learning [4.051777802443125]
Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations. We introduce Gradient SAEs, which modify the $k$-sparse autoencoder architecture by augmenting the TopK activation function. We find evidence that g-SAEs learn latents that are on average more effective at steering models in arbitrary contexts.
arXiv Detail & Related papers (2024-11-15T18:03:52Z)
SL$^{2}$A-INR: Single-Layer Learnable Activation for Implicit Neural Representation [6.572456394600755]
Implicit Neural Representation (INR) leveraging a neural network to transform coordinate input into corresponding attributes has driven significant advances in vision-related domains. We show that these challenges can be alleviated by introducing a novel approach in INR architecture. Specifically, we propose SL$^{2}$A-INR, a hybrid network that combines a single-layer learnable activation function with a synthesis that uses traditional ReLU activations.
arXiv Detail & Related papers (2024-09-17T02:02:15Z)
RecursiveDet: End-to-End Region-based Recursive Object Detection [19.799892459080485]
Region-based object detectors like Sparse R-CNN usually have multiple cascade bounding box decoding stages. In this paper, we find the general setting of decoding stages is actually redundant. RecursiveDet is able to achieve obvious performance boosts with even fewer model parameters.
arXiv Detail & Related papers (2023-07-25T16:22:58Z)
Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task. A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks. Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation [102.25240608024063]
Referring image segmentation segments an image from a language expression. We develop an algorithm that shifts from being localization-centric to segmentation-language. Compared to its counterparts, our method is more versatile yet effective.
arXiv Detail & Related papers (2023-03-11T08:42:40Z)
A Faster, Lighter and Stronger Deep Learning-Based Approach for Place Recognition [7.9400442516053475]
We propose a faster, lighter and stronger approach that can generate models with fewer parameters and spend less time in the inference stage. We design RepVGG-lite as the backbone network in our architecture; it is more discriminative than other general networks in the Place Recognition task. Our system has 14 times fewer parameters than Patch-NetVLAD and 6.8 times lower theoretical FLOPs, and runs 21 and 33 times faster in feature extraction and feature matching, respectively.
arXiv Detail & Related papers (2022-11-27T15:46:53Z)
GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but require huge computational costs. We propose to use the shift operation to generate the redundant features (i.e., Ghost features) of SISR models. We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
Fine-Grained Dynamic Head for Object Detection [68.70628757217939]
We propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance. Experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks.
arXiv Detail & Related papers (2020-12-07T08:16:32Z)
Lightweight Single-Image Super-Resolution Network with Attentive Auxiliary Feature Learning [73.75457731689858]
We develop a computation-efficient yet accurate network based on the proposed attentive auxiliary features (A$^{2}$F) for SISR. Experimental results on a large-scale dataset demonstrate the effectiveness of the proposed model against the state-of-the-art (SOTA) SR methods.
arXiv Detail & Related papers (2020-11-13T06:01:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.