Region-based Non-local Operation for Video Classification
- URL: http://arxiv.org/abs/2007.09033v5
- Date: Tue, 2 Feb 2021 00:21:37 GMT
- Title: Region-based Non-local Operation for Video Classification
- Authors: Guoxi Huang and Adrian G. Bors
- Abstract summary: This paper presents region-based non-local (RNL) operations as a family of self-attention mechanisms.
By combining a channel attention module with the proposed RNL, we design an attention chain, which can be integrated into the off-the-shelf CNNs for end-to-end training.
Experimentally, our method outperforms other attention mechanisms and achieves state-of-the-art performance on the Something-Something V1 dataset.
- Score: 11.746833714322154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs) model long-range dependencies by deeply
stacking convolution operations with small window sizes, which makes
optimization difficult. This paper presents region-based non-local (RNL)
operations as a family of self-attention mechanisms, which can directly capture
long-range dependencies without using a deep stack of local operations. Given
an intermediate feature map, our method recalibrates the feature at a position
by aggregating the information from the neighboring regions of all positions.
By combining a channel attention module with the proposed RNL, we design an
attention chain, which can be integrated into the off-the-shelf CNNs for
end-to-end training. We evaluate our method on two video classification
benchmarks. Experimentally, our method outperforms other attention
mechanisms and achieves state-of-the-art performance on the
Something-Something V1 dataset.
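The recalibration step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a 2D feature map stored as nested lists (the paper works on video features), summarizes each region by an average over a square window, scores pairs of regions with a dot product normalized by softmax, and adds the aggregated result back through a residual connection. The names `region_mean`, `region_nonlocal`, and `radius` are hypothetical, and no learned weights are included.

```python
import math


def region_mean(feat, i, j, radius):
    """Average the feature vectors in the square window around (i, j).

    The window is clipped at the borders of the map. Averaging is an
    assumed stand-in for whatever region aggregation the paper uses.
    """
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    acc = [0.0] * c
    count = 0
    for a in range(max(0, i - radius), min(h, i + radius + 1)):
        for b in range(max(0, j - radius), min(w, j + radius + 1)):
            for k in range(c):
                acc[k] += feat[a][b][k]
            count += 1
    return [v / count for v in acc]


def region_nonlocal(feat, radius=1):
    """Recalibrate every position from the regions around all positions."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    positions = [(i, j) for i in range(h) for j in range(w)]
    # One region descriptor per position.
    regions = [region_mean(feat, i, j, radius) for i, j in positions]
    out = [[None] * w for _ in range(h)]
    for p, (i, j) in enumerate(positions):
        # Affinity between the region at p and the region at every q.
        scores = [sum(x * y for x, y in zip(regions[p], r)) for r in regions]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted sum of the features at all positions.
        agg = [0.0] * c
        for q, (a, b) in enumerate(positions):
            wq = weights[q] / total
            for k in range(c):
                agg[k] += wq * feat[a][b][k]
        # Residual connection (an assumption): input plus aggregated context.
        out[i][j] = [feat[i][j][k] + agg[k] for k in range(c)]
    return out
```

With `radius=0` each region shrinks to a single position, so the sketch reduces to a plain dot-product non-local operation; larger radii make the affinity depend on the neighborhood around each position rather than on the position alone.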
Related papers
- Single image super-resolution based on trainable feature matching attention network [0.0]
Convolutional Neural Networks (CNNs) have been widely employed for image Super-Resolution (SR).
We introduce Trainable Feature Matching (TFM) to amalgamate explicit feature learning into CNNs, augmenting their representation capabilities.
We also propose a streamlined variant called Same-size-divided Region-level Non-Local (SRNL) to alleviate the computational demands of non-local operations.
arXiv Detail & Related papers (2024-05-29T08:31:54Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Direct Localization in Underwater Acoustics via Convolutional Neural Networks: A Data-Driven Approach [31.399611901926583]
Direct localization (DLOC) methods generally outperform their indirect two-step counterparts.
Underwater acoustic DLOC methods require prior knowledge of the environment.
We propose what is, to the best of our knowledge, the first data-driven DLOC method.
arXiv Detail & Related papers (2022-07-20T22:40:11Z)
- Learning Implicit Feature Alignment Function for Semantic Segmentation [51.36809814890326]
Implicit Feature Alignment function (IFA) is inspired by the rapidly expanding topic of implicit neural representations.
We show that IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.
Our method can be combined with improvement on various architectures, and it achieves state-of-the-art accuracy trade-off on common benchmarks.
arXiv Detail & Related papers (2022-06-17T09:40:14Z)
- Attention in Attention: Modeling Context Correlation for Efficient Video Classification [47.938500236792244]
This paper proposes an efficient attention-in-attention (AIA) method for focus-wise feature refinement.
We instantiate video feature contexts as dynamics aggregated along a specific axis with global average and pooling operations.
All computational operations in the attention units act on the pooled dimension, which results in only a small increase in computational cost.
arXiv Detail & Related papers (2022-04-20T08:37:52Z)
- Local Augmentation for Graph Neural Networks [78.48812244668017]
We introduce local augmentation, which enhances node features using their local subgraph structures.
Based on local augmentation, we further design a novel framework, LA-GNN, which can be applied to any GNN model in a plug-and-play manner.
arXiv Detail & Related papers (2021-09-08T18:10:08Z)
- Global Aggregation then Local Distribution for Scene Parsing [99.1095068574454]
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks.
Our approach sets a new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, Pascal Context, Camvid and COCO-stuff.
arXiv Detail & Related papers (2021-07-28T03:46:57Z)
- Clustered Federated Learning via Generalized Total Variation Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
arXiv Detail & Related papers (2021-05-26T18:07:19Z)