PNL: Efficient Long-Range Dependencies Extraction with Pyramid Non-Local
Module for Action Recognition
- URL: http://arxiv.org/abs/2006.05091v1
- Date: Tue, 9 Jun 2020 07:40:23 GMT
- Authors: Yuecong Xu, Haozhi Cao, Jianfei Yang, Kezhi Mao, Jianxiong Yin and
Simon See
- Abstract summary: The non-local block, inspired by non-local means, is designed to capture long-range spatiotemporal dependencies in video.
However, the non-local block brings a significant increase in computation cost to the original network.
It also lacks the ability to model regional correlation in videos.
We propose the Pyramid Non-Local (PNL) module, which extends the non-local block by incorporating regional correlation at multiple scales through a pyramid-structured module.
- Score: 19.010874017607247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Capturing long-range spatiotemporal dependencies plays an essential role in
improving video features for action recognition. The non-local block, inspired
by non-local means, is designed to address this challenge and has shown
excellent performance. However, the non-local block brings a significant increase
in computation cost to the original network. It also lacks the ability to model
regional correlation in videos. To address these limitations, we propose the
Pyramid Non-Local (PNL) module, which extends the non-local block by
incorporating regional correlation at multiple scales through a
pyramid-structured module. This extension increases the effectiveness of the
non-local operation by attending to the interaction between different regions.
Empirical results demonstrate the effectiveness and efficiency of our PNL module,
which achieves state-of-the-art performance of 83.09% on the Mini-Kinetics dataset,
with decreased computation cost compared to the non-local block.
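The abstract describes the idea only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' PNL implementation. It shows a dot-product non-local response in which each position attends to region-averaged features at several scales. The function names, the 1D average pooling, the choice of region sizes, the averaging of per-scale responses, and the residual connection are all illustrative assumptions that do not come from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys_values):
    """Non-local response: each query becomes a softmax-weighted
    average of all key/value vectors (dot-product affinity)."""
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys_values])
        out.append([sum(w * kv[d] for w, kv in zip(weights, keys_values))
                    for d in range(len(q))])
    return out

def avg_pool(features, region):
    """Average consecutive groups of `region` positions into one vector."""
    pooled = []
    for start in range(0, len(features), region):
        chunk = features[start:start + region]
        dim = len(chunk[0])
        pooled.append([sum(v[d] for v in chunk) / len(chunk)
                       for d in range(dim)])
    return pooled

def pyramid_non_local(features, regions=(2, 4)):
    """Hypothetical multi-scale variant (an assumption, not the paper's
    design): attend to region-pooled features at each scale, average the
    per-scale responses, and add them to the input as a residual."""
    dim = len(features[0])
    responses = [attend(features, avg_pool(features, r)) for r in regions]
    return [[features[i][d]
             + sum(resp[i][d] for resp in responses) / len(responses)
             for d in range(dim)]
            for i in range(len(features))]

# Toy input: 8 positions (flattened space-time), 2 channels each.
x = [[float(i), 1.0] for i in range(8)]
y = pyramid_non_local(x)
print(len(y), len(y[0]))  # 8 2
```

In this sketch a scale with region size r compares each of the N positions against N/r pooled vectors instead of all N positions, which is one plausible way pooling could lower the pairwise cost; the actual pyramid structure and cost savings of PNL are specified in the paper itself.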
Related papers
- Temporal Action Localization with Enhanced Instant Discriminability [66.76095239972094]
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
We propose a one-stage framework named TriDet to resolve imprecise predictions of action boundaries by existing methods.
Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets.
arXiv Detail & Related papers (2023-09-11T16:17:50Z)
- Region-Enhanced Feature Learning for Scene Semantic Segmentation [19.20735517821943]
We propose using regions as the intermediate representation of point clouds instead of fine-grained points or voxels to reduce the computational burden.
We design a region-based feature enhancement (RFE) module, which consists of a Semantic-Spatial Region Extraction stage and a Region Dependency Modeling stage.
Our REFL-Net achieves 1.8% mIoU gain on ScanNetV2 and 1.7% mIoU gain on S3DIS datasets with negligible computational cost.
arXiv Detail & Related papers (2023-04-15T06:35:06Z)
- FedSpeed: Larger Local Interval, Less Communication Round, and Higher
Generalization Accuracy [84.45004766136663]
Federated learning is an emerging distributed machine learning framework.
It suffers from non-vanishing biases introduced by locally inconsistent optima and from rugged client drift caused by local over-fitting.
We propose a novel and practical method, FedSpeed, to alleviate the negative impacts posed by these problems.
arXiv Detail & Related papers (2023-02-21T03:55:29Z)
- Efficient Non-Local Contrastive Attention for Image Super-Resolution [48.093500219958834]
Non-Local Attention (NLA) brings significant improvement for Single Image Super-Resolution (SISR) by leveraging intrinsic feature correlation in natural images.
We propose a novel Efficient Non-Local Contrastive Attention (ENLCA) to perform long-range visual modeling and leverage more relevant non-local features.
arXiv Detail & Related papers (2022-01-11T05:59:09Z)
- Denoised Non-Local Neural Network for Semantic Segmentation [18.84185406522064]
We propose a Denoised Non-Local Network (Denoised NL) to eliminate the inter-class and intra-class noises respectively.
Our proposed Denoised NL achieves state-of-the-art performance of 83.5% and 46.69% mIoU on Cityscapes and ADE20K, respectively.
arXiv Detail & Related papers (2021-10-27T06:16:31Z)
- Poly-NL: Linear Complexity Non-local Layers with Polynomials [76.21832434001759]
We formulate novel fast non-local blocks, capable of reducing complexity from quadratic to linear with no loss in performance.
The proposed method, which we dub "Poly-NL", is competitive with state-of-the-art performance across image recognition, instance segmentation, and face detection tasks.
arXiv Detail & Related papers (2021-07-06T19:51:37Z)
- Feature Completion for Occluded Person Re-Identification [138.5671859358049]
The RFC block can recover the semantics of occluded regions in feature space.
SRFC exploits the long-range spatial contexts from non-occluded regions to predict the features of occluded regions.
TRFC module captures the long-term temporal contexts to refine the prediction of SRFC.
arXiv Detail & Related papers (2021-06-24T02:40:40Z)
- Speaker Representation Learning using Global Context Guided Channel and
Time-Frequency Transformations [67.18006078950337]
We use the global context information to enhance important channels and recalibrate salient time-frequency locations.
The proposed modules, together with a popular ResNet based model, are evaluated on the VoxCeleb1 dataset.
arXiv Detail & Related papers (2020-09-02T01:07:29Z)
- Region-based Non-local Operation for Video Classification [11.746833714322154]
This paper presents region-based non-local (RNL) operations as a family of self-attention mechanisms.
By combining a channel attention module with the proposed RNL, we design an attention chain, which can be integrated into the off-the-shelf CNNs for end-to-end training.
In experiments, our method outperforms other attention mechanisms and achieves state-of-the-art performance on the Something-Something V1 dataset.
arXiv Detail & Related papers (2020-07-17T14:57:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.