GMFL-Net: A Global Multi-geometric Feature Learning Network for Repetitive Action Counting
- URL: http://arxiv.org/abs/2409.00330v1
- Date: Sat, 31 Aug 2024 02:18:26 GMT
- Title: GMFL-Net: A Global Multi-geometric Feature Learning Network for Repetitive Action Counting
- Authors: Jun Li, Jinying Wu, Qiming Li, Feifei Guo
- Abstract summary: We propose a simple but efficient Global Multi-geometric Feature Learning Network (GMFL-Net).
Specifically, we design a MIA-Module that aims to improve information representation by fusing multi-geometric features.
We also design a GBFL-Module that enhances the inter-dependencies between point-wise and channel-wise elements.
- Score: 4.117416395116726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the continuous development of deep learning, repetitive action counting has been attracting growing attention from researchers. Extracting pose keypoints with human pose estimation networks has proven to be an effective pose-level method. However, existing pose-level methods have two shortcomings: the single coordinate is not stable enough to handle action distortions caused by changes in camera viewpoint, so salient poses are not accurately identified, and the methods are vulnerable to misdetection during the transition from an exception to the actual action. To overcome these problems, we propose a simple but efficient Global Multi-geometric Feature Learning Network (GMFL-Net). Specifically, we design a MIA-Module that aims to improve information representation by fusing multi-geometric features and learning the semantic similarity among the input multi-geometric features. Then, to improve the feature representation from a global perspective, we also design a GBFL-Module that enhances the inter-dependencies between point-wise and channel-wise elements and combines them with the rich local information generated by the MIA-Module to synthesise a comprehensive and highly representative global feature representation. In addition, considering the limitations of existing datasets, we collect a new dataset called Countix-Fitness-pose (https://github.com/Wantong66/Countix-Fitness), which contains different cycle lengths and exceptions and a test set with longer durations, and we annotate it with fine-grained pose-level annotations. We also add two new action classes, namely lunge and rope push-down. Finally, extensive experiments on the challenging RepCount-pose, UCFRep-pose, and Countix-Fitness-pose benchmarks show that our proposed GMFL-Net achieves state-of-the-art performance.
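The abstract's point that a single coordinate is unstable under viewpoint changes can be illustrated with a small sketch. The abstract does not say which geometric features GMFL-Net fuses, so the joint angle used here is only an assumed example of a geometric feature, and the keypoint values and helper names (`joint_angle`, `transform`) are hypothetical. The sketch also models only an in-plane rotation, uniform scaling, and shift, which is a simplification of a real camera viewpoint change.

```python
import math

def joint_angle(a, b, c):
    """Return the angle at point b (in radians) formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.atan2(cross, dot))

def transform(p, theta, scale, shift):
    """Rotate a 2D point by theta, scale it uniformly, then translate it."""
    x, y = p
    xr = scale * (x * math.cos(theta) - y * math.sin(theta)) + shift[0]
    yr = scale * (x * math.sin(theta) + y * math.cos(theta)) + shift[1]
    return (xr, yr)

# Hypothetical 2D keypoints for one arm (not taken from any dataset).
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
angle_before = joint_angle(shoulder, elbow, wrist)

# Simulate a simplified viewpoint change: rotation, zoom, and shift in the image plane.
moved = [transform(p, 0.7, 1.5, (3.0, -2.0)) for p in (shoulder, elbow, wrist)]
angle_after = joint_angle(*moved)

# The raw coordinates change, while the elbow angle stays the same.
print("elbow coordinate:", elbow, "->", moved[1])
print("elbow angle:", angle_before, "->", angle_after)
```

Under these assumptions the raw elbow coordinate moves while the elbow angle is unchanged up to floating-point error, which is one reason a network might combine coordinates with other geometric quantities.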
Related papers
- RFL-CDNet: Towards Accurate Change Detection via Richer Feature Learning [39.3740222598949]
RFL-CDNet is a novel framework that utilizes richer feature learning to boost change detection performance.
C2FG module aims to seamlessly integrate the side prediction from the previous coarse-scale into the current fine-scale prediction.
LF module assumes that the contribution of each stage and each spatial location is independent, thus designing a learnable module to fuse multiple predictions.
arXiv Detail & Related papers (2024-04-27T03:07:07Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged as a task aimed at high-precision object segmentation in high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
We model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- Global Relation Modeling and Refinement for Bottom-Up Human Pose Estimation [4.24515544235173]
We propose a convolutional neural network for bottom-up human pose estimation.
Our model has the ability to focus on different granularity from local to global regions.
Our results on the COCO and CrowdPose datasets demonstrate that it is an efficient framework for multi-person pose estimation.
arXiv Detail & Related papers (2023-03-27T02:54:08Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- BIMS-PU: Bi-Directional and Multi-Scale Point Cloud Upsampling [60.257912103351394]
We develop a new point cloud upsampling pipeline called BIMS-PU.
We decompose the up/downsampling procedure into several up/downsampling sub-steps by breaking the target sampling factor into smaller factors.
We show that our method achieves superior results to state-of-the-art approaches.
arXiv Detail & Related papers (2022-06-25T13:13:37Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- DFNet: Discriminative feature extraction and integration network for salient object detection [6.959742268104327]
We focus on two aspects of challenges in saliency detection using Convolutional Neural Networks.
Firstly, since salient objects appear in various sizes, using single-scale convolution would not capture the right size.
Secondly, using multi-level features helps the model use both local and global context.
arXiv Detail & Related papers (2020-04-03T13:56:41Z)
- Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information.
Our approach can ignore various noises selectively and focus on appropriate crowd scales automatically.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)
- Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.