Skeleton-Split Framework using Spatial Temporal Graph Convolutional
Networks for Action Recogntion
- URL: http://arxiv.org/abs/2111.03106v1
- Date: Thu, 4 Nov 2021 18:59:02 GMT
- Title: Skeleton-Split Framework using Spatial Temporal Graph Convolutional
Networks for Action Recogntion
- Authors: Motasem Alsawadi and Miguel Rio
- Abstract summary: This work aims to recognize activities of daily living using the ST-GCN model.
We have achieved 48.88 % top-1 accuracy by using the connection split partitioning approach.
A top-1 accuracy of 73.25 % is achieved by using the index split partitioning strategy.
- Score: 2.132096006921048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been a dramatic increase in the volume of videos and their related
content uploaded to the internet. Accordingly, the need for efficient
algorithms to analyse this vast amount of data has attracted significant
research interest. An action recognition system based upon human body motions
has been proven to interpret video content accurately. This work aims to
recognize activities of daily living using the ST-GCN model, providing a
comparison between four different partitioning strategies: spatial
configuration partitioning, full distance split, connection split, and index
split. To achieve this aim, we present the first implementation of the ST-GCN
framework upon the HMDB-51 dataset. We have achieved 48.88 % top-1 accuracy by
using the connection split partitioning approach. Through experimental
simulation, we show that our proposals achieve higher accuracy on the UCF-101
dataset using the ST-GCN framework than the state-of-the-art approach.
Finally, a top-1 accuracy of 73.25 % is achieved by
using the index split partitioning strategy.
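The results above are reported as top-1 accuracy: the percentage of test clips whose highest-scoring predicted class matches the ground-truth label. A minimal sketch of that metric follows. The function name `top1_accuracy` and the toy scores and labels are hypothetical illustrations. They are not taken from the paper or from the ST-GCN code base.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Hypothetical toy data: 4 clips, 3 action classes (not results from the paper).
toy_scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.5, 0.3, 0.2],  # predicted class 0
    [0.2, 0.2, 0.6],  # predicted class 2
    [0.6, 0.3, 0.1],  # predicted class 0
]
toy_labels = [1, 0, 1, 0]

print(top1_accuracy(toy_scores, toy_labels))  # 3 of 4 correct -> 75.0
```

Under this definition, a reported 48.88 % top-1 accuracy means that just under half of the evaluated clips were assigned the correct action class as the single most likely prediction.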
Related papers
- Lane Segmentation Refinement with Diffusion Models [4.292002248705256]
The lane graph is a key component for building high-definition (HD) maps and crucial for downstream tasks such as autonomous driving or navigation planning.
Previously, He et al. (2022) explored the extraction of the lane-level graph from aerial imagery utilizing a segmentation based approach.
We explore additional enhancements to refine this segmentation-based approach and extend it with a diffusion probabilistic model (DPM) component.
This combination further improves the GEO F1 and TOPO F1 scores, which are crucial indicators of the quality of a lane graph, in the undirected graph in non-intersection areas.
arXiv Detail & Related papers (2024-05-01T16:40:15Z)
- Early Fusion of Features for Semantic Segmentation [10.362589129094975]
This paper introduces a novel segmentation framework that integrates a classifier network with a reverse HRNet architecture for efficient image segmentation.
Our methodology is rigorously tested across several benchmark datasets including Mapillary Vistas, Cityscapes, CamVid, COCO, and PASCAL-VOC2012.
The results demonstrate the effectiveness of our proposed model in achieving high segmentation accuracy, indicating its potential for various applications in image analysis.
arXiv Detail & Related papers (2024-02-08T22:58:06Z)
- PSUMNet: Unified Modality Part Streams are All You Need for Efficient
Pose-based Action Recognition [10.340665633567081]
We introduce PSUMNet, a novel approach for scalable and efficient pose-based action recognition.
At the representation level, we propose a global frame based part stream approach as opposed to conventional modality based streams.
PSUMNet is highly efficient and outperforms competing methods which use 100%-400% more parameters.
arXiv Detail & Related papers (2022-08-11T12:12:07Z)
- Real-Time Scene Text Detection with Differentiable Binarization and
Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised
Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GC can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
- HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z)
- Skeleton Split Strategies for Spatial Temporal Graph Convolution
Networks [2.132096006921048]
A skeleton representation of the human body has been proven to be effective for this task.
A new set of methods to perform the convolution operation upon the skeleton graph is presented.
arXiv Detail & Related papers (2021-08-03T05:57:52Z)
- Guided Interactive Video Object Segmentation Using Reliability-Based
Attention Maps [55.94785248905853]
We propose a novel guided interactive segmentation (GIS) algorithm for video objects to improve the segmentation accuracy and reduce the interaction time.
We develop the intersection-aware propagation module to propagate segmentation results to neighboring frames.
Experimental results demonstrate that the proposed algorithm provides more accurate segmentation results at a faster speed than conventional algorithms.
arXiv Detail & Related papers (2021-04-21T07:08:57Z)
- Temporal Attention-Augmented Graph Convolutional Network for Efficient
Skeleton-Based Human Action Recognition [97.14064057840089]
Graph convolutional networks (GCNs) have been very successful in modeling non-Euclidean data structures.
Most GCN-based action recognition methods use deep feed-forward networks with high computational complexity to process all skeletons in an action.
We propose a temporal attention module (TAM) for increasing the efficiency in skeleton-based action recognition.
arXiv Detail & Related papers (2020-10-23T08:01:55Z)
- Finding Action Tubes with a Sparse-to-Dense Framework [62.60742627484788]
We propose a framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner.
We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets.
arXiv Detail & Related papers (2020-08-30T15:38:44Z)