A Baseline Framework for Part-level Action Parsing and Action
Recognition
- URL: http://arxiv.org/abs/2110.03368v1
- Date: Thu, 7 Oct 2021 12:04:59 GMT
- Title: A Baseline Framework for Part-level Action Parsing and Action
Recognition
- Authors: Xiaodong Chen, Xinchen Liu, Kun Liu, Wu Liu, Tao Mei
- Abstract summary: This report introduces our 2nd place solution to Kinetics-TPS Track on Part-level Action Parsing in ICCV DeeperAction Workshop 2021.
Our entry is mainly based on YOLOF for instance and part detection, HRNet for human pose estimation, and CSN for video-level action recognition and frame-level part state parsing.
In the competition, we achieved 61.37% mAP on the test set of Kinetics-TPS.
- Score: 67.38737952295504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical report introduces our 2nd place solution to Kinetics-TPS Track
on Part-level Action Parsing in ICCV DeeperAction Workshop 2021. Our entry is
mainly based on YOLOF for instance and part detection, HRNet for human pose
estimation, and CSN for video-level action recognition and frame-level part
state parsing. We describe technical details for the Kinetics-TPS dataset,
together with some experimental results. In the competition, we achieved 61.37%
mAP on the test set of Kinetics-TPS.
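As a rough illustration of how the three off-the-shelf components are wired together, the sketch below lays out the reported detect-then-pose-then-classify flow in plain Python. Every wrapper function, signature, and return value is a hypothetical stand-in for the actual YOLOF, HRNet, and CSN models, not the authors' released code.

```python
"""Hypothetical sketch of the three-stage Kinetics-TPS baseline described above.

The wrapper functions stand in for off-the-shelf models (YOLOF, HRNet, CSN);
their names, signatures, and dummy outputs are assumptions for illustration.
"""
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PartBox:
    label: str      # e.g. "head", "left_arm"
    box: tuple      # (x1, y1, x2, y2) in frame coordinates
    score: float


def detect_instances_and_parts(frame) -> List[PartBox]:
    """Stand-in for a YOLOF detector producing person and body-part boxes."""
    return [PartBox("head", (0, 0, 10, 10), 0.9)]        # dummy output


def estimate_pose(frame, person_box) -> List[tuple]:
    """Stand-in for HRNet keypoint estimation, used to refine/assign part boxes."""
    return [(5.0, 5.0)] * 17                             # dummy COCO-style keypoints


def classify_with_csn(clip) -> Dict[str, float]:
    """Stand-in for a CSN video model scoring video-level actions and,
    per part track, frame-level part states."""
    return {"dribbling": 0.8, "shooting": 0.2}           # dummy class scores


def parse_video(frames: List) -> Dict:
    """Coarse orchestration: detect parts per frame, refine with pose,
    then score the whole clip and the part tracks with the video model."""
    per_frame = []
    for frame in frames:
        parts = detect_instances_and_parts(frame)
        keypoints = estimate_pose(frame, person_box=None)  # box association omitted
        per_frame.append({"parts": parts, "keypoints": keypoints})

    return {
        "video_action": classify_with_csn(frames),   # video-level action recognition
        "part_states": classify_with_csn(frames),    # frame-level part state parsing
        "detections": per_frame,
    }


if __name__ == "__main__":
    print(parse_video(frames=[object()] * 8)["video_action"])
```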
Related papers
- 3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation [63.199793919573295]
Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames.
Our work draws inspiration from the Cutie model, and we investigate the effects of object memory, the total number of memory frames, and input resolution on segmentation performance.
arXiv Detail & Related papers (2024-06-06T00:56:25Z) - USTHB at NADI 2023 shared task: Exploring Preprocessing and Feature
Engineering Strategies for Arabic Dialect Identification [0.0]
We investigate the effects of surface preprocessing, morphological preprocessing, FastText vector model, and the weighted concatenation of TF-IDF features.
During the evaluation phase, our system demonstrates noteworthy results, achieving an F1 score of 62.51%.
arXiv Detail & Related papers (2023-12-16T20:23:53Z) - Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and
Motion Estimation [49.56131393810713]
We present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner.
Our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs.
arXiv Detail & Related papers (2023-06-08T22:55:32Z) - Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework [108.70949305791201]
Part-level Action Parsing (PAP) aims to not only predict the video-level action but also recognize the frame-level fine-grained actions or interactions of body parts for each person in the video.
In particular, our framework first predicts the video-level class of the input video, then localizes the body parts and predicts the part-level action.
Our framework achieves state-of-the-art performance, outperforming existing methods by 31.10% in ROC score.
arXiv Detail & Related papers (2022-03-09T01:30:57Z) - Technical Report: Disentangled Action Parsing Networks for Accurate
Part-level Action Parsing [65.87931036949458]
Part-level Action Parsing aims at part state parsing for boosting action recognition in videos.
We present a simple yet effective approach, named disentangled action parsing (DAP).
arXiv Detail & Related papers (2021-11-05T02:29:32Z) - Skeleton-Split Framework using Spatial Temporal Graph Convolutional
Networks for Action Recognition [2.132096006921048]
This work aims to recognize activities of daily living using the ST-GCN model.
We achieved 48.88% top-1 accuracy using the connection split partitioning approach, and 73.25% top-1 accuracy using the index split partitioning strategy (a generic sketch of skeleton-graph partitioning appears after this list).
arXiv Detail & Related papers (2021-11-04T18:59:02Z) - Part-aware Panoptic Segmentation [3.342126234995932]
Part-aware Panoptic Segmentation (PPS) aims to understand a scene at multiple levels of abstraction.
We provide consistent annotations on two commonly used datasets: Cityscapes and Pascal VOC.
We present a single metric to evaluate PPS, called Part-aware Panoptic Quality (PartPQ).
arXiv Detail & Related papers (2021-06-11T12:48:07Z) - Modeling long-term interactions to enhance action recognition [81.09859029964323]
We propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels.
We use a region-based approach that takes as input a primary region roughly corresponding to the user hands and a set of secondary regions potentially corresponding to the interacting objects.
The proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks.
arXiv Detail & Related papers (2021-04-23T10:08:15Z)
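The skeleton-split entry above turns on how the skeleton graph is partitioned before graph convolution. The sketch below shows the generic idea of building one adjacency matrix per partition for an ST-GCN-style layer, using the original ST-GCN spatial-configuration scheme (self / inward / outward relative to a chosen center joint); it is not the connection-split or index-split strategy of that paper, and the toy five-joint skeleton is an assumption for illustration.

```python
"""Generic sketch of partitioned adjacency matrices for an ST-GCN-style layer.

Uses the original ST-GCN spatial-configuration partitioning, NOT the
connection-split or index-split strategies of the paper listed above.
The toy 5-joint skeleton is an assumption for illustration.
"""
import numpy as np

# Toy skeleton: 0=torso (center), 1=head, 2=left arm, 3=right arm, 4=legs
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4)]
NUM_JOINTS = 5
CENTER = 0


def hop_distance(num_joints, edges, center):
    """BFS hop distance from the center joint to every joint."""
    neighbors = {i: [] for i in range(num_joints)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = {center: 0}
    frontier = [center]
    while frontier:
        nxt = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist


def partitioned_adjacency(num_joints, edges, center):
    """Return three adjacency matrices: self-loops, inward edges (toward the
    center joint), and outward edges (away from it). In an ST-GCN layer each
    partition is convolved with its own learnable weights."""
    dist = hop_distance(num_joints, edges, center)
    a_self = np.eye(num_joints)
    a_in = np.zeros((num_joints, num_joints))
    a_out = np.zeros((num_joints, num_joints))
    for a, b in edges:
        # orient each edge by hop distance to the center joint
        near, far = (a, b) if dist[a] <= dist[b] else (b, a)
        a_in[far, near] = 1.0    # message from the farther joint toward the center
        a_out[near, far] = 1.0   # message from the nearer joint outward
    return [a_self, a_in, a_out]


if __name__ == "__main__":
    for name, A in zip(["self", "inward", "outward"],
                       partitioned_adjacency(NUM_JOINTS, EDGES, CENTER)):
        print(name)
        print(A)
```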
This list is automatically generated from the titles and abstracts of the papers on this site.