Action Quality Assessment via Hierarchical Pose-guided Multi-stage Contrastive Regression
- URL: http://arxiv.org/abs/2501.03674v1
- Date: Tue, 07 Jan 2025 10:20:16 GMT
- Title: Action Quality Assessment via Hierarchical Pose-guided Multi-stage Contrastive Regression
- Authors: Mengshi Qi, Hao Ye, Jiaxuan Peng, Huadong Ma,
- Abstract summary: Action Assessment (AQA) aims at automatic and fair evaluation of athletic performance.
Current methods focus on segmenting video into fixed frames, which disrupts the temporal continuity of sub-actions.
We propose a novel action quality assessment method through hierarchically pose-guided multi-stage contrastive regression.
- Score: 25.657978409890973
- License:
- Abstract: Action Quality Assessment (AQA), which aims at automatic and fair evaluation of athletic performance, has gained increasing attention in recent years. However, athletes are often in rapid movement and the corresponding visual appearance variances are subtle, making it challenging to capture fine-grained pose differences and leading to poor estimation performance. Furthermore, most common AQA tasks, such as diving in sports, are usually divided into multiple sub-actions, each of which contains different durations. However, existing methods focus on segmenting the video into fixed frames, which disrupts the temporal continuity of sub-actions resulting in unavoidable prediction errors. To address these challenges, we propose a novel action quality assessment method through hierarchically pose-guided multi-stage contrastive regression. Firstly, we introduce a multi-scale dynamic visual-skeleton encoder to capture fine-grained spatio-temporal visual and skeletal features. Then, a procedure segmentation network is introduced to separate different sub-actions and obtain segmented features. Afterwards, the segmented visual and skeletal features are both fed into a multi-modal fusion module as physics structural priors, to guide the model in learning refined activity similarities and variances. Finally, a multi-stage contrastive learning regression approach is employed to learn discriminative representations and output prediction results. In addition, we introduce a newly-annotated FineDiving-Pose Dataset to improve the current low-quality human pose labels. In experiments, the results on FineDiving and MTL-AQA datasets demonstrate the effectiveness and superiority of our proposed approach. Our source code and dataset are available at https://github.com/Lumos0507/HP-MCoRe.
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z) - Multi-Stage Contrastive Regression for Action Quality Assessment [31.763380011104015]
We propose a novel Multi-stage Contrastive Regression (MCoRe) framework for the action quality assessment (AQA) task.
Inspired by the graph contrastive learning, we propose a new stage-wise contrastive learning loss function to enhance performance.
MCoRe demonstrates the state-of-the-art result so far on the widely-adopted fine-grained AQA dataset.
arXiv Detail & Related papers (2024-01-05T14:48:19Z) - Locality-aware Attention Network with Discriminative Dynamics Learning
for Weakly Supervised Anomaly Detection [0.8883733362171035]
We propose a Discriminative Dynamics Learning (DDL) method with two objective functions, i.e., dynamics ranking loss and dynamics alignment loss.
A Locality-aware Attention Network (LA-Net) is constructed to capture global correlations and re-calibrate the location preference across snippets, followed by a multilayer perceptron with causal convolution to obtain anomaly scores.
arXiv Detail & Related papers (2022-08-11T04:27:33Z) - FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality
Assessment [93.09267863425492]
We argue that understanding both high-level semantics and internal temporal structures of actions in competitive sports videos is the key to making predictions accurate and interpretable.
We construct a new fine-grained dataset, called FineDiving, developed on diverse diving events with detailed annotations on action procedures.
arXiv Detail & Related papers (2022-04-07T17:59:32Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce representation that emphasizes the novel information in the frame of the current time-stamp.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z) - Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized-temporal kernels in 3 convolutional neural networks (CNNDs) can be improved to better deal with temporal variations in the input.
We study how we can better handle between classes of actions, by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates a attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.