Multi-Stage Contrastive Regression for Action Quality Assessment
- URL: http://arxiv.org/abs/2401.02841v1
- Date: Fri, 5 Jan 2024 14:48:19 GMT
- Title: Multi-Stage Contrastive Regression for Action Quality Assessment
- Authors: Qi An, Mengshi Qi, Huadong Ma
- Abstract summary: We propose a novel Multi-stage Contrastive Regression (MCoRe) framework for the action quality assessment (AQA) task.
Inspired by graph contrastive learning, we propose a new stage-wise contrastive learning loss function to enhance performance.
MCoRe achieves state-of-the-art results on the widely adopted fine-grained AQA dataset.
- Score: 31.763380011104015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been growing interest in video-based action
quality assessment (AQA). Most existing methods solve the AQA problem by
considering the entire video while overlooking the inherent stage-level
characteristics of actions. To address this issue, we design a novel
Multi-stage Contrastive Regression (MCoRe) framework for the AQA task. This
approach allows us to efficiently extract spatial-temporal information while
reducing computational costs by segmenting the input video into
multiple stages or procedures. Inspired by graph contrastive learning, we
propose a new stage-wise contrastive learning loss function to enhance
performance. As a result, MCoRe achieves state-of-the-art results
on the widely adopted fine-grained AQA dataset.
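The abstract does not spell out the stage-wise loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a generic InfoNCE-style objective in which stage i of a query video and stage i of an exemplar video form a positive pair and all other exemplar stages act as negatives. The function names, the mean-pooling step, the fixed stage boundaries, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def split_into_stages(frames, boundaries):
    """Split a list of frame features at the given boundary indices (assumed known)."""
    cuts = [0, *boundaries, len(frames)]
    return [frames[a:b] for a, b in zip(cuts, cuts[1:])]


def mean_pool(stage):
    """Average the frame features of one stage into a single vector."""
    dim = len(stage[0])
    return [sum(frame[d] for frame in stage) / len(stage) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def stage_contrastive_loss(query_stages, exemplar_stages, temperature=0.1):
    """InfoNCE-style loss: same-index stages are positives, the rest negatives.

    This is an illustrative stand-in, not the loss defined in the paper.
    """
    total = 0.0
    for i, q in enumerate(query_stages):
        logits = [cosine(q, e) / temperature for e in exemplar_stages]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(query_stages)


# Toy usage: six 2-D frame features, split into three stages of two frames each.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [-1.0, 0.0], [-1.0, 0.0]]
stage_features = [mean_pool(s) for s in split_into_stages(frames, [2, 4])]
print(stage_contrastive_loss(stage_features, stage_features))
```

With this toy input, pairing each stage with its own counterpart gives a loss close to zero, while reversing the exemplar stage order increases it, which is the behaviour a stage-aligned contrastive objective is meant to encourage.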
Related papers
- ReasVQA: Advancing VideoQA with Imperfect Reasoning Process [38.4638171723351]
ReasVQA (Reasoning-enhanced Video Question Answering) is a novel approach that leverages reasoning processes generated by Multimodal Large Language Models (MLLMs) to improve the performance of VideoQA models.
We evaluate ReasVQA on three popular benchmarks, and our results establish new state-of-the-art performance with significant improvements of +2.9 on NExT-QA, +7.3 on STAR, and +5.9 on IntentQA.
arXiv Detail & Related papers (2025-01-23T10:35:22Z)
- Action Quality Assessment via Hierarchical Pose-guided Multi-stage Contrastive Regression [25.657978409890973]
Action Quality Assessment (AQA) aims at automatic and fair evaluation of athletic performance.
Current methods focus on segmenting video into fixed frames, which disrupts the temporal continuity of sub-actions.
We propose a novel action quality assessment method through hierarchically pose-guided multi-stage contrastive regression.
arXiv Detail & Related papers (2025-01-07T10:20:16Z)
- Interpretable Long-term Action Quality Assessment [12.343701556374556]
Long-term Action Quality Assessment (AQA) evaluates the execution of activities in videos.
Current AQA methods produce a single score by averaging clip features.
Long-term videos pose additional difficulty due to the complexity and diversity of actions.
arXiv Detail & Related papers (2024-08-21T15:09:09Z)
- KaPQA: Knowledge-Augmented Product Question-Answering [59.096607961704656]
We introduce two product question-answering (QA) datasets focused on Adobe Acrobat and Photoshop products.
We also propose a novel knowledge-driven RAG-QA framework to enhance the performance of the models in the product QA task.
arXiv Detail & Related papers (2024-07-22T22:14:56Z)
- GAIA: Rethinking Action Quality Assessment for AI-Generated Videos [56.047773400426486]
Action quality assessment (AQA) algorithms predominantly focus on actions from real specific scenarios and are pre-trained with normative action features.
We construct GAIA, a Generic AI-generated Action dataset, by conducting a large-scale subjective evaluation from a novel causal reasoning-based perspective.
Results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods perform poorly with an average SRCC of 0.454, 0.191, and 0.519, respectively.
arXiv Detail & Related papers (2024-06-10T08:18:07Z)
- Continual Action Assessment via Task-Consistent Score-Discriminative Feature Distribution Modeling [31.696222064667243]
Action Quality Assessment (AQA) is a task that tries to answer how well an action is carried out.
Existing works on AQA assume that all the training data are visible for training at one time, but do not enable continual learning.
We propose a unified model to learn AQA tasks sequentially without forgetting.
arXiv Detail & Related papers (2023-09-29T10:06:28Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods.
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
- Action Quality Assessment with Temporal Parsing Transformer [84.1272079121699]
Action Quality Assessment (AQA) is important for action understanding, and resolving the task poses unique challenges due to subtle visual differences.
We propose a temporal parsing transformer to decompose the holistic feature into temporal part-level representations.
Our proposed method outperforms prior work on three public AQA benchmarks by a considerable margin.
arXiv Detail & Related papers (2022-07-19T13:29:05Z)
- Group-aware Contrastive Regression for Action Quality Assessment [85.43203180953076]
We show that the relations among videos can provide important clues for more accurate action quality assessment.
Our approach outperforms previous methods by a large margin and establishes new state-of-the-art on all three benchmarks.
arXiv Detail & Related papers (2021-08-17T17:59:39Z)
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.