Multi-Stage Contrastive Regression for Action Quality Assessment
- URL: http://arxiv.org/abs/2401.02841v1
- Date: Fri, 5 Jan 2024 14:48:19 GMT
- Title: Multi-Stage Contrastive Regression for Action Quality Assessment
- Authors: Qi An, Mengshi Qi, Huadong Ma
- Abstract summary: We propose a novel Multi-stage Contrastive Regression (MCoRe) framework for the action quality assessment (AQA) task.
Inspired by graph contrastive learning, we propose a new stage-wise contrastive learning loss function to enhance performance.
MCoRe achieves state-of-the-art results on the widely adopted fine-grained AQA dataset.
- Score: 31.763380011104015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been growing interest in video-based action
quality assessment (AQA). Most existing methods solve the AQA problem by
considering the entire video, overlooking the inherent stage-level
characteristics of actions. To address this issue, we design a novel
Multi-stage Contrastive Regression (MCoRe) framework for the AQA task. This
approach efficiently extracts spatial-temporal information while
simultaneously reducing computational costs by segmenting the input video into
multiple stages or procedures. Inspired by graph contrastive learning, we
propose a new stage-wise contrastive learning loss function to enhance
performance. As a result, MCoRe achieves state-of-the-art results on the
widely adopted fine-grained AQA dataset.
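The stage-wise contrastive idea described above can be sketched as follows. This is an illustrative InfoNCE-style loss in NumPy, not the authors' exact formulation: the function name, feature shapes, and the choice of same-stage features across two videos as positives are all assumptions made for illustration.

```python
import numpy as np

def stage_contrastive_loss(feats_a, feats_b, temperature=0.1):
    """Illustrative stage-wise contrastive loss (InfoNCE-style sketch).

    feats_a, feats_b: (num_stages, dim) per-stage features from two
    videos of the same action. Features of the SAME stage across the
    two videos are treated as positives; all cross-stage pairs are
    negatives.
    """
    # L2-normalize so dot products are cosine similarities
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (S, S) stage-similarity matrix
    # Cross-entropy with the diagonal (matching stages) as targets
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Under this sketch, well-aligned stage features across the two videos drive the loss toward zero, while misaligned ones are penalized.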
Related papers
- Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting [15.161997580529075]
This paper explores the novel challenge of VideoQA within a continual learning framework.
We propose Collaborative Prompting (ColPro), which integrates specific question constraint prompting, knowledge acquisition prompting, and visual temporal awareness prompting.
Experimental results on the NExT-QA and DramaQA datasets show that ColPro achieves superior performance compared to existing approaches.
arXiv Detail & Related papers (2024-10-01T15:07:07Z)
- Interpretable Long-term Action Quality Assessment [12.343701556374556]
Long-term Action Quality Assessment (AQA) evaluates the execution of activities in videos.
Current AQA methods produce a single score by averaging clip features.
Long-term videos pose additional difficulty due to the complexity and diversity of actions.
arXiv Detail & Related papers (2024-08-21T15:09:09Z)
- KaPQA: Knowledge-Augmented Product Question-Answering [59.096607961704656]
We introduce two product question-answering (QA) datasets focused on Adobe Acrobat and Photoshop products.
We also propose a novel knowledge-driven RAG-QA framework to enhance the performance of the models in the product QA task.
arXiv Detail & Related papers (2024-07-22T22:14:56Z)
- GAIA: Rethinking Action Quality Assessment for AI-Generated Videos [56.047773400426486]
Action quality assessment (AQA) algorithms predominantly focus on actions in specific real-world scenarios and are pre-trained with normative action features.
We construct GAIA, a Generic AI-generated Action dataset, by conducting a large-scale subjective evaluation from a novel causal reasoning-based perspective.
Results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods perform poorly with an average SRCC of 0.454, 0.191, and 0.519, respectively.
arXiv Detail & Related papers (2024-06-10T08:18:07Z)
- Continual Action Assessment via Task-Consistent Score-Discriminative Feature Distribution Modeling [31.696222064667243]
Action Quality Assessment (AQA) asks how well an action is carried out.
Existing works on AQA assume that all training data are available at once and do not support continual learning.
We propose a unified model to learn AQA tasks sequentially without forgetting.
arXiv Detail & Related papers (2023-09-29T10:06:28Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods.
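The key property of the decomposition described above can be sketched as follows: when the joint Q-value is the mean of per-dimension Q-values, the joint argmax factorizes into an independent argmax per action dimension, so action selection stays cheap even in high-dimensional bang-bang spaces. The function name and array shapes below are illustrative assumptions, not the paper's API.

```python
import numpy as np

def greedy_bang_bang_action(per_dim_q):
    """Greedy action from decoupled per-dimension Q-values (sketch).

    per_dim_q: (action_dims, 2) array of Q-values for the bang-bang
    bins {-1, +1} in each action dimension. With value decomposition
    Q(s, a) = mean_d Q_d(s, a_d), the joint argmax reduces to an
    independent argmax per dimension.
    """
    bins = np.array([-1.0, 1.0])
    idx = np.argmax(per_dim_q, axis=1)  # best bin per dimension
    action = bins[idx]
    joint_q = per_dim_q[np.arange(len(idx)), idx].mean()
    return action, joint_q
```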
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
- Action Quality Assessment with Temporal Parsing Transformer [84.1272079121699]
Action Quality Assessment (AQA) is important for action understanding, and resolving the task poses unique challenges due to subtle visual differences.
We propose a temporal parsing transformer to decompose the holistic feature into temporal part-level representations.
Our proposed method outperforms prior work on three public AQA benchmarks by a considerable margin.
arXiv Detail & Related papers (2022-07-19T13:29:05Z)
- Auto-Encoding Score Distribution Regression for Action Quality Assessment [41.45638722765149]
Action quality assessment (AQA) from videos is a challenging vision task.
Traditionally, the AQA task is treated as a regression problem that learns the underlying mappings between videos and action scores.
We develop a Distribution Auto-Encoder (DAE) to address the above problems.
arXiv Detail & Related papers (2021-11-22T07:30:04Z)
- Group-aware Contrastive Regression for Action Quality Assessment [85.43203180953076]
We show that the relations among videos can provide important clues for more accurate action quality assessment.
Our approach outperforms previous methods by a large margin and establishes new state-of-the-art on all three benchmarks.
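The contrastive-regression idea this entry relies on can be sketched in a few lines: rather than regressing an absolute quality score, the model predicts the score difference between the query video and exemplar videos whose ground-truth scores are known, and the final score averages the per-exemplar estimates. The function name and inputs below are hypothetical, chosen only to illustrate the scheme.

```python
import numpy as np

def contrastive_regression_score(predicted_deltas, exemplar_scores):
    """Recover a query video's score from relative differences (sketch).

    predicted_deltas: per-exemplar predicted score differences
    (query minus exemplar), shape (num_exemplars,).
    exemplar_scores: ground-truth scores of the exemplars, same shape.
    """
    estimates = exemplar_scores + predicted_deltas  # per-exemplar guess
    return float(np.mean(estimates))
```

Relating videos to each other in this way is what lets "the relations among videos provide important clues" for more accurate assessment.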
arXiv Detail & Related papers (2021-08-17T17:59:39Z)
- Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.