Interpretable Long-term Action Quality Assessment
- URL: http://arxiv.org/abs/2408.11687v1
- Date: Wed, 21 Aug 2024 15:09:09 GMT
- Title: Interpretable Long-term Action Quality Assessment
- Authors: Xu Dong, Xinran Liu, Wanqing Li, Anthony Adeyemi-Ejeye, Andrew Gilbert,
- Abstract summary: Long-term Action Quality Assessment (AQA) evaluates the execution of activities in videos.
Current AQA methods produce a single score by averaging clip features.
Long-term videos pose additional difficulty due to the complexity and diversity of actions.
- Score: 12.343701556374556
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Long-term Action Quality Assessment (AQA) evaluates the execution of activities in videos. However, the length presents challenges in fine-grained interpretability, with current AQA methods typically producing a single score by averaging clip features, lacking detailed semantic meanings of individual clips. Long-term videos pose additional difficulty due to the complexity and diversity of actions, exacerbating interpretability challenges. While query-based transformer networks offer promising long-term modeling capabilities, their interpretability in AQA remains unsatisfactory due to a phenomenon we term Temporal Skipping, where the model skips self-attention layers to prevent output degradation. To address this, we propose an attention loss function and a query initialization method to enhance performance and interpretability. Additionally, we introduce a weight-score regression module designed to approximate the scoring patterns observed in human judgments and replace conventional single-score regression, improving the rationality of interpretability. Our approach achieves state-of-the-art results on three real-world, long-term AQA benchmarks. Our code is available at: https://github.com/dx199771/Interpretability-AQA
Related papers
- Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning [32.30800226412995]
We introduce Zoom-IQA, a VLM-based IQA model to explicitly emulate key cognitive behaviors.<n>We show that Zoom-IQA achieves improved robustness, explainability, and generalization.<n>The application to downstream tasks, such as image restoration, further demonstrates the effectiveness of Zoom-IQA.
arXiv Detail & Related papers (2026-01-06T11:00:17Z) - CaFlow: Enhancing Long-Term Action Quality Assessment with Causal Counterfactual Flow [25.3923767595433]
Action Quality Assessment (AQA) predicts fine-grained execution scores from action videos.<n>Long-term AQA, as in figure skating or rhythmic gymnastics, is especially challenging since it requires modeling extended temporal dynamics.<n>We propose CaFlow, a unified framework that integrates counterfactual de-confounding with bidirectional time-conditioned flow.
arXiv Detail & Related papers (2025-11-26T18:25:41Z) - Are Large Reasoning Models Interruptible? [77.53059044071107]
Large Reasoning Models (LRMs) excel at complex reasoning but are traditionally evaluated in static, "frozen world" settings.<n>We show that even state-of-the-art LRMs, which achieve high accuracy in static settings, can fail unpredictably when interrupted or exposed to changing context.<n>Our analysis further reveals several novel failure modes, including reasoning leakage, panic, and self-doubt.
arXiv Detail & Related papers (2025-10-13T17:59:35Z) - Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization [53.82400605816587]
Action Quality Assessment (AQA) quantifies human actions in videos, supporting applications in sports scoring, rehabilitation, and skill evaluation.<n>A major challenge lies in the non-stationary nature of quality distributions in real-world scenarios.<n>We introduce Continual AQA (CAQA), which equips with Continual Learning capabilities to handle evolving distributions.
arXiv Detail & Related papers (2025-10-08T10:09:47Z) - VQAThinker: Exploring Generalizable and Explainable Video Quality Assessment via Reinforcement Learning [50.34205095371895]
Video quality assessment aims to objectively quantify perceptual quality degradation.<n>Existing VQA models suffer from two critical limitations.<n>We propose textbfVQAThinker, a reasoning-based VQA framework.
arXiv Detail & Related papers (2025-08-08T06:16:23Z) - Inverse Scaling in Test-Time Compute [51.16323216811257]
Extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance.<n>We identify five distinct failure modes when models reason for longer.<n>These findings suggest that while test-time compute scaling remains promising for improving model capabilities, it may inadvertently reinforce problematic reasoning patterns.
arXiv Detail & Related papers (2025-07-19T00:06:13Z) - Online Convex Optimization and Integral Quadratic Constraints: A new approach to regret analysis [0.0]
We analyze dynamic regret of first-order constrained online convex optimization algorithms for strongly convex and Lipschitz-smooth objectives.
We derive a semi-definite program which, when feasible, provides a regret guarantee for the online algorithm.
arXiv Detail & Related papers (2025-03-30T21:48:11Z) - Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models [69.68265487134686]
Video SimpleQA is the first comprehensive benchmark tailored for factuality evaluation of LVLMs.
Our work distinguishes from existing video benchmarks through the following key features.
Answers are crafted as unambiguous and definitively correct in a short format.
arXiv Detail & Related papers (2025-03-24T17:46:09Z) - Uncertainty Quantification in Retrieval Augmented Question Answering [57.05827081638329]
We propose to quantify the uncertainty of a QA model via estimating the utility of the passages it is provided with.
We train a lightweight neural model to predict passage utility for a target QA model and show that while simple information theoretic metrics can predict answer correctness up to a certain extent, our approach efficiently approximates or outperforms more expensive sampling-based methods.
arXiv Detail & Related papers (2025-02-25T11:24:52Z) - Action Quality Assessment via Hierarchical Pose-guided Multi-stage Contrastive Regression [25.657978409890973]
Action Assessment (AQA) aims at automatic and fair evaluation of athletic performance.
Current methods focus on segmenting video into fixed frames, which disrupts the temporal continuity of sub-actions.
We propose a novel action quality assessment method through hierarchically pose-guided multi-stage contrastive regression.
arXiv Detail & Related papers (2025-01-07T10:20:16Z) - Temporal Inconsistency Guidance for Super-resolution Video Quality Assessment [63.811519474030234]
We propose a perception-oriented approach to quantify frame-wise temporal inconsistency.<n>Inspired by the human visual system, we develop an Inconsistency Guided Temporal Module.<n>Our method significantly outperforms state-of-the-art VQA approaches.
arXiv Detail & Related papers (2024-12-25T15:43:41Z) - Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score.
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z) - Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model [86.9619638550683]
Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired data.
However, these models display significant limitations when applied to downstream tasks, such as fine-grained image classification, as a result of decision shortcuts''
arXiv Detail & Related papers (2024-03-01T09:01:53Z) - Multi-Stage Contrastive Regression for Action Quality Assessment [31.763380011104015]
We propose a novel Multi-stage Contrastive Regression (MCoRe) framework for the action quality assessment (AQA) task.
Inspired by the graph contrastive learning, we propose a new stage-wise contrastive learning loss function to enhance performance.
MCoRe demonstrates the state-of-the-art result so far on the widely-adopted fine-grained AQA dataset.
arXiv Detail & Related papers (2024-01-05T14:48:19Z) - Sensitivity Analysis of RF+clust for Leave-one-problem-out Performance
Prediction [0.7046417074932257]
Left-one-problem-out (LOPO) performance prediction requires machine learning (ML) models to extrapolate algorithms' performance from a set of training problems to a previously unseen problem.
Recent work suggested enriching standard random forest (RF) performance regression models with a weighted average of algorithms' performance on training problems that are considered similar to a test problem.
Here, we extend the RF+clust approach by adjusting the distance-based weights with the importance of the features for performance regression.
arXiv Detail & Related papers (2023-05-30T19:31:31Z) - Uncertainty-Driven Action Quality Assessment [67.20617610820857]
We propose a novel probabilistic model, named Uncertainty-Driven AQA (UD-AQA), to capture the diversity among multiple judge scores.
We generate the estimation of uncertainty for each prediction, which is employed to re-weight AQA regression loss.
Our proposed method achieves competitive results on three benchmarks including the Olympic events MTL-AQA and FineDiving, and the surgical skill JIGSAWS datasets.
arXiv Detail & Related papers (2022-07-29T07:21:15Z) - Action Quality Assessment with Temporal Parsing Transformer [84.1272079121699]
Action Quality Assessment (AQA) is important for action understanding and resolving the task poses unique challenges due to subtle visual differences.
We propose a temporal parsing transformer to decompose the holistic feature into temporal part-level representations.
Our proposed method outperforms prior work on three public AQA benchmarks by a considerable margin.
arXiv Detail & Related papers (2022-07-19T13:29:05Z) - Stateful Offline Contextual Policy Evaluation and Learning [88.9134799076718]
We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
arXiv Detail & Related papers (2021-10-19T16:15:56Z) - Loss re-scaling VQA: Revisiting the LanguagePrior Problem from a
Class-imbalance View [129.392671317356]
We propose to interpret the language prior problem in VQA from a class-imbalance view.
It explicitly reveals why the VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
arXiv Detail & Related papers (2020-10-30T00:57:17Z) - Norm-in-Norm Loss with Faster Convergence and Better Performance for
Image Quality Assessment [20.288424566444224]
We explore normalization in the design of loss functions for image quality assessment (IQA) models.
The resulting "Norm-in-Norm'' loss encourages the IQA model to make linear predictions with respect to subjective quality scores.
Experiments on two relevant datasets show that, compared to MAE or MSE loss, the new loss enables the IQA model to converge about 10 times faster.
arXiv Detail & Related papers (2020-08-10T04:01:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.