IRIS: Interpretable Rubric-Informed Segmentation for Action Quality Assessment
- URL: http://arxiv.org/abs/2303.09097v1
- Date: Thu, 16 Mar 2023 06:01:21 GMT
- Title: IRIS: Interpretable Rubric-Informed Segmentation for Action Quality Assessment
- Authors: Hitoshi Matsuyama, Nobuo Kawaguchi, Brian Y. Lim
- Abstract summary: Action Quality Assessment (AQA) of sports videos can mimic Olympic judges to help score performances as a second opinion or for training.
These AI methods are uninterpretable and do not justify their scores, which is important for algorithmic accountability.
We propose IRIS to perform Interpretable Rubric-Informed Segmentation on action sequences for AQA.
- Score: 7.929322038634728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI-driven Action Quality Assessment (AQA) of sports videos can mimic Olympic
judges to help score performances as a second opinion or for training. However,
these AI methods are uninterpretable and do not justify their scores, which is
important for algorithmic accountability. Indeed, to account for their
decisions, instead of scoring subjectively, sports judges use a consistent set
of criteria, a rubric, applied to multiple actions in each performance sequence.
Therefore, we propose IRIS to perform Interpretable Rubric-Informed
Segmentation on action sequences for AQA. We investigated IRIS for scoring
videos of figure skating performance. IRIS predicts (1) action segments, (2)
technical element score differences of each segment relative to base scores,
(3) multiple program component scores, and (4) the summed final score. In a
modeling study, we found that IRIS performs better than non-interpretable,
state-of-the-art models. In a formative user study, practicing figure skaters
agreed with the rubric-informed explanations, found them useful, and trusted AI
judgments more. This work highlights the importance of using judgment rubrics
to account for AI decisions.
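The four predictions above compose into a single final number the same way a judging rubric does. Below is a minimal sketch of that composition, assuming ISU-style figure-skating scoring; the element names, base values, scores, and the `iris_final_score` helper are all illustrative assumptions, not the authors' implementation.
```python
# Sketch of the rubric-informed score decomposition described in the abstract:
# each predicted action segment contributes a rubric base value plus a
# predicted element-score difference, and program component scores are added
# on top to give the summed final score. All names and numbers are made up.
from typing import List, Tuple

def iris_final_score(
    segments: List[Tuple[str, float, float]],  # (element, base_value, predicted_diff)
    component_scores: List[float],             # predicted program component scores
) -> float:
    """Sum per-element (base + predicted difference) plus program components."""
    technical = sum(base + diff for _name, base, diff in segments)
    return technical + sum(component_scores)

# Illustrative usage: three segmented elements plus five program components.
segments = [("3Lz", 5.90, +0.8), ("3F", 5.30, -0.4), ("CCoSp4", 3.50, +0.5)]
components = [8.25, 8.00, 7.75, 8.50, 8.00]
print(iris_final_score(segments, components))  # 56.1 for these made-up values
```
Separating the technical sum from the component sum mirrors how a judge would account for each part of the final score, which is what makes the prediction inspectable segment by segment.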
Related papers
- Teach-to-Reason with Scoring: Self-Explainable Rationale-Driven Multi-Trait Essay Scoring [5.632624116225276]
Multi-trait automated essay scoring (AES) systems provide a fine-grained evaluation of an essay's diverse aspects.
Prior systems fail to explain why specific trait scores are assigned.
We propose a self-explainable Rationale-Driven Multi-trait automated Essay scoring framework.
arXiv Detail & Related papers (2025-02-28T05:54:23Z)
- Validity Arguments For Constructed Response Scoring Using Generative Artificial Intelligence Applications [0.0]
Generative AI is particularly appealing because it reduces the effort required for handcrafting features in traditional AI scoring.
We compare the validity evidence needed in scoring systems using human ratings, feature-based natural language processing AI scoring engines, and generative AI.
arXiv Detail & Related papers (2025-01-04T16:59:29Z)
- CLAIR-A: Leveraging Large Language Models to Judge Audio Captions [73.51087998971418]
Evaluating machine-generated audio captions is a complex task that requires considering diverse factors.
We propose CLAIR-A, a simple and flexible method that leverages the zero-shot capabilities of large language models.
In our evaluations, CLAIR-A better predicts human judgements of quality compared to traditional metrics.
arXiv Detail & Related papers (2024-09-19T17:59:52Z)
- Classification Matters: Improving Video Action Detection with Class-Specific Attention [61.14469113965433]
Video action detection (VAD) aims to detect actors and classify their actions in a video.
We analyze how prevailing methods form features for classification and find that they prioritize actor regions.
We propose to reduce the bias toward actor regions and to encourage attention to the context relevant to each action class.
arXiv Detail & Related papers (2024-07-29T04:43:58Z)
- GAIA: Rethinking Action Quality Assessment for AI-Generated Videos [56.047773400426486]
Action quality assessment (AQA) algorithms predominantly focus on actions in specific real-world scenarios and are pre-trained with normative action features.
We construct GAIA, a Generic AI-generated Action dataset, by conducting a large-scale subjective evaluation from a novel causal reasoning-based perspective.
Results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods all perform poorly, with average SRCC (Spearman rank correlation coefficient) values of 0.454, 0.191, and 0.519, respectively; a sketch of how SRCC is computed appears after this list.
arXiv Detail & Related papers (2024-06-10T08:18:07Z)
- CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models [41.98394436858637]
We propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples.
We first show that current ALMs perform only marginally better than random chance, indicating that they struggle with compositional reasoning.
Next, we propose CompA-CLAP, where we fine-tune CLAP using a novel learning method to improve its compositional reasoning abilities.
arXiv Detail & Related papers (2023-10-12T22:43:38Z)
- SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video [61.21388780334379]
This work focuses on apparent emotional reaction recognition from video-only input, conducted in a self-supervised fashion.
The network is first pre-trained on different self-supervised pretext tasks and later fine-tuned on the downstream target task.
arXiv Detail & Related papers (2022-10-20T15:21:51Z)
- Group-aware Contrastive Regression for Action Quality Assessment [85.43203180953076]
We show that the relations among videos can provide important clues for more accurate action quality assessment.
Our approach outperforms previous methods by a large margin and establishes new state-of-the-art on all three benchmarks.
arXiv Detail & Related papers (2021-08-17T17:59:39Z)
- Towards Game-Playing AI Benchmarks via Performance Reporting Standards [0.9137554315375919]
We propose reporting guidelines for AI game-playing performance that, if followed, provide information suitable for unbiased comparisons between different AI approaches.
The vision we describe is to build benchmarks and competitions based on such guidelines in order to draw more general conclusions about the behaviour of different AI algorithms.
arXiv Detail & Related papers (2020-07-06T13:27:00Z)
- Uncertainty-aware Score Distribution Learning for Action Quality Assessment [91.05846506274881]
We propose an uncertainty-aware score distribution learning (USDL) approach for action quality assessment (AQA).
Specifically, we regard an action as an instance associated with a score distribution, which describes the probability of different evaluated scores.
When fine-grained score labels are available, we devise a multi-path uncertainty-aware score distribution learning (MUSDL) method to explore the disentangled components of a score; see the distribution-learning sketch after this list.
arXiv Detail & Related papers (2020-06-13T15:41:29Z)
- Action Quality Assessment using Siamese Network-Based Deep Metric Learning [7.945673227394573]
The proposed scoring model has been tested on Olympic diving and gymnastic vaults.
The model outperforms the existing state-of-the-art scoring models.
arXiv Detail & Related papers (2020-02-27T14:00:05Z)
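The GAIA entry above reports agreement as SRCC, the Spearman rank correlation between predicted and human-assigned scores. A minimal sketch of how such a value is computed using `scipy.stats.spearmanr`; the score arrays are made-up illustrations, not data from the paper.
```python
# Spearman rank correlation (SRCC) between human and predicted quality
# scores, as reported in the GAIA entry above. Arrays are illustrative only.
from scipy.stats import spearmanr

human_scores = [3.2, 4.5, 1.8, 2.9, 4.1]
model_scores = [2.8, 4.0, 2.1, 3.3, 3.9]

srcc, _p_value = spearmanr(human_scores, model_scores)
print(f"SRCC = {srcc:.3f}")  # 0.900 for these illustrative arrays
```
Because SRCC compares ranks rather than raw values, a model can score well on it by ordering performances correctly even when its absolute scores are miscalibrated.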
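The USDL entry above treats a ground-truth score as a distribution rather than a single point. Below is a minimal sketch of that idea under common assumptions (a discrete Gaussian soft label matched with a KL-divergence loss); the bin count, sigma, and stand-in model output are hypothetical choices, not the paper's settings.
```python
# Sketch of score distribution learning as described in the USDL entry:
# the ground-truth score becomes a discrete Gaussian over score bins, and the
# model learns to match it with a KL-divergence loss. Hyperparameters and the
# stand-in model output below are illustrative assumptions.
import torch
import torch.nn.functional as F

def score_distribution(score: float, num_bins: int = 101,
                       max_score: float = 100.0, sigma: float = 5.0) -> torch.Tensor:
    """Soft label: normalized Gaussian over evenly spaced score bins."""
    bins = torch.linspace(0.0, max_score, num_bins)
    return torch.softmax(-((bins - score) ** 2) / (2 * sigma ** 2), dim=0)

target = score_distribution(77.3)                            # made-up label
pred_log_probs = torch.log_softmax(torch.randn(101), dim=0)  # stand-in for a model head
loss = F.kl_div(pred_log_probs, target, reduction="sum")     # KL(target || pred)

# At inference, the predicted score is the expectation under the distribution.
bins = torch.linspace(0.0, 100.0, 101)
predicted_score = (pred_log_probs.exp() * bins).sum()
print(float(loss), float(predicted_score))
```
The sigma parameter encodes label uncertainty: a wider Gaussian penalizes near-miss predictions less, which is the intuition behind modeling scores as distributions instead of regressing one number.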
This list is automatically generated from the titles and abstracts of the papers on this site.