FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality
Assessment
- URL: http://arxiv.org/abs/2204.03646v1
- Date: Thu, 7 Apr 2022 17:59:32 GMT
- Title: FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality
Assessment
- Authors: Jinglin Xu, Yongming Rao, Xumin Yu, Guangyi Chen, Jie Zhou, Jiwen Lu
- Abstract summary: We argue that understanding both high-level semantics and internal temporal structures of actions in competitive sports videos is the key to making predictions accurate and interpretable.
We construct a new fine-grained dataset, called FineDiving, developed on diverse diving events with detailed annotations on action procedures.
- Score: 93.09267863425492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing action quality assessment methods rely on the deep features of
an entire video to predict the score, which is less reliable due to the
non-transparent inference process and poor interpretability. We argue that
understanding both high-level semantics and internal temporal structures of
actions in competitive sports videos is the key to making predictions accurate
and interpretable. Towards this goal, we construct a new fine-grained dataset,
called FineDiving, developed on diverse diving events with detailed annotations
on action procedures. We also propose a procedure-aware approach for action
quality assessment, learned by a new Temporal Segmentation Attention module.
Specifically, we propose to parse pairwise query and exemplar action instances
into consecutive steps with diverse semantic and temporal correspondences. The
procedure-aware cross-attention is proposed to learn embeddings between query
and exemplar steps to discover their semantic, spatial, and temporal
correspondences, which further serve fine-grained contrastive regression to
derive a reliable scoring mechanism. Extensive experiments demonstrate that our
approach achieves substantial improvements over state-of-the-art methods with
better interpretability. The dataset and code are available at
https://github.com/xujinglin/FineDiving.
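The abstract describes attending each parsed query step over the exemplar's steps and then regressing a score *relative* to the exemplar. A minimal numpy sketch of that idea is below; it is not the authors' implementation, and all function names, the mean-pooled feature difference, and the linear regression weights `w` are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def procedure_cross_attention(query_steps, exemplar_steps):
    """Attend each query step over the exemplar steps.

    query_steps:    (S, D) features, one row per parsed action step
    exemplar_steps: (S, D) features of the exemplar action
    Returns a (S, D) exemplar context aligned to the query steps.
    """
    d = query_steps.shape[1]
    scores = query_steps @ exemplar_steps.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)  # soft step-to-step correspondence
    return attn @ exemplar_steps

def contrastive_score(query_steps, exemplar_steps, exemplar_score, w):
    """Contrastive regression: predict the score *difference* from the
    exemplar, then add the exemplar's known judge score."""
    aligned = procedure_cross_attention(query_steps, exemplar_steps)
    # pool per-step differences between query and aligned exemplar features
    delta_feat = (query_steps - aligned).mean(axis=0)
    return exemplar_score + float(delta_feat @ w)
```

With a zero regression weight the prediction reduces to the exemplar's score, which makes the "offset from a reference performance" structure of contrastive regression easy to see.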
Related papers
- SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking [89.43370214059955]
Open-vocabulary Multiple Object Tracking (MOT) aims to generalize trackers to novel categories not in the training set.
We present a unified framework that jointly considers semantics, location, and appearance priors in the early steps of association.
Our method eliminates complex post-processings for fusing different cues and boosts the association performance significantly for large-scale open-vocabulary tracking.
arXiv Detail & Related papers (2024-09-17T14:36:58Z)
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment [30.601466217201253]
Existing action quality assessment (AQA) methods mainly learn deep representations at the video level for scoring diverse actions.
Due to the lack of a fine-grained understanding of actions in videos, they suffer severely from low credibility and interpretability, making them insufficient for stringent applications such as Olympic diving events.
We argue that a fine-grained understanding of actions requires the model to perceive and parse actions in both time and space, which is also the key to the credibility and interpretability of the AQA technique.
arXiv Detail & Related papers (2024-05-11T02:57:16Z)
- Advancing Relation Extraction through Language Probing with Exemplars from Set Co-Expansion [1.450405446885067]
Relation Extraction (RE) is a pivotal task in automatically extracting structured information from unstructured text.
We present a multi-faceted approach that integrates representative examples through co-set expansion.
Our method achieves an improvement of at least 1 percent in accuracy in most settings.
arXiv Detail & Related papers (2023-08-18T00:56:35Z)
- Demystifying Unsupervised Semantic Correspondence Estimation [13.060538447838303]
We explore semantic correspondence estimation through the lens of unsupervised learning.
We thoroughly evaluate several recently proposed unsupervised methods across multiple challenging datasets.
We introduce a new unsupervised correspondence approach which utilizes the strength of pre-trained features while encouraging better matches during training.
arXiv Detail & Related papers (2022-07-11T17:59:51Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- A Positive/Unlabeled Approach for the Segmentation of Medical Sequences using Point-Wise Supervision [3.883460584034766]
We propose a new method to efficiently segment medical imaging volumes or videos using point-wise annotations only.
Our approach trains a deep learning model using an appropriate Positive/Unlabeled objective function using point-wise annotations.
We show experimentally that our approach outperforms state-of-the-art methods tailored to the same problem.
arXiv Detail & Related papers (2021-07-18T09:13:33Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study on temporal action parsing on top.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and devise an improved method capable of mining sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
- Inferring Temporal Compositions of Actions Using Probabilistic Automata [61.09176771931052]
We propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata.
Our approach is different from existing works that either predict long-range complex activities as unordered sets of atomic actions, or retrieve videos using natural language sentences.
arXiv Detail & Related papers (2020-04-28T00:15:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.