FineGym: A Hierarchical Video Dataset for Fine-grained Action
Understanding
- URL: http://arxiv.org/abs/2004.06704v1
- Date: Tue, 14 Apr 2020 17:55:21 GMT
- Title: FineGym: A Hierarchical Video Dataset for Fine-grained Action
Understanding
- Authors: Dian Shao, Yue Zhao, Bo Dai and Dahua Lin
- Abstract summary: FineGym is a new action recognition dataset built on top of gymnastic videos.
It provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy.
This new level of granularity presents significant challenges for action recognition.
- Score: 118.32912239230272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On public benchmarks, current action recognition techniques have achieved
great success. However, when applied to real-world scenarios such as sports
analysis, which require parsing an activity into phases and differentiating
between subtly different actions, their performance remains far from
satisfactory. To take action recognition to a new level, we develop
FineGym, a new dataset built on top of gymnastic videos. Compared to existing
action recognition datasets, FineGym is distinguished in richness, quality, and
diversity. In particular, it provides temporal annotations at both action and
sub-action levels with a three-level semantic hierarchy. For example, a
"balance beam" event will be annotated as a sequence of elementary sub-actions
derived from five sets: "leap-jump-hop", "beam-turns", "flight-salto",
"flight-handspring", and "dismount", where the sub-action in each set will be
further annotated with finely defined class labels. This new level of
granularity presents significant challenges for action recognition, e.g., how to
parse the temporal structures from a coherent action, and how to distinguish
between subtly different action classes. We systematically investigate
representative methods on this dataset and obtain a number of interesting
findings. We hope this dataset could advance research towards action
understanding.
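To make the annotation scheme concrete, here is a minimal sketch of how FineGym's three-level hierarchy (event, sub-action set, element-level class) with temporal annotations could be represented as a data structure. All field names and sample labels below are illustrative assumptions based on the abstract, not FineGym's actual annotation schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubAction:
    """One temporally localized sub-action within an event.

    `element_set` is one of the coarse sub-action sets named in the
    abstract (e.g. "beam-turns"); `element_label` is the finer class
    within that set. Field names are illustrative, not FineGym's schema.
    """
    start_sec: float
    end_sec: float
    element_set: str      # e.g. "beam-turns"
    element_label: str    # hypothetical fine-grained class name

@dataclass
class Event:
    """An action-level annotation, e.g. one "balance beam" routine."""
    event_label: str
    start_sec: float
    end_sec: float
    sub_actions: List[SubAction] = field(default_factory=list)

# A hypothetical "balance beam" event parsed into element-level sub-actions.
beam_routine = Event(
    event_label="balance beam",
    start_sec=12.0,
    end_sec=95.0,
    sub_actions=[
        SubAction(14.5, 16.0, "leap-jump-hop", "split leap forward"),
        SubAction(30.2, 32.8, "beam-turns", "double turn on one leg"),
        SubAction(88.0, 91.5, "dismount", "double salto backward tucked"),
    ],
)
```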
Related papers
- Telling Stories for Common Sense Zero-Shot Action Recognition [11.166901260737786]
We introduce a novel dataset, Stories, which contains rich textual descriptions for diverse action classes extracted from WikiHow articles.
For each class, we extract multi-sentence narratives detailing the necessary steps, scenes, objects, and verbs that characterize the action.
This contextual data enables modeling of nuanced relationships between actions, paving the way for zero-shot transfer.
arXiv Detail & Related papers (2023-09-29T15:34:39Z)
- Free-Form Composition Networks for Egocentric Action Recognition [97.02439848145359]
We propose a free-form composition network (FFCN) that can simultaneously learn disentangled verb, preposition, and noun representations.
The proposed FFCN can directly generate new training data samples for rare classes, hence significantly improving action recognition performance.
arXiv Detail & Related papers (2023-07-13T02:22:09Z)
- Action Sensitivity Learning for Temporal Action Localization [35.65086250175736]
We propose an Action Sensitivity Learning framework (ASL) to tackle the task of temporal action localization.
We first introduce a lightweight Action Sensitivity Evaluator to learn action sensitivity at both the class level and the instance level.
Based on the action sensitivity of each frame, we design an Action Sensitive Contrastive Loss to enhance features, where action-aware frames are sampled as positive pairs and pushed away from action-irrelevant frames (see the sketch after this list).
arXiv Detail & Related papers (2023-05-25T04:19:14Z)
- Weakly-Supervised Temporal Action Detection for Fine-Grained Videos with Hierarchical Atomic Actions [13.665489987620724]
We tackle the problem of weakly-supervised fine-grained temporal action detection in videos for the first time.
We propose to model actions as the combinations of reusable atomic actions which are automatically discovered from data.
Our approach constructs a visual representation hierarchy of four levels: clip level, atomic action level, fine action class level and coarse action class level, with supervision at each level.
arXiv Detail & Related papers (2022-07-24T20:32:24Z)
- FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment [93.09267863425492]
We argue that understanding both high-level semantics and internal temporal structures of actions in competitive sports videos is the key to making predictions accurate and interpretable.
We construct a new fine-grained dataset, called FineDiving, developed on diverse diving events with detailed annotations on action procedures.
arXiv Detail & Related papers (2022-04-07T17:59:32Z)
- Spatio-temporal Relation Modeling for Few-shot Action Recognition [100.3999454780478]
We propose a few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations.
Our approach achieves an absolute gain of 3.5% in classification accuracy, as compared to the best existing method in the literature.
arXiv Detail & Related papers (2021-12-09T18:59:14Z)
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study on temporal action parsing on top.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
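As a concrete illustration of the Action Sensitive Contrastive Loss described in the ASL entry above, the following PyTorch sketch treats a video's most action-sensitive frames as positives and its least sensitive frames as negatives in an InfoNCE-style objective. The tensor shapes, the top-k selection rule, and the temperature are assumptions made for illustration; the paper's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

def action_sensitive_contrastive_loss(
    frame_feats: torch.Tensor,   # (T, D) per-frame features of one video
    sensitivity: torch.Tensor,   # (T,) predicted action sensitivity per frame
    k: int = 8,                  # assumed number of positives/negatives
    temperature: float = 0.1,    # assumed temperature
) -> torch.Tensor:
    """InfoNCE-style sketch: pull action-aware frames together and push
    them away from action-irrelevant frames.

    This is an illustrative approximation of the loss described in the
    ASL paper, not its official implementation.
    """
    feats = F.normalize(frame_feats, dim=-1)

    # Treat the k most action-sensitive frames as positives and the
    # k least sensitive frames as negatives.
    pos_idx = sensitivity.topk(k).indices
    neg_idx = (-sensitivity).topk(k).indices

    # Anchor: mean of the positive (action-aware) frames.
    anchor = F.normalize(feats[pos_idx].mean(dim=0), dim=-1)

    pos_sim = feats[pos_idx] @ anchor / temperature   # (k,)
    neg_sim = feats[neg_idx] @ anchor / temperature   # (k,)

    # For each positive, contrast it against all negatives; the positive
    # similarity sits at column 0 of the logits.
    logits = torch.cat(
        [pos_sim.unsqueeze(1), neg_sim.unsqueeze(0).expand(k, -1)], dim=1
    )                                                  # (k, 1 + k)
    targets = torch.zeros(k, dtype=torch.long)
    return F.cross_entropy(logits, targets)

# Usage on random features (T = 64 frames, D = 256 dims):
loss = action_sensitive_contrastive_loss(torch.randn(64, 256), torch.rand(64))
```

In practice, the sensitivity scores would come from the paper's Action Sensitivity Evaluator rather than the random values used in the usage line above.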