Leveraging Self-Supervised Training for Unintentional Action Recognition
- URL: http://arxiv.org/abs/2209.11870v1
- Date: Fri, 23 Sep 2022 21:36:36 GMT
- Title: Leveraging Self-Supervised Training for Unintentional Action Recognition
- Authors: Enea Duka, Anna Kukleva, Bernt Schiele
- Abstract summary: We seek to identify the points in videos where the actions transition from intentional to unintentional.
We propose a multi-stage framework that exploits inherent biases such as motion speed, motion direction, and order to recognize unintentional actions.
- Score: 82.19777933440143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unintentional actions are rare occurrences that are difficult to define
precisely and that are highly dependent on the temporal context of the action.
In this work, we explore such actions and seek to identify the points in videos
where the actions transition from intentional to unintentional. We propose a
multi-stage framework that exploits inherent biases such as motion speed,
motion direction, and order to recognize unintentional actions. To enhance
representations via self-supervised training for the task of unintentional
action recognition, we propose temporal transformations, called Temporal
Transformations of Inherent Biases of Unintentional Actions (T2IBUA). The
multi-stage approach models the temporal information on both the level of
individual frames and full clips. These enhanced representations show strong
performance for unintentional action recognition tasks. We provide an extensive
ablation study of our framework and report results that significantly improve
over the state-of-the-art.
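The abstract names motion speed, motion direction, and temporal order as the inherent biases behind T2IBUA but does not spell out the transformations themselves. Below is a minimal sketch of what such a self-supervised pretext task could look like, assuming a clip is an array of frames; all function names and parameters are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def speed_up(clip: np.ndarray, factor: int = 2) -> np.ndarray:
    """Subsample frames to mimic faster motion (motion-speed bias)."""
    return clip[::factor]

def reverse(clip: np.ndarray) -> np.ndarray:
    """Play the clip backwards (motion-direction bias)."""
    return clip[::-1]

def shuffle_segments(clip: np.ndarray, n_segments: int = 4, rng=None) -> np.ndarray:
    """Permute equal-length segments of the clip (temporal-order bias)."""
    rng = rng or np.random.default_rng()
    segments = np.array_split(clip, n_segments)
    order = rng.permutation(len(segments))
    return np.concatenate([segments[i] for i in order])

# Pretext task: apply a random transformation and train the network to
# predict which one was applied, forcing it to model temporal structure.
TRANSFORMS = [lambda c: c, speed_up, reverse, shuffle_segments]

def make_pretext_sample(clip: np.ndarray, rng: np.random.Generator):
    label = int(rng.integers(len(TRANSFORMS)))
    return TRANSFORMS[label](clip), label
```

Labels produced this way could supervise a frame-level encoder first and a clip-level temporal model second, in the spirit of the multi-stage approach the abstract describes.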
Related papers
- No More Shortcuts: Realizing the Potential of Temporal Self-Supervision [69.59938105887538]
We propose a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks.
We demonstrate experimentally that our more challenging frame-level task formulations and the removal of shortcuts drastically improve the quality of features learned through temporal self-supervision.
arXiv Detail & Related papers (2023-12-20T13:20:31Z)
- Action Sensitivity Learning for Temporal Action Localization [35.65086250175736]
We propose an Action Sensitivity Learning framework (ASL) to tackle the task of temporal action localization.
We first introduce a lightweight Action Sensitivity Evaluator to learn the action sensitivity at the class level and instance level, respectively.
Based on the action sensitivity of each frame, we design an Action Sensitive Contrastive Loss to enhance features, where the action-aware frames are sampled as positive pairs to push away the action-irrelevant frames.
arXiv Detail & Related papers (2023-05-25T04:19:14Z)
- DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation [84.78383981697377]
Fully supervised action segmentation works on frame-wise action recognition with dense annotations and often suffers from the over-segmentation issue.
We develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention.
We achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%) on Breakfast, which demonstrates the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-04-04T20:27:18Z)
- Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos [31.1632730473261]
W-Oops consists of 2,100 unintentional human action videos, with 44 goal-directed and 30 unintentional video-level activity labels collected through human annotations.
We propose a weakly supervised algorithm for localizing the goal-directed as well as unintentional temporal regions in the video.
arXiv Detail & Related papers (2022-04-28T14:56:43Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms the existing state of the art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- CLTA: Contents and Length-based Temporal Attention for Few-shot Action Recognition [2.0349696181833337]
We propose a Contents and Length-based Temporal Attention model, which learns customized temporal attention for the individual video.
We show that even a backbone without fine-tuning, combined with an ordinary softmax classifier, can still achieve results similar to or better than the state of the art in few-shot action recognition.
arXiv Detail & Related papers (2021-03-18T23:40:28Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset of sport videos with manual annotations of sub-actions, and conduct a study of temporal action parsing on top of it.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
- Inferring Temporal Compositions of Actions Using Probabilistic Automata [61.09176771931052]
We propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata.
Our approach differs from existing works that either predict long-range complex activities as unordered sets of atomic actions or retrieve videos using natural language sentences. A toy sketch of scoring frame-wise predictions with such an automaton follows this list.
arXiv Detail & Related papers (2020-04-28T00:15:26Z)
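As referenced in the last entry above, here is a toy illustration of the probabilistic-automaton idea, applied to the intentional-to-unintentional transition the main paper studies: a two-state automaton for the pattern "intentional+ unintentional+" scored against frame-wise action probabilities. The states, symbols, and Viterbi-style scoring are illustrative assumptions, not the cited paper's formulation:

```python
import numpy as np

# Two-state automaton for the pattern "int+ unint+": state 0 has seen only
# intentional frames; state 1 is entered at the transition and absorbs the
# remaining unintentional frames. TRANSITIONS[state][symbol] -> next state.
TRANSITIONS = {0: {"int": 0, "unint": 1}, 1: {"unint": 1}}
ACCEPTING = 1

def best_parse_logprob(posteriors):
    """Viterbi-style max log-probability of an accepting parse, given a
    list of per-frame dicts mapping symbol -> probability."""
    score = {0: 0.0, 1: -np.inf}
    for frame in posteriors:
        new = {0: -np.inf, 1: -np.inf}
        for state, logp in score.items():
            for symbol, nxt in TRANSITIONS.get(state, {}).items():
                new[nxt] = max(new[nxt], logp + np.log(frame[symbol] + 1e-9))
        score = new
    return score[ACCEPTING]

# Example: per-frame probabilities drift from intentional to unintentional.
frames = [{"int": 0.9, "unint": 0.1},
          {"int": 0.5, "unint": 0.5},
          {"int": 0.1, "unint": 0.9}]
print(best_parse_logprob(frames))  # log-probability of the best parse
```

The best-scoring path implicitly picks the transition point, which is exactly the quantity the main paper seeks to localize.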