Win-Fail Action Recognition
- URL: http://arxiv.org/abs/2102.07355v1
- Date: Mon, 15 Feb 2021 06:03:10 GMT
- Title: Win-Fail Action Recognition
- Authors: Paritosh Parmar, Brendan Morris
- Abstract summary: We introduce the task of win-fail action recognition -- differentiating between successful and failed attempts at various activities.
Unlike in existing action recognition datasets, intra-class variation is high, making the task challenging yet feasible.
We systematically analyze the characteristics of the win-fail task/dataset with prototypical action recognition networks and a novel video retrieval task.
- Score: 4.56877715768796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current video/action understanding systems have demonstrated impressive performance on large recognition tasks. However, they might be limiting themselves to learning to recognize spatiotemporal patterns, rather than attempting to thoroughly understand the actions. To spur progress toward a truer, deeper understanding of videos, we introduce the task of win-fail action recognition -- differentiating between successful and failed attempts at various activities. We introduce a first-of-its-kind paired win-fail action understanding dataset with samples from the following domains: "General Stunts," "Internet Wins-Fails," "Trick Shots," and "Party Games." Unlike in existing action recognition datasets, intra-class variation is high, making the task challenging yet feasible. We systematically analyze the characteristics of the win-fail task/dataset with prototypical action recognition networks and a novel video retrieval task. While current action recognition methods work well on our task/dataset, they still fall well short of high performance. We hope to motivate more work towards the true understanding of actions/videos. The dataset will be available at https://github.com/ParitoshParmar/Win-Fail-Action-Recognition.
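As a concrete framing of the task the abstract describes, the sketch below sets up win/fail recognition as binary classification over video clips. This is a minimal sketch under stated assumptions: the `WinFailClassifier` name, the toy 3D-CNN backbone (standing in for any prototypical action recognition network, e.g. an I3D/R3D-style model), the clip shape, and the label convention are all illustrative, not the paper's actual pipeline.

```python
# Minimal sketch, assuming a PyTorch setup: win-fail recognition as binary
# classification over video clips. The backbone below is a placeholder for
# any clip-level action recognition network; shapes and labels are
# illustrative assumptions, not the paper's pipeline.
import torch
import torch.nn as nn

class WinFailClassifier(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Placeholder clip-level feature extractor over (B, C, T, H, W) input.
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # pool over time and space
            nn.Flatten(),
            nn.Linear(32, feat_dim),
            nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, 2)  # logits for {fail, win}

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(clips))

model = WinFailClassifier()
clips = torch.randn(4, 3, 16, 112, 112)  # four dummy 16-frame RGB clips
labels = torch.tensor([1, 0, 1, 0])      # 1 = win (success), 0 = fail
loss = nn.functional.cross_entropy(model(clips), labels)
loss.backward()
```

Because the dataset pairs each win with a corresponding fail from a similar context, such a classifier cannot fall back on scene or activity cues alone; the same clip-level features could also back the retrieval-style evaluation the abstract mentions.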
Related papers
- The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks [4.971065912401385]
We propose Dual-VCLIP, a unified approach for zero-shot multi-label action recognition.
Dual-VCLIP enhances VCLIP, a zero-shot action recognition method, with the DualCoOp method for multi-label image classification.
We validate our method on the Charades dataset that includes a majority of object-based actions.
arXiv Detail & Related papers (2024-05-14T15:28:48Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions in videos is a challenging task toward high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding-box generation and the other for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve efficiency for spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Learning and Verification of Task Structure in Instructional Videos [85.511888642497]
We introduce a new pre-trained video model, VideoTaskformer, focused on representing the semantics and structure of instructional videos.
Compared to prior work, which learns step representations locally, our approach learns them globally.
We introduce two new benchmarks for detecting mistakes in instructional videos, to verify if there is an anomalous step and if steps are executed in the right order.
arXiv Detail & Related papers (2023-03-23T17:59:54Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of learning robust feature representations that generalize well across multiple datasets for action recognition.
We propose a novel multi-dataset training paradigm, MultiTrain, with two new loss terms, namely an informative loss and a projection loss (a schematic sketch follows after this list).
We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- ActAR: Actor-Driven Pose Embeddings for Video Action Recognition [12.043574473965318]
Human action recognition (HAR) in videos is one of the core tasks of video understanding.
We propose a new method that learns to efficiently recognize human actions in the infrared spectrum.
arXiv Detail & Related papers (2022-04-19T05:12:24Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset of sports videos with manual annotations of sub-actions, and conduct a study of temporal action parsing on top of it.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
- FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding [118.32912239230272]
FineGym is a new action recognition dataset built on top of gymnastic videos.
It provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy.
This new level of granularity presents significant challenges for action recognition.
arXiv Detail & Related papers (2020-04-14T17:55:21Z)
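As referenced in the MultiTrain entry above, here is a schematic sketch of multi-dataset training with a classification loss plus two weighted auxiliary terms. The names `informative_loss` and `projection_loss` come from that paper's summary, but their definitions, the weights, and the per-dataset heads below are placeholder assumptions, not the paper's actual formulations.

```python
# Schematic sketch of multi-dataset training in the spirit of MultiTrain.
# The two auxiliary terms are placeholders: the paper's informative loss and
# projection loss are not defined in the summary above.
import torch
import torch.nn as nn

def multitrain_step(backbone: nn.Module,
                    heads: nn.ModuleDict,       # one classifier head per dataset
                    batches,                    # iterable of (dataset_id, clips, labels)
                    lambda_info: float = 0.1,
                    lambda_proj: float = 0.1) -> torch.Tensor:
    total = torch.tensor(0.0)
    for dataset_id, clips, labels in batches:
        feats = backbone(clips)                 # shared representation
        logits = heads[dataset_id](feats)       # per-dataset classification
        cls_loss = nn.functional.cross_entropy(logits, labels)
        # Placeholder "informative" term: encourage feature dimensions to vary.
        informative_loss = -feats.std(dim=0).mean()
        # Placeholder "projection" term: keep feature magnitudes bounded.
        projection_loss = feats.pow(2).mean()
        total = total + cls_loss \
                      + lambda_info * informative_loss \
                      + lambda_proj * projection_loss
    return total
```

The design point this illustrates is only the shared-backbone, per-dataset-head structure with weighted auxiliary losses; any resemblance of the placeholder terms to the paper's losses is coincidental.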