Quo Vadis, Skeleton Action Recognition ?
- URL: http://arxiv.org/abs/2007.02072v2
- Date: Wed, 7 Apr 2021 16:30:54 GMT
- Title: Quo Vadis, Skeleton Action Recognition ?
- Authors: Pranay Gupta, Anirudh Thatipelli, Aditya Aggarwal, Shubh Maheshwari,
Neel Trivedi, Sourav Das, Ravi Kiran Sarvadevabhatla
- Abstract summary: We study current and upcoming frontiers across the landscape of skeleton-based human action recognition.
To study skeleton-action recognition in the wild, we introduce Skeletics-152, a curated subset of RGB videos sourced from Kinetics-700.
We extend our study to include out-of-context actions by introducing Skeleton-Mimetics and Metaphorics datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study current and upcoming frontiers across the landscape
of skeleton-based human action recognition. To study skeleton-action
recognition in the wild, we introduce Skeletics-152, a curated and 3-D
pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale
action dataset. We extend our study to include out-of-context actions by
introducing Skeleton-Mimetics, a dataset derived from the recently introduced
Mimetics dataset. We also introduce Metaphorics, a dataset with caption-style
annotated YouTube videos of the popular social game Dumb Charades and
interpretative dance performances. We benchmark state-of-the-art models on the
NTU-120 dataset and provide a multi-layered assessment of the results. The
results from benchmarking the top performers of NTU-120 on the newly introduced
datasets reveal the challenges and domain gap induced by actions in the wild.
Overall, our work characterizes the strengths and limitations of existing
approaches and datasets. Via the introduced datasets, our work enables new
frontiers for human action recognition.
Related papers
- Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured HSI dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z)
- Fine-grained Action Analysis: A Multi-modality and Multi-task Dataset of Figure Skating [10.391609684374268]
We propose a Multi-modality and Multi-task dataset of Figure Skating (MMFS) which was collected from the World Figure Skating Championships.
MMFS supports both action recognition and action quality assessment. It provides RGB and skeleton data along with action scores for 11671 clips across 256 categories, including spatial and temporal labels.
arXiv Detail & Related papers (2023-07-06T02:30:56Z)
- Learning Action-Effect Dynamics from Pairs of Scene-graphs [50.72283841720014]
We propose a novel method that leverages scene-graph representation of images to reason about the effects of actions described in natural language.
Our proposed approach is effective in terms of performance, data efficiency, and generalization capability compared to existing models.
arXiv Detail & Related papers (2022-12-07T03:36:37Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- ANUBIS: Review and Benchmark Skeleton-Based Action Recognition Methods with a New Dataset [26.581495230711198]
We present a review in the form of a taxonomy on existing works of skeleton-based action recognition.
To promote more fair and comprehensive evaluation, we collect ANUBIS, a large-scale human skeleton dataset.
arXiv Detail & Related papers (2022-05-04T14:03:43Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves performance that is the best among, or competitive with, state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition [11.81043814295441]
We introduce UNIK, a novel skeleton-based action recognition method that is able to generalize across datasets.
To study the cross-domain generalizability of action recognition in real-world videos, we re-evaluate state-of-the-art approaches as well as the proposed UNIK.
Results show that the proposed UNIK, with pre-training on Posetics, generalizes well and outperforms the state of the art when transferred to four target action classification datasets.
arXiv Detail & Related papers (2021-07-19T02:00:28Z)
- CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models under a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation approaches.
arXiv Detail & Related papers (2020-10-11T02:19:15Z)
- A Survey on 3D Skeleton-Based Action Recognition Using Learning Method [20.865811389226234]
3D skeleton-based action recognition, owing to the latent advantages of skeleton, has been an active topic in computer vision.
This survey first highlights the necessity of action recognition and the significance of 3D-skeleton data.
It then gives a comprehensive introduction to mainstream action recognition techniques based on Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Graph Convolutional Networks (GCNs).
arXiv Detail & Related papers (2020-02-14T08:12:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.