Pose And Joint-Aware Action Recognition
- URL: http://arxiv.org/abs/2010.08164v2
- Date: Fri, 29 Oct 2021 21:12:40 GMT
- Title: Pose And Joint-Aware Action Recognition
- Authors: Anshul Shah, Shlok Mishra, Ankan Bansal, Jun-Cheng Chen, Rama
Chellappa, Abhinav Shrivastava
- Abstract summary: We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder.
Our joint selector module re-weights the joint information to select the most discriminative joints for the task.
We show large improvements over the current state-of-the-art joint-based approaches on the JHMDB, HMDB, Charades, and AVA action recognition datasets.
- Score: 87.4780883700755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress on action recognition has mainly focused on RGB and optical
flow features. In this paper, we approach the problem of joint-based action
recognition. Unlike other modalities, the constellation of joints and their
motion provides succinct human motion information for activity recognition. We
present a new model for joint-based action recognition, which
first extracts motion features from each joint separately through a shared
motion encoder before performing collective reasoning. Our joint selector
module re-weights the joint information to select the most discriminative
joints for the task. We also propose a novel joint-contrastive loss that pulls
together groups of joint features which convey the same action. We strengthen
the joint-based representations by using a geometry-aware data augmentation
technique which jitters pose heatmaps while retaining the dynamics of the
action. We show large improvements over the current state-of-the-art
joint-based approaches on the JHMDB, HMDB, Charades, and AVA action
recognition datasets. A late fusion with RGB and Flow-based approaches yields
additional
improvements. Our model also outperforms the existing baseline on Mimetics, a
dataset with out-of-context actions.
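The joint selector module described above can be sketched as an attention-style re-weighting over per-joint motion features: score each joint, normalize the scores, and scale each joint's feature by its weight. A minimal NumPy illustration, assuming a (J, D) feature matrix and a single learned scoring vector (the function and variable names are hypothetical, not the paper's exact formulation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def joint_selector(joint_feats, w):
    """Re-weight per-joint motion features by learned importance scores.

    joint_feats: (J, D) array, one D-dim motion feature per joint
    w: (D,) scoring vector (stands in for a learned selector head)
    Returns the re-weighted features and the per-joint weights.
    """
    scores = joint_feats @ w           # (J,) raw importance per joint
    weights = softmax(scores)          # normalize to a distribution over joints
    reweighted = joint_feats * weights[:, None]
    return reweighted, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(15, 32))      # e.g. 15 joints, 32-dim motion features
w = rng.normal(size=32)
out, attn = joint_selector(feats, w)   # discriminative joints get larger attn
```

The softmax makes the weights a distribution over joints, so the downstream collective reasoning is driven mostly by the few joints that score highest for the action.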
Related papers
- Joint Temporal Pooling for Improving Skeleton-based Action Recognition [4.891381363264954]
In skeleton-based human action recognition, temporal pooling is a critical step for capturing the relationships of joint dynamics.
This paper presents a novel Joint Motion Adaptive Temporal Pooling (JMAP) method for improving skeleton-based action recognition.
The efficacy of JMAP has been validated through experiments on the popular NTU RGB+D 120 and PKU-MMD datasets.
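Motion-adaptive temporal pooling of the kind this entry describes can be sketched by weighting each frame by the magnitude of its joint motion before averaging, so high-motion frames dominate the pooled feature. A minimal NumPy sketch; the specific weighting scheme is illustrative, not JMAP's exact formulation:

```python
import numpy as np

def motion_weighted_pool(joint_seq):
    """Pool per-frame joint features, weighting frames by motion magnitude.

    joint_seq: (T, D) array of per-frame joint feature vectors.
    Frames with larger frame-to-frame change get larger pooling weights.
    """
    # Per-frame motion magnitude from consecutive differences, (T-1,)
    motion = np.linalg.norm(np.diff(joint_seq, axis=0), axis=1)
    # Give the first frame the average motion so every frame has a weight
    motion = np.concatenate([[motion.mean() if motion.size else 0.0], motion])
    weights = motion / (motion.sum() + 1e-8)          # normalize, (T,)
    return (joint_seq * weights[:, None]).sum(axis=0)  # weighted sum, (D,)

seq = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 0.0], [10.0, 0.0]])
pooled = motion_weighted_pool(seq)  # emphasizes the frames around the jump
```

Compared to plain average pooling, static frames contribute little, which is the intuition behind making the pooling step joint-motion aware.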
arXiv Detail & Related papers (2024-08-18T04:40:16Z)
- Joint-Motion Mutual Learning for Pose Estimation in Videos [21.77871402339573]
Human pose estimation in videos has long been a compelling yet challenging task within the realm of computer vision.
Recent methods strive to integrate multi-frame visual features generated by a backbone network for pose estimation.
We propose a novel joint-motion mutual learning framework for pose estimation.
arXiv Detail & Related papers (2024-08-05T07:37:55Z)
- Learning Mutual Excitation for Hand-to-Hand and Human-to-Human Interaction Recognition [22.538114033191313]
We propose a mutual excitation graph convolutional network (me-GCN) by stacking mutual excitation graph convolution layers.
Me-GC learns mutual information in each layer and each stage of graph convolution operations.
Our proposed me-GC outperforms state-of-the-art GCN-based and Transformer-based methods.
arXiv Detail & Related papers (2024-02-04T10:00:00Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on the source dataset and unavailable on the target dataset during the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
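The self-supervised pretext task above (permuting temporal segments and predicting the permutation) can be sketched as a labeled-sample builder: cut the skeleton sequence into K segments, shuffle them by one of the K! fixed permutations, and use the permutation index as the classification target. A minimal NumPy sketch; the segment count and array shapes are illustrative, not the paper's exact setup:

```python
import itertools
import numpy as np

def make_permutation_sample(sequence, n_segments=3, perm_id=0):
    """Build one self-supervised sample: cut a skeleton sequence into
    temporal segments, reorder them by a fixed permutation, and return
    the shuffled sequence plus the permutation index as its label.

    sequence: (T, J, C) array of T frames, J joints, C coordinates.
    """
    # Enumerate all n_segments! permutations in a fixed (lexicographic) order
    perms = list(itertools.permutations(range(n_segments)))
    segments = np.array_split(sequence, n_segments, axis=0)
    shuffled = np.concatenate([segments[i] for i in perms[perm_id]], axis=0)
    return shuffled, perm_id

# Toy sequence: 30 frames, 2 joints, 2D coordinates
seq = np.arange(30 * 2 * 2, dtype=float).reshape(30, 2, 2)
x, y = make_permutation_sample(seq, n_segments=3, perm_id=4)
```

A classifier trained to recover `y` from `x` must model temporal order, which is what reduces the domain shift without needing target-domain action labels; the same recipe applies to permuted body parts along the joint axis.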
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Modeling long-term interactions to enhance action recognition [81.09859029964323]
We propose a new approach to understanding actions in egocentric videos that exploits the semantics of object interactions at both the frame and temporal levels.
We use a region-based approach that takes as input a primary region roughly corresponding to the user hands and a set of secondary regions potentially corresponding to the interacting objects.
The proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks.
arXiv Detail & Related papers (2021-04-23T10:08:15Z)
- Motion Guided Attention Fusion to Recognize Interactions from Videos [40.1565059238891]
We present a dual-pathway approach for recognizing fine-grained interactions from videos.
We fuse the bottom-up features in the motion pathway with features captured from object detections to learn the temporal aspects of an action.
We show that our approach can generalize across appearance effectively and recognize actions where an actor interacts with previously unseen objects.
arXiv Detail & Related papers (2021-04-01T17:44:34Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.