On the Benefits of 3D Pose and Tracking for Human Action Recognition
- URL: http://arxiv.org/abs/2304.01199v2
- Date: Mon, 7 Aug 2023 05:07:20 GMT
- Title: On the Benefits of 3D Pose and Tracking for Human Action Recognition
- Authors: Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Christoph
Feichtenhofer, Jitendra Malik
- Abstract summary: We show the benefits of using tracking and 3D poses for action recognition.
We propose a Lagrangian Action Recognition model by fusing 3D pose and contextualized appearance over tracklets.
Our method achieves state-of-the-art performance on the AVA v2.2 dataset.
- Score: 77.07134833715273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we study the benefits of using tracking and 3D poses for action
recognition. To achieve this, we take the Lagrangian view on analysing actions
over a trajectory of human motion rather than at a fixed point in space. Taking
this stance allows us to use the tracklets of people to predict their actions.
In this spirit, first we show the benefits of using 3D pose to infer actions,
and study person-person interactions. Subsequently, we propose a Lagrangian
Action Recognition model by fusing 3D pose and contextualized appearance over
tracklets. As a result, our method achieves state-of-the-art performance on the
AVA v2.2 dataset in both the pose-only setting and the standard benchmark setting.
When reasoning about the action using only pose cues, our pose model achieves
+10.0 mAP gain over the corresponding state-of-the-art while our fused model
has a gain of +2.8 mAP over the best state-of-the-art model. Code and results
are available at: https://brjathu.github.io/LART
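The fusion idea above can be illustrated with a minimal sketch: per-frame 3D pose features and contextualized appearance features from one person's tracklet are projected, concatenated into tokens, pooled over time, and mapped to action logits. All names, shapes, and the mean-pooling step are illustrative assumptions; the paper's actual model (LART) uses transformer-based fusion over tracklet tokens.

```python
# Hypothetical sketch of fusing pose and appearance over a tracklet.
# Mean pooling stands in for the paper's transformer; weights are random.
import numpy as np

def fuse_tracklet(pose_feats, app_feats, w_pose, w_app, w_out):
    """pose_feats: (T, Dp), app_feats: (T, Da) for one person tracklet."""
    tokens = np.concatenate([pose_feats @ w_pose, app_feats @ w_app], axis=-1)  # (T, 2H)
    pooled = tokens.mean(axis=0)   # temporal pooling over the tracklet
    return pooled @ w_out          # (num_actions,) action logits

rng = np.random.default_rng(0)
T, Dp, Da, H, A = 8, 45, 256, 32, 5  # 8 frames, 15 joints x 3D, illustrative dims
logits = fuse_tracklet(rng.normal(size=(T, Dp)), rng.normal(size=(T, Da)),
                       rng.normal(size=(Dp, H)), rng.normal(size=(Da, H)),
                       rng.normal(size=(2 * H, A)))
print(logits.shape)  # (5,)
```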
Related papers
- HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes [10.237077867790612]
We present HOIMotion, a novel approach for human motion forecasting during human-object interactions.
Our method integrates information about past body poses and egocentric 3D object bounding boxes.
We show that HOIMotion consistently outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2024-07-02T19:58:35Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- 3D Pose Estimation and Future Motion Prediction from 2D Images [26.28886209268217]
This paper jointly tackles the highly correlated tasks of estimating 3D human body poses and predicting future 3D motions from RGB image sequences.
Based on Lie algebra pose representation, a novel self-projection mechanism is proposed that naturally preserves human motion kinematics.
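A Lie-algebra pose representation typically encodes each joint rotation as an axis-angle vector in so(3), mapped to a rotation matrix via the exponential map (Rodrigues' formula). The sketch below shows that standard map only; it is background illustration, not the paper's model.

```python
# so(3) exponential map: axis-angle vector -> 3x3 rotation matrix
# (Rodrigues' formula). Standard Lie-algebra background, illustrative only.
import numpy as np

def so3_exp(omega):
    """omega: (3,) axis-angle vector; returns 3x3 rotation matrix."""
    theta = np.linalg.norm(omega)
    if theta < 1e-8:
        return np.eye(3)  # near-zero rotation
    k = omega / theta     # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

R = so3_exp(np.array([0.0, 0.0, np.pi / 2]))  # 90-degree rotation about z
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))  # x-axis maps to y-axis
```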
arXiv Detail & Related papers (2021-11-26T01:02:00Z)
- Multi-level Motion Attention for Human Motion Prediction [132.29963836262394]
We study the use of different types of attention, computed at joint, body part, and full pose levels.
Our experiments on Human3.6M, AMASS and 3DPW validate the benefits of our approach for both periodical and non-periodical actions.
arXiv Detail & Related papers (2021-06-17T08:08:11Z)
- H2O: Two Hands Manipulating Objects for First Person Interaction Recognition [70.46638409156772]
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects.
Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame.
Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds.
arXiv Detail & Related papers (2021-04-22T17:10:42Z)
- Skeleton-DML: Deep Metric Learning for Skeleton-Based One-Shot Action Recognition [0.5161531917413706]
One-shot action recognition allows the recognition of human-performed actions with only a single training example.
This can positively influence human-robot interaction by enabling the robot to react to previously unseen behaviour.
We propose a novel image-based skeleton representation that performs well in a metric learning setting.
arXiv Detail & Related papers (2020-12-26T22:31:11Z)
- History Repeats Itself: Human Motion Prediction via Motion Attention [81.94175022575966]
We introduce an attention-based feed-forward network that explicitly leverages the observation that human motion tends to repeat itself.
In particular, we propose to extract motion attention to capture the similarity between the current motion context and the historical motion sub-sequences.
Our experiments on Human3.6M, AMASS and 3DPW evidence the benefits of our approach for both periodical and non-periodical actions.
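The motion-attention idea described above can be sketched as scoring historical motion sub-sequences against the current context and aggregating them with softmax weights. Names, shapes, and the dot-product scoring are assumptions for illustration, not the paper's exact architecture.

```python
# Illustrative sketch of attention over historical motion sub-sequences.
import numpy as np

def motion_attention(context, history):
    """context: (D,) current motion context; history: (N, D) past sub-sequences."""
    scores = history @ context / np.sqrt(context.shape[0])  # (N,) similarities
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                # softmax over history
    return weights @ history                                # (D,) attended motion

rng = np.random.default_rng(0)
attended = motion_attention(rng.normal(size=16), rng.normal(size=(10, 16)))
print(attended.shape)  # (16,)
```

With identical history entries the output reduces to that entry, since the softmax weights are uniform and sum to one.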
arXiv Detail & Related papers (2020-07-23T02:12:27Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.