Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
- URL: http://arxiv.org/abs/2308.08942v2
- Date: Sat, 2 Sep 2023 13:41:06 GMT
- Title: Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
- Authors: Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Xinchao Wang,
Yanfeng Wang
- Abstract summary: This paper introduces a model learning framework with auxiliary tasks.
In our auxiliary tasks, the coordinates of some body joints are corrupted by either masking or adding noise.
We propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data.
- Score: 106.06256351200068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploring spatial-temporal dependencies from observed motions is one of the
core challenges of human motion prediction. Previous methods mainly focus on
dedicated network structures to model the spatial and temporal dependencies.
This paper considers a new direction by introducing a model learning framework
with auxiliary tasks. In our auxiliary tasks, the coordinates of some body
joints are corrupted by either masking or adding noise, and the goal is to
recover the corrupted coordinates from the remaining ones. To work with auxiliary
tasks, we propose a novel auxiliary-adapted transformer, which can handle
incomplete, corrupted motion data and achieve coordinate recovery via capturing
spatial-temporal dependencies. Through auxiliary tasks, the auxiliary-adapted
transformer is encouraged to capture more comprehensive spatial-temporal
dependencies among body joints' coordinates, leading to better feature
learning. Extensive experimental results have shown that our method outperforms
state-of-the-art methods by remarkable margins of 7.2%, 3.7%, and 9.4% in terms
of 3D mean per joint position error (MPJPE) on the Human3.6M, CMU Mocap, and
3DPW datasets, respectively. We also demonstrate that our method is more robust
when the input data are missing or noisy. Code is available at
https://github.com/MediaBrain-SJTU/AuxFormer.
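The two corruption schemes and the MPJPE metric named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the zero-fill used for masked joints, the Gaussian noise model, and the (frames, joints, 3) array layout are all assumptions made for the example.

```python
import numpy as np

def mask_joints(motion, mask_ratio, rng):
    """Masking task (assumed form): zero out random (frame, joint) entries.

    motion has shape (T, J, 3). Returns the corrupted copy and a boolean
    array of shape (T, J) that is True where coordinates were masked.
    """
    T, J, _ = motion.shape
    mask = rng.random((T, J)) < mask_ratio
    corrupted = motion.copy()
    corrupted[mask] = 0.0  # zero-fill is an assumption of this sketch
    return corrupted, mask

def add_noise(motion, noise_ratio, sigma, rng):
    """Denoising task (assumed form): add Gaussian noise to random entries."""
    T, J, _ = motion.shape
    noisy = rng.random((T, J)) < noise_ratio
    corrupted = motion.copy()
    corrupted[noisy] += rng.normal(0.0, sigma, size=(int(noisy.sum()), 3))
    return corrupted, noisy

def mpjpe(pred, target):
    """Mean per joint position error: mean Euclidean distance over joints."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())

# Hypothetical usage on random data standing in for a motion sequence.
rng = np.random.default_rng(0)
motion = rng.normal(size=(10, 22, 3))  # 10 frames, 22 joints (arbitrary)
masked, mask = mask_joints(motion, mask_ratio=0.3, rng=rng)
noised, noisy = add_noise(motion, noise_ratio=0.3, sigma=0.05, rng=rng)
print("MPJPE of masked input vs. clean:", mpjpe(masked, motion))
print("MPJPE of noisy input vs. clean:", mpjpe(noised, motion))
```

In a training setup along these lines, a model would receive `masked` or `noised` and be penalized on the corrupted entries only, alongside the main future-motion prediction loss; how the losses are actually weighted is not stated in the abstract.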
Related papers
- Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection [10.782354892545651]
We present OAD2D, which discriminates motion abnormalities by reconstructing 3D coordinates of mesh vertices and human joints from monocular videos.
We reformulate abnormal posture estimation by coupling it with a Motion to Text (M2T) model in which a VQVAE is employed to quantize motion features.
Our approach demonstrates robust abnormal behavior detection under severe occlusions and self-occlusions, as it reconstructs human motion trajectories in global coordinates.
arXiv Detail & Related papers (2024-07-23T18:41:16Z)
- Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatial and temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on the Human3.6M, MPI-INF-3DHP and HumanEva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z)
- SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers [57.46911575980854]
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation.
Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions.
Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations.
arXiv Detail & Related papers (2024-04-19T04:51:18Z)
- Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos [91.44553585470688]
Multi-person 3D mesh recovery from videos is a critical first step towards automatic perception of group behavior in virtual reality, physical therapy and beyond.
We propose the Coordinate transFormer (CoordFormer) that directly models multi-person spatial-temporal relations and simultaneously performs multi-mesh recovery in an end-to-end manner.
Experiments on the 3DPW dataset demonstrate that CoordFormer significantly improves the state-of-the-art, outperforming the previously best results by 4.2%, 8.8% and 4.7% according to the MPJPE, PAMPJPE, and PVE metrics, respectively.
arXiv Detail & Related papers (2023-08-20T18:23:07Z)
- (Fusionformer):Exploiting the Joint Motion Synergy with Fusion Network Based On Transformer for 3D Human Pose Estimation [1.52292571922932]
Many previous methods lack an understanding of local joint information. [8888987] considers the temporal relationship of a single joint in this work.
Our proposed Fusionformer method introduces a global-temporal self-trajectory module and a cross-temporal self-trajectory module.
The results show an improvement of 2.4% MPJPE and 4.3% P-MPJPE on the Human3.6M dataset.
arXiv Detail & Related papers (2022-10-08T12:22:10Z)
- Ret3D: Rethinking Object Relations for Efficient 3D Object Detection in Driving Scenes [82.4186966781934]
We introduce a simple, efficient, and effective two-stage detector, termed Ret3D.
At the core of Ret3D is the utilization of novel intra-frame and inter-frame relation modules.
With negligible extra overhead, Ret3D achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-18T03:48:58Z)
- Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation [59.94032196768748]
We propose a relative information encoding method that yields positional and temporal enhanced representations.
Our method outperforms state-of-the-art methods on two public datasets.
arXiv Detail & Related papers (2021-07-29T14:12:19Z)
- MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency [72.82534577726334]
We introduce MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from monocular video.
Our method is the first data-driven approach that directly outputs a kinematic skeleton, a complete and commonly used motion representation.
arXiv Detail & Related papers (2020-06-22T08:50:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.